Search results for "clustering"

Showing 6 of 446 documents

Scalable implementation of dependence clustering in Apache Spark

2017

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. The article also introduces a fast approximate diffusion procedure that makes spectral-clustering-type algorithms feasible in the Spark environment. In addition, the proposed algorithm is benchmarked against spectral clustering. Results of applying the method to real-life data indicate that the implementation scales well and performs well on densely connected graphs.
Peer-reviewed

ta113, ta213, Apache Spark, computer science, datasets, correlation clustering, data mining, algorithms, spectral clustering, computational science, dependence clustering, data stream clustering, CURE data clustering algorithm, scalability, Spark (mathematics), canopy clustering algorithm, cluster analysis, clustering algorithms, data processing
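The abstract does not describe the diffusion procedure itself. As a generic illustration of what a fast, approximate (truncated) graph diffusion can look like, the sketch below runs a fixed number of random-walk-with-restart steps from a seed vertex on an in-memory adjacency list. This is a swapped-in technique (truncated personalized PageRank), not the paper's Dependence Clustering procedure or its GraphX code; the function name, the example graph and the `alpha`/`steps` values are hypothetical.

```python
def approximate_diffusion(adj, seed, alpha=0.15, steps=30):
    """Truncated random walk with restart (personalized PageRank) from `seed`.

    Hypothetical helper. `adj` maps each vertex of an undirected graph to a
    non-empty list of neighbours. Stopping after `steps` iterations is what
    makes the diffusion approximate.
    """
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, neighbours in adj.items():
            # (1 - alpha) of each vertex's mass is spread evenly over its edges.
            share = (1.0 - alpha) * p[v] / len(neighbours)
            for w in neighbours:
                nxt[w] += share
        # The remaining alpha returns to the seed, so total mass stays 1.
        nxt[seed] += alpha
        p = nxt
    return p


if __name__ == "__main__":
    # Hypothetical graph: two triangles joined by the single edge 2-3.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    scores = approximate_diffusion(graph, seed=0)
    # Keeping the three highest-scoring vertices recovers the seed's triangle.
    print(sorted(sorted(scores, key=scores.get, reverse=True)[:3]))  # prints [0, 1, 2]
```

Each iteration only exchanges values along edges, which is why this kind of update fits vertex-centric message passing such as GraphX's `aggregateMessages`; whether the paper's implementation uses that particular primitive is not stated in the abstract.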

Unsupervised network intrusion detection systems for zero-day fast-spreading network attacks and botnets

2015

Today, zero-day and complex attacks on high-speed networks are increasingly common because of the large number of vulnerabilities in the cyber world. As a result, intrusions are becoming more sophisticated and quicker to harm networks and hosts. For these reasons, real-time monitoring, processing and intrusion detection are now among the key features of a NIDS. Traditional intrusion detection systems, such as signature-based IDSs, cannot detect intrusions that use new and complex strategies. Nowadays, automatic traffic analysis and anomaly-based intrusion detection have become more effective in the field of network security; however, they suffer from a high number of false alarms. Among all …

intrusion detection systems, intrusion detection, monitoring, telecommunication networks, data transfer, anomaly detection, real-time operation, machine learning, clustering (unsupervised), algorithms, network security, cluster analysis, information security, network attacks

Application of the Information Bottleneck method to discover user profiles in a Web store

2018

The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to the user's click-stream behavior, to gain insight into the characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Using log data from a real online store, the efficiency of the approach, in terms of its ability to distinguish between buying and non-buying sessions, was validated, indicating some possible practical applications of our method. Experiments performed for a number of session sampl…

unsupervised classification, computer science, 02 engineering and technology, e-commerce, customer profile, 020204 information systems, 0202 electrical engineering electronic engineering information engineering, Web store, cluster analysis, user profile, information retrieval, behavioral pattern, Information Bottleneck method, data mining, computer science applications, machine learning, computational theory and mathematics, Agglomerative Information Bottleneck, 020201 artificial intelligence & image processing, clustering, information systems
Journal of Organizational Computing and Electronic Commerce
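The agglomerative IB algorithm named in the abstract greedily merges the two clusters whose union loses the least mutual information I(C;Y); that loss equals the combined cluster weight p(c_i) + p(c_j) times a weighted Jensen-Shannon divergence between p(y|c_i) and p(y|c_j). A minimal pure-Python sketch of that greedy loop follows. It is not the authors' implementation: the function names, the uniform session priors and the two-valued session attribute Y in the example are hypothetical.

```python
import math
from itertools import combinations


def kl(a, b):
    """KL divergence in nats; terms with a_i == 0 contribute nothing."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)


def merge_cost(pi, ai, pj, aj):
    """Information lost by merging two clusters: (p_i + p_j) * JS_pi(a_i, a_j)."""
    w = pi + pj
    w1, w2 = pi / w, pj / w
    m = [w1 * x + w2 * y for x, y in zip(ai, aj)]  # merged conditional p(y|c)
    return w * (w1 * kl(ai, m) + w2 * kl(aj, m))


def agglomerative_ib(p_x, p_y_given_x, k):
    """Greedily merge singleton clusters until k remain; returns member lists."""
    # Each cluster is (prior p(c), conditional p(y|c), member indices).
    clusters = [(p, list(cond), [i]) for i, (p, cond) in enumerate(zip(p_x, p_y_given_x))]
    while len(clusters) > k:
        # Pick the pair whose merge loses the least information about Y.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]][0], clusters[ij[0]][1],
                                      clusters[ij[1]][0], clusters[ij[1]][1]),
        )
        (pi, ai, mi), (pj, aj, mj) = clusters[i], clusters[j]
        w = pi + pj
        merged = (w, [(pi * x + pj * y) / w for x, y in zip(ai, aj)], mi + mj)
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [sorted(c[2]) for c in clusters]


if __name__ == "__main__":
    # Hypothetical sessions: p(y|session) over a two-valued attribute Y.
    sessions = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    print(agglomerative_ib([0.25] * 4, sessions, k=2))
```

Stopping at a chosen `k` is one option; the full merge sequence can also be recorded to build a dendrogram, since each step removes exactly one cluster.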

Privacy-Preserving Aspects for Data Mining in ICT Services

The steady adoption of systems that profile users' behavior, collecting and critically interpreting as much information as possible about the likes and dislikes, interests and habits of Internet users and consumers of generic services, has rapidly become one of the hottest topics in the networking research community. Indeed, mining information about user behavior benefits both service providers and service customers: on one side, providers can improve their revenues by focusing on the most successful features of their services, while on the other side, users can enjoy services that more closely reflect their specific needs. There are many examples of user profiling applications. Inte…

user profiling, secure multi-party computation, sector ING-INF/03 - Telecommunications, secret sharing, data mining, privacy, clustering
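The keywords list secret sharing and secure multi-party computation, but the truncated abstract does not say which protocol is used. For orientation only, the sketch below shows plain additive secret sharing over a prime field, one standard building block: each user splits a private count into random shares, each (hypothetical) server adds up the shares it receives, and only the combined total is revealed. The modulus, the number of servers and the function names are assumptions.

```python
import secrets

# Hypothetical field modulus (the Mersenne prime 2**61 - 1); it must exceed any total.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; any n - 1 of them alone are uniformly random."""
    return sum(shares) % PRIME


def private_sum(values, n_parties=3):
    """Aggregate users' values so that no single party sees an individual value."""
    per_party = [0] * n_parties
    for v in values:
        # Party k receives only the k-th share of each user's value.
        for k, s in enumerate(share(v, n_parties)):
            per_party[k] = (per_party[k] + s) % PRIME
    # Combining the parties' partial sums reveals only the total.
    return reconstruct(per_party)


if __name__ == "__main__":
    # Hypothetical per-user counts, e.g. how often each user used a service feature.
    print(private_sum([3, 5, 7]))  # prints 15
```

A profiling service could in principle compute aggregate statistics this way; protecting against colluding or dishonest parties needs additional machinery beyond this sketch.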

Identifying the Sales Patterns of Online Stores with Time Series Clustering

2018

Electronic commerce, especially in the business-to-consumer (B2C) context, has for years been a popular research topic in information systems (IS). However, prior research on the topic has traditionally focused on consumers rather than on online stores as businesses. For example, whereas various segmentations of online consumers based on their purchase behaviour exist, no such segmentations of online stores based on their sales patterns have been developed. In this study, our objective is to address this gap in prior research by identifying the most typical sales patterns of online stores operating in the B2C context. By using self-organising maps (SOM) to analyse …

online store (e-business), Series (mathematics), computer science, business-to-consumer, B2C, online stores, clusters, segmentation, sales patterns, data mining, cluster analysis, time series clustering
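The abstract is cut off before the method details. To illustrate the general idea of SOM-based time series clustering, the sketch below trains a tiny batch self-organising map on hypothetical, already normalised sales profiles and assigns each store to its best-matching unit. The 1-D map layout, the initialisation from evenly spaced samples, the neighbourhood schedule, the function names and the data are assumptions, not the study's configuration.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(units, x):
    """Index of the best-matching unit (nearest codebook vector)."""
    return min(range(len(units)), key=lambda j: sq_dist(units[j], x))


def train_batch_som(data, n_units, epochs=20, sigma0=1.0, sigma_end=0.3):
    """Batch SOM on a 1-D chain of units (deterministic, order-independent)."""
    # Initialise units from evenly spaced samples of the data.
    step = (len(data) - 1) / max(n_units - 1, 1)
    units = [list(data[round(j * step)]) for j in range(n_units)]
    for e in range(epochs):
        # Neighbourhood width shrinks linearly from sigma0 to sigma_end.
        sigma = sigma0 + (sigma_end - sigma0) * e / max(epochs - 1, 1)
        winners = [bmu(units, x) for x in data]
        new_units = []
        for j in range(n_units):
            # Gaussian weight of each sample, based on its winner's distance to unit j.
            h = [math.exp(-((j - w) ** 2) / (2 * sigma ** 2)) for w in winners]
            total = sum(h)
            new_units.append([sum(hi * x[d] for hi, x in zip(h, data)) / total
                              for d in range(len(data[0]))])
        units = new_units
    return units


if __name__ == "__main__":
    # Hypothetical normalised sales over four periods: two front-loaded, two back-loaded stores.
    stores = [[1.0, 0.8, 0.2, 0.0], [0.9, 0.9, 0.1, 0.1],
              [0.0, 0.2, 0.8, 1.0], [0.1, 0.1, 0.9, 0.9]]
    units = train_batch_som(stores, n_units=2)
    print([bmu(units, s) for s in stores])  # prints [0, 0, 1, 1]
```

The batch variant is used here because it gives the same result regardless of sample order; a practical sales-pattern map would typically have more units, with clusters read off groups of neighbouring units.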

H&E Multi-Laboratory Staining Variance Exploration with Machine Learning

2022

In diagnostic histopathology, hematoxylin and eosin (H&E) staining is a critical process that highlights salient histological features. Staining results vary between laboratories regardless of the histopathological task, even though the method itself does not change. This variance can impair the accuracy of algorithms and histopathologists' time-to-insight. Investigating this variance can help calibrate stain normalization tasks to counteract this negative effect. Using machine learning, this study evaluated the staining variance between different laboratories on three tissue types. We received H&E-stained slides from 66 different laboratories. Each slide contained kidney, skin, and colon tissue sampl…

dyes, 318 Medical biotechnology, Rand index, HE staining, k-means, stain normalization, samples, diagnostics, artificial intelligence, tissues, laboratory technology, machine learning, imaging, hematoxylin-eosin staining, histology, histopathology, H&E, clustering, pathology
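The keywords mention k-means and the Rand index, but the truncated abstract does not say how they were combined. As a generic illustration of the two tools (not the study's pipeline), the sketch clusters hypothetical two-dimensional stain-colour feature vectors with plain k-means and scores the result against hypothetical laboratory labels using the Rand index, the fraction of item pairs on which two partitions agree.

```python
from itertools import combinations


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm, initialised with the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both partitions treat the same way."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical per-slide colour features and the laboratory each slide came from.
    slides = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.25), (0.85, 0.9)]
    labs = ["lab A", "lab B", "lab A", "lab B"]
    print(rand_index(kmeans(slides, 2), labs))  # prints 1.0
```

The Rand index ignores the names of the clusters, so k-means labels and laboratory names can be compared directly; a value of 1.0 means the two partitions group the items identically.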