Search results for "clustering"
showing 6 items of 446 documents
Scalable implementation of dependence clustering in Apache Spark
2017
This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is introduced. In addition, the proposed algorithm is benchmarked against Spectral clustering. Results of applying the method to real-life data indicate that the implementation scales well while still demonstrating good performance for densely connected graphs. Peer reviewed.
Unsupervised network intrusion detection systems for zero-day fast-spreading network attacks and botnets
2015
Today, zero-day and complex attacks in high-speed networks are increasingly common due to the high number of vulnerabilities in the cyber world. As a result, intrusions have become more sophisticated and faster at damaging networks and hosts. For these reasons, real-time monitoring, processing, and intrusion detection are now among the key features of NIDS. Traditional types of intrusion detection systems, such as signature-based IDS, are not able to detect intrusions that use new and complex strategies. Nowadays, automatic traffic analysis and anomaly-based intrusion detection have become more efficient in the field of network security; however, they suffer from a high number of false alarms. Among all …
Application of the Information Bottleneck method to discover user profiles in a Web store
2018
The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to the user click-stream behavior, to gain insight into the characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Based on log data for a real online store, the efficiency of the approach in terms of its ability to differentiate between buying and non-buying sessions was validated, indicating some possible practical applications of our method. Experiments performed for a number of session sampl…
Privacy-Preserving Aspects for Data Mining in ICT Services
The steady adoption of systems for profiling users' behavior, collecting and critically interpreting as much information as possible about the likes and dislikes, interests and habits of Internet residents and generic service consumers, has rapidly become one of the hottest topics within the networking research community. Indeed, mining information about users' behavior is an advantage for both service providers and service customers: on one side, providers can improve their revenues by focusing on the most successful features of their services, while on the other side, users can enjoy services which more closely reflect their specific needs. There are many examples of user profiling applications. Inte…
Identifying the Sales Patterns of Online Stores with Time Series Clustering
2018
Electronic commerce, especially in the business-to-consumer (B2C) context, has for years been a popular research topic in information systems (IS). However, the prior research on the topic has traditionally been dominated by the consumer focus instead of the business focus of online stores. For example, whereas various segmentations exist for online consumers based on their purchase behaviour, no such segmentations have been developed for online stores based on their sales patterns. In this study, our objective is to address this gap in prior research by identifying the most typical sales patterns of online stores operating in the B2C context. By using self-organising maps (SOM) to analyse …
H&E Multi-Laboratory Staining Variance Exploration with Machine Learning
2022
In diagnostic histopathology, hematoxylin and eosin (H&E) staining is a critical process that highlights salient histological features. Staining results vary between laboratories regardless of the histopathological task, although the method does not change. This variance can impair the accuracy of algorithms and histopathologists’ time-to-insight. Investigating this variance can help calibrate stain normalization tasks to reverse this negative potential. With machine learning, this study evaluated the staining variance between different laboratories on three tissue types. We received H&E-stained slides from 66 different laboratories. Each slide contained kidney, skin, and colon tissue sampl…