Search results for "Mining"
Anomaly detection approach to keystroke dynamics based user authentication
2017
Keystroke dynamics is an authentication mechanism that uses a user's natural typing pattern for identification. In this work, we introduced a Dependence Clustering-based approach to user authentication using keystroke dynamics. In addition, we applied a k-NN-based approach that demonstrated strong results. Most existing approaches use only genuine users' data for training and validation. We designed a cross-validation procedure with artificially generated impostor samples that improves the learning process yet still allows fair comparison with previous works. We evaluated the methods using the CMU keystroke dynamics benchmark dataset. Both proposed approaches outperformed the previou…
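The abstract does not spell out the k-NN-based approach. A common way to use k-NN for one-class authentication, sketched below under that assumption, scores a new typing sample by its mean distance to the k nearest samples enrolled by the genuine user and rejects it above a threshold. The feature vectors (for example, key hold and latency times), the value of k, the threshold, and the function names are hypothetical; the paper's impostor-generation and cross-validation procedure is not reproduced.

```python
import math


def knn_anomaly_score(sample, genuine, k=3):
    """Mean Euclidean distance from `sample` to its k nearest genuine vectors.

    Higher scores mean the sample looks less like the enrolled user's typing.
    (Hypothetical one-class k-NN scorer, not the paper's exact method.)
    """
    dists = sorted(math.dist(sample, g) for g in genuine)
    k = min(k, len(dists))  # guard against fewer enrolled samples than k
    return sum(dists[:k]) / k


def authenticate(sample, genuine, threshold, k=3):
    # Accept the claimed identity only if the sample is close to the enrolled data.
    return knn_anomaly_score(sample, genuine, k) <= threshold


# Hypothetical enrolled timing vectors (seconds) for one genuine user.
genuine = [
    [0.10, 0.15, 0.12],
    [0.11, 0.14, 0.13],
    [0.09, 0.16, 0.12],
    [0.10, 0.15, 0.11],
]
```

A sample identical to an enrolled vector scores 0.0 with k=1, while a sample with clearly different timings such as [0.25, 0.30, 0.05] scores far higher. In practice the threshold would be tuned on held-out genuine and impostor data rather than fixed by hand.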
Adaptive framework for network traffic classification using dimensionality reduction and clustering
2012
Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed into numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion maps. Abnormal log lines can then …
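As a minimal sketch of the n-gram step, assuming character bigram counts over a vocabulary shared by all queries (the abstract does not state n or the weighting), each HTTP query becomes one row of a numerical matrix. The function names are hypothetical, and the subsequent principal component analysis and diffusion map reduction is not shown.

```python
from collections import Counter


def char_ngrams(text, n=2):
    # All overlapping character n-grams of `text`.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_matrix(queries, n=2):
    """Turn each query string into a row of n-gram counts over a shared vocabulary.

    Returns (vocab, matrix), where matrix[i][j] counts vocab[j] in queries[i].
    """
    counts = [Counter(char_ngrams(q, n)) for q in queries]
    vocab = sorted(set().union(*counts))  # union of every query's n-grams
    matrix = [[c[g] for g in vocab] for c in counts]
    return vocab, matrix
```

For example, ngram_matrix(["id=1", "id=2"]) yields the vocabulary ["=1", "=2", "d=", "id"] and the rows [1, 0, 1, 1] and [0, 1, 1, 1]. Injected payloads tend to introduce unusual n-grams, which show up as rarely used columns in such a matrix.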
A novel heuristic memetic clustering algorithm
2013
In this paper, we introduce a novel clustering algorithm based on the Memetic Algorithm meta-heuristic, wherein clusters are iteratively evolved using a novel single operator employing a combination of heuristics. Several heuristics are described and employed for the three types of selection used in the operator. The algorithm was exhaustively tested on three benchmark problems and compared to a classical clustering algorithm (k-Medoids) using the same performance metrics. The results show that our clustering algorithm consistently provides better clustering solutions with less computational effort.
Gear classification and fault detection using a diffusion map framework
2015
This article proposes a system health monitoring approach that detects abnormal behavior of machines. A diffusion map is used to reduce the dimensionality of the training data, which facilitates the classification of newly arriving measurements. The new measurements are handled with the Nyström extension. The method is trained and tested with real gear monitoring data from several wind farms. A machine health index is proposed, showing that data recordings can be classified as working or failing using dimensionality reduction and warning levels in the low-dimensional space. The proposed approach can be used with any system that produces high-dimensional measurement data.
Quantile index for gradual and abrupt change detection from CFB boiler sensor data in online settings
2012
In this paper, we consider the problem of online detection of gradual and abrupt changes in sensor data with high levels of noise and outliers. We propose a simple heuristic method based on the Quantile Index (QI) and study how robust this method is for detecting both gradual and abrupt changes in such data. We evaluate the performance of our method on artificially generated and real datasets that represent different operational settings of a pilot circulating fluidized bed (CFB) reactor and a CFB cold model. Our experiments suggest that QI can be used for designing very simple yet effective methods for gradual change detection in noisy sensor data. It can also be used for detectin…
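The abstract does not define the Quantile Index, so the sketch below is not the authors' QI. It is a hypothetical quantile-based indicator in the same spirit: it takes a high percentile of a reference window as a threshold and reports the fraction of recent samples above it. Because a percentile is barely moved by a few extreme values, a single outlier in the recent window shifts the score by only 1/len(window), whereas a sustained upward shift pushes it toward 1.0. The function name, the 95th-percentile default, and the windowing are assumptions.

```python
import statistics


def quantile_exceedance(reference, window, pct=95):
    """Fraction of `window` samples above the pct-th percentile of `reference`.

    Hypothetical change indicator (not the paper's QI): it stays near
    1 - pct/100 while the signal is stable and approaches 1.0 after an
    upward level shift. `pct` must be an integer between 1 and 99.
    """
    # 99 cut points; cuts[pct - 1] is the pct-th percentile of the reference data.
    cuts = statistics.quantiles(reference, n=100, method="inclusive")
    threshold = cuts[pct - 1]
    return sum(x > threshold for x in window) / len(window)
```

Evaluated on a sliding window, an abrupt change makes the score jump within one window length, while a gradual drift makes it climb slowly; a simple alarm could fire when the score stays above a chosen level for several consecutive windows.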
Anomaly Detection Algorithms for the Sleeping Cell Detection in LTE Networks
2015
The Sleeping Cell problem is a particular type of cell degradation in Long-Term Evolution (LTE) networks. In practice, such a cell outage leads to a loss of network service, and sometimes an operator can detect it only after multiple user complaints. In this study, a cell becomes sleeping because of a Random Access Channel (RACH) failure, which may happen due to software or hardware problems. For the detection of malfunctioning cells, we introduce a data-mining-based framework. At its core is the analysis of event sequences reported by a User Equipment (UE) to a serving Base Station (BS). The crucial element of the developed framework is an anomaly detection algorithm. We compare perfor…
A modelling framework for social media monitoring
2013
This paper describes a hierarchical, three-level modelling framework for monitoring social media. The first-level models capture immediate social reality: they represent various virtual communities at social media sites and adhere to the social world models of the sites, i.e., the "site ontologies". The second-level model is a temporal multirelational graph that captures the static and dynamic properties of the first-level models from the perspective of the monitoring site. The third-level model consists of a temporal relational database schema that models the temporal multirelational graph within the database. The models are specified and instantiated at the monitoring s…
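The abstract does not give the third-level database design. One common way to store a temporal multirelational graph relationally, sketched here with Python's built-in sqlite3 module, is a valid-time edge table in which every typed edge carries the interval during which it held. The table and column names, the snapshot function, and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical schema: each edge row is one typed relation between two nodes,
# valid from `valid_from` up to (but not including) `valid_to`; NULL = still valid.
SCHEMA = """
CREATE TABLE node (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE edge (
    src        TEXT NOT NULL REFERENCES node(id),
    dst        TEXT NOT NULL REFERENCES node(id),
    relation   TEXT NOT NULL,
    valid_from INTEGER NOT NULL,
    valid_to   INTEGER
);
"""


def snapshot(conn, t):
    """Return the (src, relation, dst) edges valid at time t, i.e. the graph as it was then."""
    return conn.execute(
        "SELECT src, relation, dst FROM edge "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY src, relation, dst",
        (t, t),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO node VALUES (?, ?)",
                 [("alice", "user"), ("bob", "user"), ("g1", "group")])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?, ?)", [
    ("alice", "g1", "member_of", 1, None),
    ("bob", "g1", "member_of", 1, 5),  # bob left the group at t=5
    ("alice", "bob", "follows", 3, None),
])
```

Querying snapshot(conn, 4) returns all three edges, while snapshot(conn, 6) omits Bob's group membership because that edge closed at t=5. Keeping a single edge table with a relation column lets new relation types be added without schema changes.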
Twister Tries
2015
Many data-mining techniques commonly used across research fields perform poorly on large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms' scaling properties prohibit clustering a large number of items. Besides the unfavorable time complexity of O(n^2), these algorithms have a space complexity of O(n^2), which can be reduced to O(n) if the time complexity is allowed to rise to O(n^2 log^2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…
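To make the O(n^2) space cost concrete, the short calculation below counts the bytes needed to store every pairwise distance once (the n(n-1)/2 entries of the upper triangle), assuming 8-byte floating-point values; the helper name is illustrative.

```python
def condensed_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed for the n*(n-1)/2 pairwise distances of n items (8-byte floats by default)."""
    return n * (n - 1) // 2 * bytes_per_entry


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} items -> {condensed_matrix_bytes(n) / 1e9:,.1f} GB")
```

This comes to roughly 0.4 GB for ten thousand items, 40 GB for one hundred thousand, and 4 TB for one million, which is why linear-space approaches matter at this scale.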
A Hybrid Multigroup Coclustering Recommendation Framework Based on Information Fusion
2015
Collaborative Filtering (CF) is one of the most successful algorithms in recommender systems. However, it suffers from data sparsity and scalability problems. Although many clustering techniques have been incorporated to alleviate these two problems, most of them fail to achieve further significant improvement in recommendation accuracy. First, most of them assume that each user or item belongs to a single cluster. Since users often hold multiple interests and items may belong to multiple categories, it is more reasonable to assume that users and items can join multiple clusters (groups), where each cluster is a subset of like-minded users and items they prefer. Furthermore, most of…
Support vector machine integrated with game-theoretic approach and genetic algorithm for the detection and classification of malware
2013
In the modern world, the rapid growth of malicious software production has become one of the most significant threats to network security. Unfortunately, widespread signature-based anti-malware strategies can neither detect previously unseen malware nor deal with the code obfuscation techniques employed by malware designers. In our study, the problem of malware detection and classification is solved by applying a data-mining-based approach that relies on supervised machine learning. Executable files are represented as byte and opcode sequences, and n-gram models are employed to extract essential features from these sequences. Feature vectors obtained are…