Search results for "Data mining"
showing 10 items of 907 documents
Multispectral filter arrays: Recent advances and practical implementation
2014
Thanks to some technical progress in interferencefilter design based on different technologies, we can finally successfully implement the concept of multispectral filter array-based sensors. This article provides the relevant state-of-the-art for multispectral imaging systems and presents the characteristics of the elements of our multispectral sensor as a case study. The spectral characteristics are based on two different spatial arrangements that distribute eight different bandpass filters in the visible and near-infrared area of the spectrum. We demonstrate that the system is viable and evaluate its performance through sensor spectral simulation. Multispectral filter arrays: Recent advan…
Continuum: A spatiotemporal data model to represent and qualify filiation relationships
2013
International audience; This work introduces an ontology-based spatio-temporal data model to represent entities evolving in space and time. A dynamic phenomenon generates a complex relationship network between the entities involved in the process. At the abstract level, the relationships can be identity or topological filiations. The existence of an identity filiation depends on whether the object changes its identity or not. On the other hand, topological filiations are based exclusively on the spatial component, like in the case of growth, reduction, merging or splitting. When combining identity and topological filiations, six filiation relationships are obtained, forming a second abstrac…
ThemeMountain: a SVG-based Visual Data Mining Tool
2005
Anomaly detection approach to keystroke dynamics based user authentication
2017
Keystroke dynamics is one of the authentication mechanisms which uses natural typing pattern of a user for identification. In this work, we introduced Dependence Clustering based approach to user authentication using keystroke dynamics. In addition, we applied a k-NN-based approach that demonstrated strong results. Most of the existing approaches use only genuine users data for training and validation. We designed a cross validation procedure with artificially generated impostor samples that improves the learning process yet allows fair comparison to previous works. We evaluated the methods using the CMU keystroke dynamics benchmark dataset. Both proposed approaches outperformed the previou…
Adaptive framework for network traffic classification using dimensionality reduction and clustering
2012
Information security has become a very important topic especially during the last years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …
Gear classification and fault detection using a diffusion map framework
2015
This article proposes a system health monitoring approach that detects abnormal behavior of machines. Diffusion map is used to reduce the dimensionality of training data, which facilitates the classification of newly arriving measurements. The new measurements are handled with Nyström extension. The method is trained and tested with real gear monitoring data from several windmill parks. A machine health index is proposed, showing that data recordings can be classified as working or failing using dimensionality reduction and warning levels in the low dimensional space. The proposed approach can be used with any system that produces high-dimensional measurement data. peerReviewed
Quantile index for gradual and abrupt change detection from CFB boiler sensor data in online settings
2012
In this paper we consider the problem of online detection of gradual and abrupt changes in sensor data having high levels of noise and outliers. We propose a simple heuristic method based on the Quantile Index (QI) and study how robust this method is for detecting both gradual and abrupt changes with such data. We evaluate the performance of our method on the artificially generated and real datasets that represent different operational settings of a pilot circulating fluidized bed (CFB) reactor and CFB cold model. Our experiments suggest that QI can be used for designing very simple yet effective methods for gradual change detection in the noisy sensor data. It can be also used for detectin…
Anomaly Detection Algorithms for the Sleeping Cell Detection in LTE Networks
2015
The Sleeping Cell problem is a particular type of cell degradation in Long-Term Evolution (LTE) networks. In practice such cell outage leads to the lack of network service and sometimes it can be revealed only after multiple user complains by an operator. In this study a cell becomes sleeping because of a Random Access Channel (RACH) failure, which may happen due to software or hardware problems. For the detection of malfunctioning cells, we introduce a data mining based framework. In its core is the analysis of event sequences reported by a User Equipment (UE) to a serving Base Station (BS). The crucial element of the developed framework is an anomaly detection algorithm. We compare perfor…
A modelling framework for social media monitoring
2013
This paper describes a hierarchical, three-level modelling framework for monitoring social media. Immediate social reality is modelled through the first level of the models. They represent various virtual communities at social media sites and adhere to the social world models of the sites, i.e., the "site ontologies". The second-level model is a temporal multirelational graph that captures the static and dynamic properties of the first-level models from the perspective of the monitoring site. The third-level model consists of a temporal relational database scheme that models the temporal multirelational graph within the database. The models are specified and instantiated at the monitoring s…
Twister Tries
2015
Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large amount of items. Besides the unfavorable time complexity of O(n 2 ), these algorithms have a space complexity of O(n 2 ), which can be reduced to O(n) if the time complexity is allowed to rise to O(n 2 log2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…