Search results for "data"
showing 10 items of 12992 documents
Online anomaly detection using dimensionality reduction techniques for HTTP log analysis
2015
Modern web services face an increasing number of new threats. Logs are collected from almost all web servers, so analyzing them is beneficial when trying to prevent intrusions. Intrusive behavior often differs from normal web traffic. This paper proposes a framework for finding abnormal behavior in these logs. We compare random projection, principal component analysis and diffusion map for anomaly detection. In addition, the framework has online capabilities. The first two methods have intuitive out-of-sample extensions, while diffusion map uses the Nyström extension. This fast out-of-sample extension enables real-time analysis of web server traffic. The framework is demonstrated using …
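The abstract compares random projection, PCA and diffusion map, and notes that the first two extend naturally to new data. As a rough illustration of the PCA case, the Python sketch below scores each incoming sample by its reconstruction error with respect to a subspace learned from normal traffic. The feature vectors, the hypothetical helpers `fit_pca` and `reconstruction_error`, and the use of reconstruction error as the anomaly score are all assumptions for illustration, not the paper's exact framework; how HTTP log lines become numeric features is not shown.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn the mean and the top principal directions of the training rows."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(X_new, mean, components):
    """Squared distance of each new row from the learned principal subspace.

    New rows are only projected, never refitted, which is what makes
    this kind of scoring usable on a stream of incoming log entries.
    """
    centred = X_new - mean
    reconstructed = (centred @ components.T) @ components
    return np.sum((centred - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical feature vectors of normal requests: almost flat in the third feature.
normal = np.column_stack([rng.normal(size=(200, 2)), 0.01 * rng.normal(size=200)])
mean, components = fit_pca(normal, n_components=2)
incoming = np.array([[0.5, -0.3, 0.0],   # consistent with normal traffic
                     [0.5, -0.3, 3.0]])  # far from the learned subspace
scores = reconstruction_error(incoming, mean, components)
```

A threshold on `scores`, for example a high percentile of the scores on normal training data, would then turn the score into an alarm; that choice is likewise an assumption.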
Listwise Collaborative Filtering
2015
Recently, ranking-oriented collaborative filtering (CF) algorithms have achieved great success in recommender systems. They have obtained state-of-the-art performance by estimating a preference ranking of items for each user rather than estimating the absolute ratings on unrated items (as conventional rating-oriented CF algorithms do). In this paper, we propose a new ranking-oriented CF algorithm, called ListCF. Following the memory-based CF framework, ListCF directly predicts a total order of items for each user based on similar users' probability distributions over permutations of the items, and thus differs from previous ranking-oriented memory-based CF algorithms that focus on predicting th…
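ListCF works with users' probability distributions over permutations of items. One standard way to define such a distribution from per-item scores is the Plackett-Luce model, sketched below in Python. Treat this as an assumption: the abstract does not state the exact model, the scores are made up, and `plackett_luce_prob` is a hypothetical helper. Measuring similarity between two users' distributions and predicting the final order are not shown.

```python
import math
from itertools import permutations

def plackett_luce_prob(ranking, scores):
    """Probability of `ranking` (item ids, best first) under a Plackett-Luce model.

    At each position the next item is drawn with probability proportional
    to exp(score) among the items that have not been placed yet.
    """
    weights = {item: math.exp(scores[item]) for item in ranking}
    remaining = sum(weights.values())
    prob = 1.0
    for item in ranking:
        prob *= weights[item] / remaining
        remaining -= weights[item]
    return prob

# Hypothetical scores that a similar user might assign to three items.
scores = {"a": 2.0, "b": 1.0, "c": 0.0}
probs = {perm: plackett_luce_prob(list(perm), scores) for perm in permutations(scores)}
```

The probabilities over all six orders sum to one, and the order that follows the scores, ("a", "b", "c"), is the most likely one.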
An Approach for Network Outage Detection from Drive-Testing Databases
2012
A data-mining framework for analyzing a cellular network drive testing database is described in this paper. The presented method is designed to detect sleeping base stations, network outages, and changes in dominance areas in a cognitive and self-organizing manner. The essence of the method is to find similarities between periodic network measurements and previously known outage data. For this purpose, diffusion map dimensionality reduction and nearest neighbor data classification methods are used. The method is cognitive because it requires training data for outage detection. In addition, the method is autonomous because it uses minimization of drive testing (MDT) functionalit…
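The combination of diffusion map dimensionality reduction and nearest neighbor classification can be sketched as follows. This is a minimal Python example built on assumptions: the measurement vectors, labels, kernel width `epsilon` and the helpers `diffusion_map` and `nearest_neighbour_labels` are hypothetical, and the query measurements are embedded together with the training data instead of through an out-of-sample extension, so it is not the paper's exact pipeline.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2, t=1):
    """Basic diffusion map: Gaussian kernel, Markov normalisation, top non-trivial eigenvectors."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)              # Gaussian affinities
    d = K.sum(axis=1)                            # degree of each point
    # Symmetric matrix similar to the Markov matrix P = D^-1 K, so eigh applies.
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]          # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by eigenvalue**t.
    return psi[:, 1:n_coords + 1] * eigvals[1:n_coords + 1] ** t

def nearest_neighbour_labels(train_emb, train_labels, query_emb):
    """1-nearest-neighbour classification in the embedded space."""
    dists = np.sum((query_emb[:, None, :] - train_emb[None, :, :]) ** 2, axis=-1)
    return train_labels[np.argmin(dists, axis=1)]

rng = np.random.default_rng(1)
# Hypothetical measurement vectors (e.g. signal levels) from normal and outage periods.
normal = rng.normal(0.0, 0.3, size=(30, 4))
outage = rng.normal(3.0, 0.3, size=(30, 4))
X_train = np.vstack([normal, outage])
labels = np.array(["normal"] * 30 + ["outage"] * 30)
X_query = np.array([[0.1, 0.0, -0.2, 0.1],
                    [2.9, 3.1, 3.0, 2.8]])
# Queries are embedded jointly with the training data (no out-of-sample extension).
emb = diffusion_map(np.vstack([X_train, X_query]), epsilon=2.0)
pred = nearest_neighbour_labels(emb[:60], labels, emb[60:])
```

Embedding the queries jointly keeps the sketch short; in a streaming setting an out-of-sample method such as the Nyström extension would avoid recomputing the eigendecomposition for every new batch.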
Cognitive self-healing system for future mobile networks
2015
This paper introduces a framework and an implementation of a cognitive self-healing system for fault detection and compensation in future mobile networks. Performance monitoring for failure identification is based on anomaly analysis, which combines nearest neighbor anomaly scoring with statistical profiling. A case-based reasoning algorithm is used for the cognitive self-healing of detected faulty cells. The validation environment is a Long Term Evolution (LTE) mobile system simulated with Network Simulator 3 (ns-3) [1, 2]. The results demonstrate that the cognitive approach is efficient at compensating for cell outages and can improve network coverage. Anomaly analysis can be used fo…
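Nearest neighbor anomaly scoring can be illustrated with a short Python sketch: each cell's key performance indicators (KPIs) are compared with those of known healthy cells, and the mean distance to the `k` closest healthy cells serves as the anomaly score. The KPI vectors, `k = 3`, the 99th-percentile threshold and the helper `knn_scores` are assumptions; the statistical profiling and case-based reasoning parts of the paper are not modeled.

```python
import numpy as np

def knn_scores(reference, queries, k=3, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest reference rows."""
    dists = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    dists = np.sort(dists, axis=1)
    start = 1 if exclude_self else 0  # skip the zero distance of a point to itself
    return dists[:, start:start + k].mean(axis=1)

rng = np.random.default_rng(2)
# Hypothetical normalised KPI vectors of healthy cells (e.g. throughput, drop rate, CQI).
healthy = rng.normal(0.0, 1.0, size=(100, 3))
# Threshold: a high percentile of the healthy cells' own leave-one-out scores.
threshold = np.percentile(knn_scores(healthy, healthy, exclude_self=True), 99)
cells = np.array([[0.2, -0.1, 0.3],    # resembles the healthy cells
                  [6.0, -5.0, 4.0]])   # far from every healthy cell
flags = knn_scores(healthy, cells) > threshold
```

Cells whose score exceeds the threshold would be handed to the later diagnosis and compensation steps.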
Biased graph walks for RDF graph embeddings
2017
Knowledge Graphs have been recognized as a valuable source for background information in many data mining, information retrieval, natural language processing, and knowledge extraction tasks. However, obtaining a suitable feature vector representation from RDF graphs is a challenging task. In this paper, we extend the RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities. We generate sequences by exploiting local information from graph substructures, harvested by graph walks, and learn latent numerical representations of entities in RDF graphs. We extend the way we compute feature vector representations by comparing twel…
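The walk-generation step can be sketched in Python on a toy graph. Here edges are chosen with probability proportional to the inverse frequency of their predicate, which is only one plausible bias and an assumption here; the abstract says several weighting strategies are compared, but the list is truncated. The graph, the DBpedia-style prefixes and the helpers `predicate_frequency_weights` and `biased_walk` are hypothetical. The resulting sequences would then be fed to a word2vec-style language model, as in RDF2Vec, which is not shown.

```python
import random
from collections import Counter

# Hypothetical toy RDF graph: subject -> list of (predicate, object) edges.
GRAPH = {
    "dbr:Berlin": [("dbo:country", "dbr:Germany"), ("rdf:type", "dbo:City")],
    "dbr:Germany": [("dbo:capital", "dbr:Berlin"), ("rdf:type", "dbo:Country")],
    "dbr:Hamburg": [("dbo:country", "dbr:Germany"), ("rdf:type", "dbo:City")],
    "dbo:City": [],
    "dbo:Country": [],
}

def predicate_frequency_weights(graph):
    """Weight each predicate by the inverse of how often it occurs (one possible bias)."""
    counts = Counter(p for edges in graph.values() for p, _ in edges)
    return {p: 1.0 / c for p, c in counts.items()}

def biased_walk(graph, start, depth, weights, rng):
    """Token sequence entity, predicate, entity, ... with at most `depth` hops."""
    walk, node = [start], start
    for _ in range(depth):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: no outgoing edges
        pred, obj = rng.choices(edges, weights=[weights[p] for p, _ in edges], k=1)[0]
        walk += [pred, obj]
        node = obj
    return walk

rng = random.Random(42)
w = predicate_frequency_weights(GRAPH)
walks = [biased_walk(GRAPH, "dbr:Berlin", 4, w, rng) for _ in range(5)]
```

Swapping in another weighting only requires changing `predicate_frequency_weights`; the walk itself stays the same.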
UAV-based hyperspectral monitoring of small freshwater area
2014
Recent developments in compact, lightweight hyperspectral imagers have enabled UAV-based remote sensing at reasonable cost. We used a small hyperspectral imager based on a Fabry-Perot interferometer to monitor a small freshwater area in southern Finland. In this study we briefly describe the technology used and the field studies performed. We explain the processing pipeline for the gathered spectral data and introduce a target-detection-based algorithm for estimating levels of algae, aquatic chlorophyll and turbidity in freshwater. We also point out certain challenges we faced.
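The abstract does not say which target detection method is used. As a hedged illustration of the general idea, the Python sketch below computes the spectral angle between each pixel spectrum and a reference target spectrum, a widely used similarity measure in hyperspectral target detection. The five-band spectra and the helper `spectral_angle` are made up, and converting such scores into algae, chlorophyll or turbidity levels is not shown.

```python
import numpy as np

def spectral_angle(pixels, target):
    """Angle in radians between each pixel spectrum and the target; smaller means more similar."""
    pixels = np.atleast_2d(pixels)
    cos = pixels @ target / (np.linalg.norm(pixels, axis=1) * np.linalg.norm(target))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding just above 1

# Hypothetical 5-band reflectance spectra.
target = np.array([0.02, 0.04, 0.10, 0.05, 0.30])   # e.g. an algae-like signature
pixels = np.array([[0.04, 0.08, 0.20, 0.10, 0.60],   # same shape, twice as bright
                   [0.10, 0.10, 0.10, 0.10, 0.10]])  # flat spectrum
angles = spectral_angle(pixels, target)
```

Because the angle depends only on the shape of a spectrum, the first pixel matches the target even though it is twice as bright, which makes this measure tolerant of overall illumination differences.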
Research literature clustering using diffusion maps
2013
We apply the knowledge discovery process to mapping current topics in a particular field of science. We are interested in how articles form clusters and what the resulting clusters contain. We present a framework involving web scraping, keyword extraction, dimensionality reduction and clustering using the diffusion map algorithm. We use publicly available information about articles in high-impact journals. The method should be useful to practitioners or scientists who want an overview of recent research in a field of science. As a case study, we map the topics in the data mining literature of 2011.
Peer reviewed
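Keyword extraction, the step before dimensionality reduction and clustering, can be sketched with TF-IDF weighting in Python. TF-IDF is a stand-in chosen for illustration; the abstract does not name the extraction method. The example abstracts, the stopword list and the helpers `tokenize` and `top_keywords` are hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"of", "with", "for", "the", "and", "a", "in"}  # hypothetical, minimal list

def tokenize(text):
    """Lower-case alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def top_keywords(docs, n=3):
    """Return the n highest-scoring TF-IDF terms of each document (ties broken alphabetically)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    n_docs = len(docs)
    keywords = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        keywords.append([t for t, _ in ranked[:n]])
    return keywords

abstracts = [
    "Diffusion maps reduce the dimension of graph data.",
    "Clustering of graph data with spectral clustering.",
    "Association rules mining for retail data.",
]
keywords = top_keywords(abstracts)
```

A term such as "data", which appears in every abstract, gets an IDF of zero and never becomes a keyword; the per-document TF-IDF vectors would then form the input to the diffusion map and clustering steps.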
Revealing Fake Profiles in Social Networks by Longitudinal Data Analysis
2017
100+ Metrics for Software Startups : A Multi-Vocal Literature Review
2018
Modelling Recurrent Events for Improving Online Change Detection
2016
The task of online change point detection in sensor data streams is often complicated by the presence of noise, which can be mistaken for real changes and thereby degrade the performance of change detectors. Most existing change detection methods assume that changes are independent of each other and occur at random times. In this paper we study how the performance of detectors can be improved when changes recur. We analytically demonstrate under which conditions, and for how long, recurrence information is useful for improving detection accuracy. We propose a simple, computationally efficient message passing procedure for calculating a predictive probability distribution of …
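Recurrence information can be turned into a prior over when the next change is due. The Python sketch below converts an assumed distribution of intervals between recurring changes into a discrete hazard, that is, the probability of a change at each step given that none has occurred yet. It does not reproduce the paper's message passing procedure; the interval distribution and the helper `change_hazard` are hypothetical.

```python
def change_hazard(interval_pmf):
    """Per-step probability of a change, given that no change has happened yet.

    interval_pmf[d] is the assumed probability that the next change occurs
    exactly d steps after the previous one (d >= 1); index 0 is unused.
    """
    hazard = [0.0] * len(interval_pmf)
    survival = 1.0  # probability that no change occurred in steps 1..d-1
    for d in range(1, len(interval_pmf)):
        hazard[d] = interval_pmf[d] / survival if survival > 1e-12 else 0.0
        survival -= interval_pmf[d]
    return hazard

# Hypothetical recurrence pattern: changes usually come 4 or 5 steps apart.
pmf = [0.0, 0.0, 0.0, 0.1, 0.6, 0.3]
h = change_hazard(pmf)
```

A detector could combine this prior with the evidence from the noisy data, becoming more willing to signal a change around steps 4 and 5 and less willing elsewhere.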