Search results for "louhinta"
showing 10 items of 93 documents
Intrusion detection applications using knowledge discovery and data mining
2014
Improving Scalable K-Means++
2021
Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation …
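For orientation, the K-means++ baseline named in this abstract picks each new center with probability proportional to its squared distance from the nearest center chosen so far. The sketch below illustrates only that baseline seeding step, not the divide-and-conquer methods the paper proposes. The function names `squared_distance` and `kmeans_pp_init` are hypothetical and chosen for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Return k initial centers chosen by K-means++ style seeding.

    Illustrative sketch only: it covers the baseline seeding step,
    not the parallel or random-projection variants from the abstract.
    """
    rng = random.Random(seed)
    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from point i to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform draw.
            new_center = rng.choice(points)
        else:
            # Draw one index with probability proportional to d2.
            index = rng.choices(range(len(points)), weights=d2, k=1)[0]
            new_center = points[index]
        centers.append(new_center)
        # Refresh nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, new_center))
              for old, p in zip(d2, points)]
    return centers
```

A call such as `kmeans_pp_init([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` returns two of the input points. Points far from the first center are more likely to be chosen as the second.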
Improvements and applications of the elements of prototype-based clustering
2018
Clustering or cluster analysis is an essential part of data mining, machine learning, and pattern recognition. The most popularly applied clustering methods are partitioning-based or prototype-based methods. Prototype-based clustering methods usually have easy implementability and good scalability. These methods, such as K-means clustering, have been used for different applications in various fields. On the other hand, prototype-based clustering methods are typically sensitive to initialization, and the selection of the number of clusters for knowledge discovery purposes is not straightforward. In the era of big data, in high-velocity, ever-growing datasets, which can also be erroneous, outl…
Automatic Taxonomy Induction based on Word-embedding of Neural Nets
2018
A taxonomy is a knowledge management tool that presents useful information in a well-ordered structure, which prevents information overload and improves the quality of information access. This article is concerned with automatically extracting asymmetrical hierarchical relations from a large corpus and subsequently constructing a taxonomy with a domain-independent, semi-supervised system. The methodology relies on the distributional semantics of terms. The algorithm uses the word embeddings generated from the vector space model. The model is trained over a large corpus to generate a word embedding for each word in the corpus. Then, the system finds and extracts the hypernyms by using the g…
Detecting cellular network anomalies using the knowledge discovery process
2015
Analytical companies unanimously forecast exponential growth of mobile traffic consumption over the next five years. The densification of the network structure with small cells is regarded as a key solution to meet growing capacity demands. The manual management of a multi-layer network is a very expensive, error-prone, and sluggish process. Hence, the automation of the whole life cycle of network operation is highly anticipated. To this end, 3GPP introduces a self-management concept referred to as SON. It is envisioned that SON updates information concerning the latest network conditions through the MDT mechanism. MDT enables a network operator to collect radio and service quality measurem…
Advanced performance monitoring for self-healing cellular mobile networks
2015
This dissertation is devoted to the development and validation of an advanced performance monitoring system for existing and future cellular mobile networks. Knowledge mining techniques are employed for the analysis of user-specific logs collected with the Minimization of Drive Tests (MDT) functionality. Ever-increasing quality requirements, the expansion of mobile networks, and their growing heterogeneity call for effective automatic means of performance monitoring. Nowadays, network operation is mostly controlled manually through aggregated key performance indicators and statistical profiles. These methods are not able to fully address the dynamism and complexity of modern mobile networks. Se…
Intelligent solutions for real-life data-driven applications
2017
The subject of this thesis belongs to the topic of machine learning or, specifically, to the development of advanced methods for regression analysis, clustering, and anomaly detection. Industry is constantly seeking improved production practices and minimized production time and costs. In connection with this, several industrial case studies are presented in which mathematical models for predicting paper quality were proposed. The most important variables for the prediction models are selected based on information-theoretic measures and a regression tree approach. The rest of the original papers are devoted to unsupervised machine learning. The main focus is developing advanced spectral cluster…
Adaptive framework for network traffic classification using dimensionality reduction and clustering
2012
Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …
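As a rough illustration of the n-gram step mentioned in this abstract, the sketch below turns a few query strings into a count matrix with one row per line and one column per n-gram. It assumes character-level n-grams and raw counts, which the abstract does not specify, and the names `char_ngrams` and `ngram_matrix` are hypothetical. The subsequent dimensionality reduction is not shown.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_matrix(lines, n=2):
    """Build an n-gram count matrix from a list of strings.

    Returns (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in lines[i]. Character-level n-grams are
    an assumption made for this sketch.
    """
    counts = [Counter(char_ngrams(line, n)) for line in lines]
    # A sorted vocabulary gives a stable column order.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[gram] for gram in vocabulary] for c in counts]
    return vocabulary, matrix


if __name__ == "__main__":
    # Made-up query strings, used only to show the shape of the output.
    vocabulary, matrix = ngram_matrix(["id=1", "id=2", "id=1'--"], n=2)
    print(vocabulary)
    for row in matrix:
        print(row)
```

Each row of the resulting matrix can then be treated as a feature vector for one log line, which is the kind of numerical input that a method such as principal component analysis expects.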
Anomaly Detection Algorithms for the Sleeping Cell Detection in LTE Networks
2015
The Sleeping Cell problem is a particular type of cell degradation in Long-Term Evolution (LTE) networks. In practice, such a cell outage leads to a lack of network service, and sometimes the operator discovers it only after multiple user complaints. In this study, a cell becomes sleeping because of a Random Access Channel (RACH) failure, which may happen due to software or hardware problems. For the detection of malfunctioning cells, we introduce a data-mining-based framework. At its core is the analysis of event sequences reported by a User Equipment (UE) to a serving Base Station (BS). The crucial element of the developed framework is an anomaly detection algorithm. We compare perfor…
Semi-automatic literature mapping of participatory design studies 2006–2016
2018
The paper presents a process of semi-automatic literature mapping of a comprehensive set of participatory design (PD) studies from 2006 to 2016. A dataset of 2939 abstracts was collected from 14 academic search engines and databases. With the presented method, we were able to identify six education-related clusters of PD articles. Furthermore, we point out that the identified clusters cover the majority of education-related words in the whole dataset. This is the first attempt to systematically map the participatory design literature. We argue that by continuing our work, we can help to perceive a coherent structure in the body of PD research.