Search results for "louhinta"
showing 10 items of 93 documents
An Approach for Network Outage Detection from Drive-Testing Databases
2012
A data-mining framework for analyzing a cellular network drive testing database is described in this paper. The presented method is designed to detect sleeping base stations, network outage, and change of the dominance areas in a cognitive and self-organizing manner. The essence of the method is to find similarities between periodical network measurements and previously known outage data. For this purpose, diffusion maps dimensionality reduction and nearest neighbor data classification methods are utilized. The method is cognitive because it requires training data for the outage detection. In addition, the method is autonomous because it uses minimization of drive testing (MDT) functionalit…
Biased graph walks for RDF graph embeddings
2017
Knowledge Graphs have been recognized as a valuable source for background information in many data mining, information retrieval, natural language processing, and knowledge extraction tasks. However, obtaining a suitable feature vector representation from RDF graphs is a challenging task. In this paper, we extend the RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities. We generate sequences by exploiting local information from graph substructures, harvested by graph walks, and learn latent numerical representations of entities in RDF graphs. We extend the way we compute feature vector representations by comparing twel…
Research literature clustering using diffusion maps
2013
We apply the knowledge discovery process to the mapping of current topics in a particular field of science. We are interested in how articles form clusters and what are the contents of the found clusters. A framework involving web scraping, keyword extraction, dimensionality reduction and clustering using the diffusion map algorithm is presented. We use publicly available information about articles in high-impact journals. The method should be of use to practitioners or scientists who want to overview recent research in a field of science. As a case study, we map the topics in data mining literature in the year 2011. peerReviewed
An Efficient Network Log Anomaly Detection System Using Random Projection Dimensionality Reduction
2014
Network traffic is increasing all the time and network services are becoming more complex and vulnerable. To protect these networks, intrusion detection systems are used. Signature-based intrusion detection cannot find previously unknown attacks, which is why anomaly detection is needed. However, many new systems are slow and complicated. We propose a log anomaly detection framework which aims to facilitate quick anomaly detection and also provide visualizations of the network traffic structure. The system preprocesses network logs into a numerical data matrix, reduces the dimensionality of this matrix using random projection and uses Mahalanobis distance to find outliers and calculate an a…
Scalable implementation of dependence clustering in Apache Spark
2017
This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is introduced. In addition, the proposed algorithm is benchmarked against spectral clustering. Results on real-life data indicate that the implementation scales well while also performing well on densely connected graphs. peerReviewed
Utilizing social media service tags in anticipating and detecting changes in the ICT field (Yhteisöpalvelujen tägien hyödyntäminen ICT-alan muutosten ennakoinnissa ja havainnoinnissa)
2012
Tags and their emergent ontology, known as folksonomy, have been studied since 2004. During this time, the use of tags has become common in services offered over the Internet, and the popularity of these services has grown, creating a large amount of tag data. However, this data has rarely been used for anything other than finding and re-finding resources (which is, of course, what the data was created for). This thesis aims to survey the methods developed for analyzing tag data and folksonomies, and to apply two selected methods to the analysis of ICT-field terms. The goal is to find and test a method or methods that can be used to detect ICT-field…
Building and Testing a Comparative Interface on Northwest European Historical Parliamentary Debates : Relative Term Frequency Analysis of British Rep…
2022
Tensions between the people and parliament over representation are a normal feature of representative democracies. In this paper, we demonstrate how digital humanities analysis tools help in answering questions about the timing of debates on popular representation, tensions over its realization, and representatives’ changing perceptions on their parliamentary role. Our long-term approach to the conceptual history of political representation is based on the analysis of digitized parliamentary debates as nexuses of multi-sited political discourse. We combine computer-assisted distant and context-sensitive close reading to consider diachronic trends and synchronic political struggles surroundi…
Dynamic integration of data mining methods in knowledge discovery systems
2002
Utilizing data mining technology in sports betting (Tiedonlouhintateknologian hyödyntäminen urheiluvedonlyönnissä)
2017
This study is a bachelor's thesis carried out as a systematic literature review; the definitions it presents and the results it reports are drawn from earlier studies on the topic. The purpose of the thesis is to determine whether data mining technology makes it possible to achieve returns in sports betting. The thesis presents the theory of betting and defines the concepts essential to the field. In all, the thesis answers three research questions: 1) Can betting be practiced as an investment activity? 2) Are betting markets efficient? and 3) Does the use of data mining technology make it possible to achieve…
Information, Communications and Media Technologies for Sustainability: Constructing Data-Driven Policy Narratives
2021
This paper introduces the idea of data-driven narratives to examine how the use of information, communications, and media technologies (ICMTs) impacts the sustainable growth of economies. While ICMTs have regularly been advocated as a policy tool for growth and development, there is a research gap in empirical studies validating how such policies may be effective. This analysis is based on historical panel data from 39 economies across the developed North (19) and developing South (20). The industry-standard Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was applied to construct narratives that weave extant theories with empirical data. The art of developing data-dri…