Search results for "clusterin"
showing 10 items of 478 documents
Improvements and applications of the elements of prototype-based clustering
2018
Clustering or cluster analysis is an essential part of data mining, machine learning, and pattern recognition. The most popularly applied clustering methods are partitioning-based or prototype-based methods. Prototype-based clustering methods usually have easy implementability and good scalability. These methods, such as K-means clustering, have been used for different applications in various fields. On the other hand, prototype-based clustering methods are typically sensitive to initialization, and the selection of the number of clusters for knowledge discovery purposes is not straightforward. In the era of big data, in high-velocity, ever-growing datasets, which can also be erroneous, outl…
Kernel Feature Extraction Methods for Remote Sensing Data Analysis
2014
Technological advances in the last decades have improved our capabilities of collecting and storing high data volumes. However, this makes that in some fields, such as remote sensing several problems are generated in the data processing due to the peculiar characteristics of their data. High data volume, high dimensionality, heterogeneity and their nonlinearity, make that the analysis and extraction of relevant information from these images could be a bottleneck for many real applications. The research applying image processing and machine learning techniques along with feature extraction, allows the reduction of the data dimensionality while keeps the maximum information. Therefore, develo…
Weighted Clustering of Sparse Educational Data
2015
Clustering as an unsupervised technique is predominantly used in unweighted settings. In this paper, we present an efficient version of a robust clustering algorithm for sparse educational data that takes the weights, aligning a sample with the corresponding population, into account. The algorithm is utilized to divide the Finnish student population of PISA 2012 (the latest data from the Programme for International Student Assessment) into groups, according to their attitudes and perceptions towards mathematics, for which one third of the data is missing. Furthermore, necessary modifications of three cluster indices to reveal an appropriate number of groups are proposed and demonstrated. pe…
SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm
2014
Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The k…
Detecting clusters in spatially correlated waveforms
2017
Seismic networks often record signals characterized by similar shapes that provide important information according to their geographic positions. We propose an approach to identify homogeneous clusters of seismic waves, combining analysis of waveforms with metadata and spectrogram information. In waveforms clustering, cross-correlation measures between signals may presents some limitations, so we refer to more recent contributes relating data-depth based clustering analysis. The mechanism for alignment is also an important topic of the analysis: warping (or aligning) procedures identify nuisance effects in phase variation, that, if ignored, may result in a possible loss of information and t…
Models and methods for space and space-time interactions in complex point processes with applications on earthquakes
Spatio-temporal Dynamical Analysis of Brain Activity during Mental Fatigue Process
2021
Mental fatigue is a common phenomenon with implicit and multidimensional properties. It brings dynamic changes of functional brain networks. However, the challenging problem of false positives appears when the connectivity is estimated by Electroencephalography (EEG). In this paper, we propose a novel framework based on spatial clustering to explore the sources of mental fatigue and functional activity changes caused by them. To suppress the false positive observations, spatial clustering is implemented in brain networks. The nodes extracted by spatial clustering are registered back to functional magnetic resonance imaging (fMRI) source space to determined the sources of mental fatigue. The…
Intelligent solutions for real-life data-driven applications
2017
The subject of this thesis belongs to the topic of machine learning or, specifically, to the development of advanced methods for regression analysis, clustering, and anomaly detection. Industry is constantly seeking improved production practices and minimized production time and costs. In connection to this, several industrial case studies are presented in which mathematical models for predicting paper quality were proposed. The most important variables for the prediction models are selected based on information-theoretic measures and regression trees approach. The rest of the original papers are devoted to unsupervised machine learning. The main focus is developing advanced spectral cluster…
An Examination of Tourist Arrivals Dynamics Using Short-Term Time Series Data: A Space—Time Cluster Approach
2013
The purpose of this study is to examine the development of Italian tourist areas ( circoscrizioni turistiche) through a cluster analysis of short time series. The technique is an adaptation of the functional data analysis approach developed by Abraham et al (2003), which combines spline interpolation with k-means clustering. The findings indicate the presence of two patterns (increasing and stable) averagely characterizing groups of territories. Moreover, tests of spatial contiguity suggest the presence of ‘space–time clusters’; that is, areas in the same ‘time cluster’ are also spatially contiguous. These findings appear to be more robust in particular for those series characterized by an…
Leveraging Users' Likes in a Video Streaming P2P Platform
2014
This paper investigates how a p2p television platform can take advantage of the presence of frequent channel viewers to grant them a more satisfying service than to less regular spectators. The idea we explore is to learn beforehand about the users' interests, in order to cluster them in groups that display different behaviors; then, the neighborhood creation strategy and video chunk scheduling algorithm of the overlay is altered, with the aim of serving frequent spectators in a privileged manner, providing them with a faster access to the selected channel without overly penalizing less habitual customers. An analytical model is developed, to capture the difference in startup delay that the…