Search results for "Clustering"
showing 10 items of 446 documents
Multilingual Clustering of Streaming News
2018
Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art …
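A minimal sketch of online clustering with an open-ended number of clusters, as described in the abstract above. It is not the paper's method: the bag-of-words cosine similarity, the `threshold` value, and the names `cosine` and `cluster_stream` are all illustrative assumptions, and the crosslingual component is omitted.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_stream(docs, threshold=0.3):
    """Assign each incoming document to its most similar cluster,
    or open a new cluster when no similarity reaches the threshold."""
    centroids = []  # one Counter of summed term counts per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)      # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))   # start a new story cluster
            labels.append(len(centroids) - 1)
    return labels
```

Because each document is compared only against the current centroids and processed once, the number of clusters can grow with the stream without revisiting earlier documents.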
Towards Responsible AI for Financial Transactions
2020
Author's accepted manuscript. © 2020 IEEE. The application of AI in finance is increasingly dependent on the principles of responsible AI. These principles (explainability, fairness, privacy, accountability, transparency and soundness) form the basis for trust in future AI systems. In this empirical study, we address the first p…
Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection
2019
The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM's generaliz…
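The distance-regression step mentioned in the abstract above can be sketched as follows. This is a hedged illustration under stated assumptions: reference points are passed in as row indices (`ref_idx`) rather than chosen by the clustering-based selection the paper studies, the mapping is fitted by an ordinary least-squares pseudoinverse, and the final step of recovering an output from the predicted distances is left out. The function names are hypothetical.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def fit_mlm(X, Y, ref_idx):
    """Fit a linear map B between input-space and output-space distance matrices."""
    Rx, Ry = X[ref_idx], Y[ref_idx]   # reference points in both spaces
    Dx = pairwise_dist(X, Rx)         # N x K input-space distances
    Dy = pairwise_dist(Y, Ry)         # N x K output-space distances
    B = np.linalg.pinv(Dx) @ Dy       # least-squares solution of Dx @ B ~ Dy
    return Rx, Ry, B

def predict_output_distances(X_new, Rx, B):
    """Estimate distances from the unknown outputs to the reference outputs."""
    return pairwise_dist(X_new, Rx) @ B
```

Given the predicted distances, an output estimate would then be located relative to the reference outputs `Ry`; that step is not shown here.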
Diffusion map for clustering fMRI spatial maps extracted by Independent Component Analysis
2013
Functional magnetic resonance imaging (fMRI) produces data about activity inside the brain, from which spatial maps can be extracted by independent component analysis (ICA). In datasets, there are n spatial maps that contain p voxels. The number of voxels is very high compared to the number of analyzed spatial maps. Clustering of the spatial maps is usually based on correlation matrices. This usually works well, although such a similarity matrix inherently can explain only a certain amount of the total variance contained in the high-dimensional data where n is relatively small but p is large. For high-dimensional space, it is reasonable to perform dimensionality reduction before clustering.…
An Empirical Study of the Relation Between Community Structure and Transitivity
2012
One of the most prominent properties in real-world networks is the presence of a community structure, i.e. dense and loosely interconnected groups of nodes called communities. In an attempt to better understand this concept, we study the relationship between the strength of the community structure and the network transitivity (or clustering coefficient). Although intuitively appealing, this analysis was not performed before. We adopt an approach based on random models to empirically study how one property varies depending on the other. It turns out the transitivity increases with the community structure strength, and is also affected by the distribution of the community sizes. Furthermore, …
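For reference, network transitivity (the global clustering coefficient) is the fraction of connected triples that are closed into triangles. The sketch below computes it for an undirected edge list; the function name `transitivity` and the edge-list input format are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def transitivity(edges):
    """Fraction of connected triples (paths of length two) that close into triangles."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    closed = 0
    triples = 0
    for nbrs in adj.values():
        # Every pair of neighbours of a node forms a triple centred on that node.
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # each triangle is counted once per centre, three times in total
    return closed / triples if triples else 0.0
```

A triangle with one pendant node attached has five connected triples, three of which are closed, giving a transitivity of 0.6.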
Fast PET Scan Tumor Segmentation Using Superpixels, Principal Component Analysis and K-Means Clustering
2018
Positron Emission Tomography scan images are extensively used in radiotherapy planning, clinical diagnosis, assessment of growth and treatment of a tumor. These all rely on the fidelity and speed of the detection and delineation algorithm. Despite intensive research, segmentation remains a challenging problem due to the diverse image content, resolution, shape, and noise. This paper presents a fast positron emission tomography tumor segmentation method in which superpixels are extracted first from the input image. Principal component analysis is then applied on the superpixels and also on their average. Distance vector of each superpixel from the average is computed in principal components coordin…
Clustering of waveforms-data based on FPCA direction
2010
The necessity of finding similar features of waveforms data recorded for earthquakes at different time instants is here considered, since eventual similarity between these functions could suggest similar behavior of the source process of the corresponding earthquakes. In this paper we develop a clustering algorithm for curves based on directions defined by an application of PCA to functional data.
Clustering of waveforms based on FPCA direction
2010
Looking for curves similarity could be a complex issue characterized by subjective choices related to continuous transformations of observed discrete data (Chiodi, 1989). Waveforms correlation techniques have been introduced to characterize the degree of seismic event similarity (Menke, 1999) and in facilitating more accurate relative locations within similar event clusters by providing more precise timing of seismic wave (P and S) arrivals (Phillips, 1997). In this paper functional analysis (Ramsay and Silverman, 2006) is considered to highlight common characteristics of waveforms-data and to summarize these characteristics by few components, by applying a variant of a classical clust…
Space-time FPCA Algorithm for clustering of multidimensional curves.
2016
In this paper we focus on finding clusters of multidimensional curves with spatio-temporal structure, applying a variant of a k-means algorithm based on the principal component rotation of data. The main advantage of this approach is to combine the clustering functional analysis of the multidimensional data with smoothing methods based on generalized additive models that cope with both the spatial and the temporal variability, and with functional principal components that take into account the dependency between the curves.
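The general idea of running k-means on PCA-rotated data can be sketched as below. This is plain Lloyd's k-means on principal component scores, not the authors' variant: it assumes each curve is already sampled on a common grid (one row per curve), skips the smoothing by generalized additive models, and uses hypothetical names (`pca_rotate`, `kmeans`) and defaults.

```python
import numpy as np

def pca_rotate(X, n_components):
    """Centre the curves and project them onto the leading principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centres; an empty cluster keeps its previous centre.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Clustering the scores rather than the raw samples means the distance between two curves is measured along the directions of maximum variance only.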
Functional Principal components direction to cluster earthquake waveforms
2010
Looking for curves similarity could be a complex issue characterized by subjective choices related to continuous transformations of observed discrete data (Chiodi, 1989). In this paper we combine the aim of finding clusters from a set of individual curves to the functional nature of data, applying a variant of a k-means algorithm based on the principal component rotation of data. We apply a classical clustering method to rotated data, according to the direction of maximum variance. A k-means clustering algorithm based on PCA rotation of data is proposed, as an alternative to methods that require previous interpolation of data based on splines or linear fitting (García-Escudero and Gordali…