Search results for "Clustering"

showing 10 items of 446 documents

Multilingual Clustering of Streaming News

2018

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art …
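
The online setting described in this abstract, where the number of clusters is not fixed in advance, can be illustrated with a generic threshold-based sketch. This is not the paper's method: the class name `OnlineClusterer`, the cosine-similarity measure, and the `threshold` value are all assumptions chosen for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlineClusterer:
    """Hypothetical single-pass clusterer: one document vector at a time."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # running mean vector per cluster
        self.counts = []            # number of members per cluster

    def add(self, vec):
        """Assign vec to the most similar cluster, or open a new one."""
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            n = self.counts[best]
            # Update the running mean of the chosen cluster.
            self.centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(self.centroids[best], vec)
            ]
            self.counts[best] = n + 1
            return best
        # No cluster is similar enough: the label set grows by one.
        self.centroids.append(list(vec))
        self.counts.append(1)
        return len(self.centroids) - 1
```

Each call to `add` returns a cluster index, so the set of labels grows only when an incoming vector is dissimilar to every existing centroid.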

FOS: Computer and information sciences; Computer Science - Computation and Language; Information retrieval; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; Clustering; Media Monitoring; Computer Science - Information Retrieval; ComputingMethodologies_PATTERNRECOGNITION; Multilingual Methods; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Cluster analysis; Computation and Language (cs.CL); Information Retrieval (cs.IR)

researchProduct

Towards Responsible AI for Financial Transactions

2020

Author's accepted manuscript. © 2020 IEEE. The application of AI in finance is increasingly dependent on the principles of responsible AI. These principles (explainability, fairness, privacy, accountability, transparency and soundness) form the basis for trust in future AI systems. In this empirical study, we address the first p…

FOS: Computer and information sciences; Computer Science - Machine Learning; Computer science; Computer Science - Artificial Intelligence; Decision tree; 02 engineering and technology; Machine learning; computer.software_genre; Machine Learning (cs.LG); Empirical research; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Robustness (economics); Categorical variable; VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550; Soundness; business.industry; Document clustering; Transparency (behavior); ComputingMethodologies_PATTERNRECOGNITION; Artificial Intelligence (cs.AI); Financial transaction; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer

researchProduct

Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection

2019

The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM's generaliz…

FOS: Computer and information sciences; Computer Science - Machine Learning; Minimal Learning Machine; koneoppiminen; Statistics - Machine Learning; universal approximation; Machine Learning (stat.ML); interpolointi; approksimointi; reference point selection; clustering; Machine Learning (cs.LG)

researchProduct

Diffusion map for clustering fMRI spatial maps extracted by Independent Component Analysis

2013

Functional magnetic resonance imaging (fMRI) produces data about activity inside the brain, from which spatial maps can be extracted by independent component analysis (ICA). In datasets, there are n spatial maps that contain p voxels. The number of voxels is very high compared to the number of analyzed spatial maps. Clustering of the spatial maps is usually based on correlation matrices. This usually works well, although such a similarity matrix inherently can explain only a certain amount of the total variance contained in the high-dimensional data where n is relatively small but p is large. For high-dimensional space, it is reasonable to perform dimensionality reduction before clustering.…

FOS: Computer and information sciences; Diffusion (acoustics); Computer science; diffusion map; Machine Learning (stat.ML); 02 engineering and technology; computer.software_genre; Machine Learning (cs.LG); Computational Engineering Finance and Science (cs.CE); Correlation; 03 medical and health sciences; Total variation; 0302 clinical medicine; Statistics - Machine Learning; Voxel; 0202 electrical engineering electronic engineering information engineering; Computer Science - Computational Engineering Finance and Science; Cluster analysis; dimensionality reduction; ta113; spatial maps; business.industry; Dimensionality reduction; functional magnetic resonance imaging (fMRI); Pattern recognition; Independent component analysis; Spectral clustering; Computer Science - Learning; independent component analysis; ta6131; 020201 artificial intelligence & image processing; Artificial intelligence; DYNAMICAL-SYSTEMS; business; computer; 030217 neurology & neurosurgery; clustering

researchProduct

An Empirical Study of the Relation Between Community Structure and Transitivity

2012

One of the most prominent properties in real-world networks is the presence of a community structure, i.e. dense and loosely interconnected groups of nodes called communities. In an attempt to better understand this concept, we study the relationship between the strength of the community structure and the network transitivity (or clustering coefficient). Although intuitively appealing, this analysis was not performed before. We adopt an approach based on random models to empirically study how one property varies depending on the other. It turns out the transitivity increases with the community structure strength, and is also affected by the distribution of the community sizes. Furthermore, …
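
The network transitivity mentioned in this abstract is commonly defined as three times the number of triangles divided by the number of connected triples. A minimal sketch of that standard definition follows, assuming an undirected simple graph given as an edge list; the function name `transitivity` is a hypothetical choice and the code is not taken from the paper.

```python
from itertools import combinations


def transitivity(edges):
    """Global transitivity of an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed = 0   # connected triples whose end points are also linked
    triples = 0  # all connected triples, counted at their centre node
    for node, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1

    # Each triangle is counted once per corner, so closed == 3 * triangles.
    return closed / triples if triples else 0.0
```

A single triangle gives a transitivity of 1.0, while a simple path of three nodes gives 0.0 because its only connected triple is open.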

FOS: Computer and information sciences; Physics - Physics and Society; Property (philosophy); FOS: Physical sciences; Physics and Society (physics.soc-ph); [ INFO.INFO-CV ] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; 01 natural sciences; Complex Networks; Clustering; 010305 fluids & plasmas; Empirical research; 0103 physical sciences; 010306 general physics; transitivity; Community Structure; Clustering coefficient; Mathematics; Social and Information Networks (cs.SI); Transitive relation; Community structure; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; Computer Science - Social and Information Networks; Complex network; Degree distribution; Zero (linguistics); Mathematical economics

researchProduct

Fast PET Scan Tumor Segmentation Using Superpixels, Principal Component Analysis and K-Means Clustering

2018

Positron Emission Tomography scan images are extensively used in radiotherapy planning, clinical diagnosis, and the assessment of tumor growth and treatment. These all rely on the fidelity and speed of the detection and delineation algorithm. Despite intensive research, segmentation remains a challenging problem due to diverse image content, resolution, shape, and noise. This paper presents a fast positron emission tomography tumor segmentation method in which superpixels are first extracted from the input image. Principal component analysis is then applied to the superpixels and also to their average. The distance vector of each superpixel from the average is computed in principal components coordin…

FOS: Computer and information sciences; positron emission tomography; principal component analysis; Computer science; Computer Vision and Pattern Recognition (cs.CV); k-means; Coordinate system; Computer Science - Computer Vision and Pattern Recognition; FOS: Physical sciences; 02 engineering and technology; Benchmark; Quantitative Biology - Quantitative Methods; Biochemistry Genetics and Molecular Biology (miscellaneous); 030218 nuclear medicine & medical imaging; superpixels; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; 0202 electrical engineering electronic engineering information engineering; medicine; Segmentation; Computer vision; Tissues and Organs (q-bio.TO); Cluster analysis; Quantitative Methods (q-bio.QM); Pixel; medicine.diagnostic_test; business.industry; segmentation; k-means clustering; Quantitative Biology - Tissues and Organs; Pattern recognition; Physics - Medical Physics; Positron emission tomography; FOS: Biological sciences; Physics - Data Analysis Statistics and Probability; Principal component analysis; 020201 artificial intelligence & image processing; Medical Physics (physics.med-ph); Artificial intelligence; Noise (video); business; Data Analysis Statistics and Probability (physics.data-an); Biotechnology; Methods and Protocols

researchProduct

Clustering of waveforms-data based on FPCA direction

2010

The necessity of finding similar features of waveforms data recorded for earthquakes at different time instants is here considered, since eventual similarity between these functions could suggest similar behavior of the source process of the corresponding earthquakes. In this paper we develop a clustering algorithm for curves based on directions defined by an application of PCA to functional data.

FPCA; clustering of curves; waveforms; Settore SECS-S/01 - Statistica

researchProduct

Clustering of waveforms based on FPCA direction

2010

Looking for curves similarity could be a complex issue characterized by subjective choices related to continuous transformations of observed discrete data (Chiodi, 1989). Waveforms correlation techniques have been introduced to characterize the degree of seismic event similarity (Menke, 1999) and in facilitating more accurate relative locations within similar event clusters by providing more precise timing of seismic wave (P and S) arrivals (Phillips, 1997). In this paper functional analysis (Ramsay and Silverman, 2006) is considered to highlight common characteristics of waveforms-data and to summarize these characteristics by few components, by applying a variant of a classical clust…

FPCA; clustering of curves; waveforms; Settore SECS-S/01 - Statistica

researchProduct

Space-time FPCA Algorithm for clustering of multidimensional curves.

2016

In this paper we focus on finding clusters of multidimensional curves with spatio-temporal structure, applying a variant of a k-means algorithm based on the principal component rotation of data. The main advantage of this approach is that it combines the clustering functional analysis of the multidimensional data with smoothing methods based on generalized additive models, which cope with both the spatial and the temporal variability, and with functional principal components, which take into account the dependency between the curves.

FPCA; clustering of multidimensional curves; GAM; spatio-temporal pattern

researchProduct

Functional Principal components direction to cluster earthquake waveforms

2010

Looking for curves similarity could be a complex issue characterized by subjective choices related to continuous transformations of observed discrete data (Chiodi, 1989). In this paper we combine the aim of finding clusters from a set of individual curves to the functional nature of data, applying a variant of a k-means algorithm based on the principal component rotation of data. We apply a classical clustering method to rotated data, according to the direction of maximum variance. A k-means clustering algorithm based on PCA rotation of data is proposed, as an alternative to methods that require previous interpolation of data based on splines or linear fitting (García-Escudero and Gordali…

FPCA; waveforms; clustering approach; Settore SECS-S/01 - Statistica

researchProduct