Search results for "cluster analysis."
showing 10 items of 805 documents
Discovering the Senses of an Ambiguous Word by Clustering its Local Contexts
2005
As has been shown recently, it is possible to automatically discover the senses of an ambiguous word by statistically analyzing its contextual behavior in a large text corpus. However, this kind of research is still at an early stage. The results need to be improved and there is considerable disagreement on methodological issues. For example, although most researchers use clustering approaches for word sense induction, it is not clear what statistical features the clustering should be based on. Whereas so far most researchers cluster global co-occurrence vectors that reflect the overall behavior of a word in a corpus, in this paper we argue that it is more appropriate to use local context v…
Aspects Concerning SVM Method’s Scalability
2008
In recent years the quantity of text documents has been increasing continually, and automatic document classification is an important challenge. In text document classification, the training step is essential to obtaining a good classifier. The quality of learning depends on the dimension of the training data. When working with huge learning data sets, problems arise because the training time increases exponentially. In this paper we present a method that allows working with huge data sets in the training step without increasing the training time exponentially and without significantly decreasing the classification accuracy.
Graph Clustering with Local Density-Cut
2018
In this paper, we introduce a new graph clustering algorithm, called Dcut. The basic idea is to envision the graph clustering as a local density-cut problem. To identify meaningful communities in a graph, a density-connected tree is first constructed in a local fashion. Building upon the local intuitive density-connected tree, Dcut allows partitioning a graph into multiple densely tight-knit clusters effectively and efficiently. We have demonstrated that our method has several attractive benefits: (a) Dcut provides an intuitive criterion to evaluate the goodness of a graph clustering in a more precise way; (b) Building upon the density-connected tree, Dcut allows identifying high-quality cl…
Robust Synchronization-Based Graph Clustering
2013
Complex graph data now arises in various fields like social networks, protein-protein interaction networks, ecosystems, etc. To reveal the underlying patterns in graphs, an important task is to partition them into several meaningful clusters. The question is: how can we find the natural partitions of a complex graph which truly reflect the intrinsic patterns? In this paper, we propose RSGC, a novel approach to graph clustering. The key philosophy of RSGC is to consider graph clustering as a dynamic process towards synchronization. Each vertex is viewed as an oscillator and interacts with other vertices according to the graph connection information. During the process towards synchro…
Projector operators in clustering
2016
In a recent paper, the notion of quantum perceptron has been introduced in connection with projection operators. Here, we extend this idea, using this kind of operator to produce a clustering machine, that is, a framework that generates different clusters from a set of input data. Also, we consider what happens when the orthonormal bases first used in the definition of the projectors are replaced by frames and how these can be useful when trying to connect a noisy signal to a given cluster. Copyright © 2016 John Wiley & Sons, Ltd.
The Burrows-Wheeler Transform between Data Compression and Combinatorics on Words
2013
The Burrows-Wheeler Transform (BWT) is a tool of fundamental importance in Data Compression and, recently, has found many applications well beyond its original purpose. The main goal of this paper is to highlight the mathematical and combinatorial properties on which the outstanding versatility of the BWT is based, i.e. its reversibility and the clustering effect on the output. Such properties have aroused curiosity and fervent interest in the scientific world both for theoretical aspects and for practical effects. In particular, in this paper we are interested both to survey the theoretical research issues which, by taking their cue from Data Compression, have been developed in the conte…
Game of Thieves and WERW-Kpath: Two Novel Measures of Node and Edge Centrality for Mafia Networks
2021
Real-world complex systems can be modeled as homogeneous or heterogeneous graphs composed of nodes connected by edges. The importance of nodes and edges is formally described by a set of measures called centralities, which are typically studied for graphs of small size. The proliferation of digital collection of data has led to huge graphs with billions of nodes and edges. For this reason, we focus on two new algorithms, Game of Thieves and WERW-Kpath, which are computationally light alternatives to the canonical centrality measures such as degree, node and edge betweenness, closeness and clustering. We explore the correlation among these measures using Spearman’s correlation coefficient …
Online Induction of Probabilistic Real Time Automata
2012
Probabilistic real time automata (PRTAs) are a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. These two steps can be very time intensive when a PRTA is to be induced for massive or even unbounded data sets. The latter one can be efficiently processed, as there exist scalable online clustering algorithms. However, the creation of the PTA still can be very time consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing…
An ontological-based knowledge organization for bioinformatics workflow management system
2012
Motivation and Objectives: In the field of Computer Science, ontologies represent formal structures to define and organize knowledge of a specific application domain (Chandrasekaran et al., 1999). An ontology is composed of entities, called classes, and relationships among them. Classes are characterized by features, called attributes, and they can be arranged into a hierarchical organization. Ontologies are a fundamental instrument in Artificial Intelligence for the development of Knowledge-Based Systems (KBS). With its formal and well-defined structure, in fact, an ontology provides a machine-understandable language that allows automatic reasoning for problem resolution. Typical KBS are E…
Functional Brain Segmentation Using Inter-Subject Correlation in fMRI
2016
The human brain continuously processes massive amounts of rich sensory information. To better understand such highly complex brain processes, modern neuroimaging studies are increasingly utilizing experimental setups that better mimic daily‐life situations. A new exploratory data‐analysis approach, functional segmentation inter‐subject correlation analysis (FuSeISC), was proposed to facilitate the analysis of functional magnetic resonance (fMRI) data sets collected in these experiments. The method provides a new type of functional segmentation of brain areas, not only characterizing areas that display similar processing across subjects but also areas in which processing across subjects is h…