Search results for "cluster analysis."
showing 10 items of 805 documents
Scalable Clustering by Iterative Partitioning and Point Attractor Representation
2016
Clustering very large datasets while preserving cluster quality remains a challenging data-mining task to date. In this paper, we propose an effective scalable clustering algorithm for large datasets that builds upon the concept of synchronization. Inherited from the powerful concept of synchronization, the proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), is capable of handling very large datasets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the…
Paradigm of tunable clustering using Binarization of Consensus Partition Matrices (Bi-CoPaM) for gene discovery
2013
Copyright © 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight cluster…
Gravitational weighted fuzzy c-means with application on multispectral image segmentation
2014
This paper presents a novel clustering approach based on the classic Fuzzy c-means algorithm. The approach is inspired by the concept of interaction between objects in physics. Each data point is regarded as a particle. A specific weight is associated with each data particle depending on its interaction with other particles. This interaction is induced by attraction forces between pairs of particles and the escape velocity from other particles. Classification experiments using two data sets from the UCI repository demonstrate that the proposed approach outperforms other clustering algorithms. In addition, results demonstrate the effectiveness of the proposed scheme for segmentation …
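For orientation, the classic fuzzy c-means updates that this abstract builds on can be sketched as below. This is a minimal illustration of the standard algorithm only: the gravitational per-particle weights are not specified in the truncated abstract and are not reproduced here. The function names, the fuzzifier default `m=2.0`, and the toy data are assumptions made for the example.

```python
import math

def memberships(points, centers, m=2.0):
    """Classic FCM membership update: u[j][i] is point j's degree in cluster i."""
    exp = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # Point coincides with a center: split full membership among such centers.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            u.append([h / sum(hits) for h in hits])
            continue
        u.append([1.0 / sum((di / dk) ** exp for dk in d) for di in d])
    return u

def update_centers(points, u, m=2.0):
    """Classic FCM center update: mean of the points weighted by u**m."""
    dim = len(points[0])
    centers = []
    for i in range(len(u[0])):
        w = [row[i] ** m for row in u]
        centers.append(tuple(
            sum(wj * p[k] for wj, p in zip(w, points)) / sum(w)
            for k in range(dim)))
    return centers

# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
centers = [(1.0, 1.0), (8.0, 8.0)]
for _ in range(20):  # fixed iteration count instead of a convergence test
    u = memberships(points, centers)
    centers = update_centers(points, u)
```

Each row of `u` sums to 1, so a point can belong to several clusters to different degrees. A weighted variant would presumably scale each point's contribution in `update_centers`, but how the paper does so is not stated in the excerpt.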
A Novel Clustering Algorithm based on a Non-parametric "Anti-Bayesian" Paradigm
2015
The problem of clustering, or unsupervised classification, has been solved by a myriad of techniques, all of which depend, either directly or implicitly, on the Bayesian principle of optimal classification. To be more specific, within a Bayesian paradigm, if one is to compare the testing sample with only a single point in the feature space from each class, the optimal Bayesian strategy would be to achieve this based on the distance from the corresponding means or central points in the respective distributions. When this principle is applied in clustering, one would assign an unassigned sample into the cluster whose mean is the closest, and this can be done in either a bottom-up or a top-dow…
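The nearest-mean assignment rule described in this abstract can be sketched as follows. This illustrates only the conventional Bayesian-style rule the abstract summarizes, not the paper's "anti-Bayesian" alternative, which the truncated text does not spell out. The function name `nearest_mean` and the sample values are hypothetical.

```python
import math

def nearest_mean(sample, means):
    """Return the index of the cluster whose mean is closest to `sample`."""
    # Euclidean distance is an assumption; the abstract only says "distance".
    return min(range(len(means)), key=lambda i: math.dist(sample, means[i]))

# Hypothetical cluster means in a 2-D feature space.
means = [(0.0, 0.0), (5.0, 5.0)]
first = nearest_mean((1.0, 0.5), means)   # closer to the first mean
second = nearest_mean((4.0, 6.0), means)  # closer to the second mean
print(first, second)
```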
Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering
2017
Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…
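As a rough illustration of how an internal validation index can estimate the number of clusters without external labels, the sketch below scores candidate partitions with the mean silhouette width and keeps the best one. The silhouette is used here only as a familiar example; the excerpt does not list which indices the article evaluates. The data, the candidate labelings, and the function name are assumptions.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width of a partition (higher is better).

    Assumes at least two clusters and no two clusters made up entirely
    of identical points.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # the silhouette of a singleton is defined as 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical data: two tight groups, with candidate partitions for k=2 and k=3.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}
scores = {k: silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k)
```

In practice the candidate partitions would come from running a prototype-based method such as k-means once per value of k; they are hard-coded here to keep the example self-contained.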
Clustering techniques for personal photo album management
2009
In this work we propose a novel approach for the automatic representation of pictures, aimed at more effective organization of personal photo albums. Images are analyzed and described in multiple representation spaces, namely, faces, background and time of capture. Faces are automatically detected, rectified and represented by projecting the face itself into a common low-dimensional eigenspace. Backgrounds are represented with low-level visual features based on RGB histograms and a Gabor filter bank. The face, time and background information of each image in the collection is automatically organized using a mean-shift clustering technique. Given the particular domain of personal photo libraries, wh…
Gene Expression Analysis Uncovers Similarity and Differences Among Burkitt Lymphoma Subtypes.
2011
Abstract 2494. Background. Burkitt lymphoma (BL) is currently listed in the WHO classification of lymphoid tumors as a single genetic and morphological entity with variation in clinical presentation. In particular, three clinical subsets of BL are recognized: endemic (eBL), sporadic (sBL) and immunodeficiency associated (ID-BL). Each affects different populations and can present with different features. So far, possible differences in their gene expression profiles (GEP) have not been investigated. In this study we aimed to 1) assess whether BL subtypes present with differences in their GEP; 2) investigate the relationship of the different BL subtypes with the non-neoplastic cellula…
Analysis of the ORF2 of human astroviruses reveals lineage diversification, recombination and rearrangement and provides the basis for a novel sub-cl…
2014
Canonical human astroviruses (HAstVs) are important enteric pathogens that can be classified genetically and antigenically into eight types. Sequence analysis of small diagnostic regions at either the 5' or 3' end of ORF2 (capsid precursor) is a good proxy for prediction of HAstV types and for distinction of intratypic genetic lineages (subtypes), although lineage diversification/classification has not been investigated systematically. Upon sequence and phylogenetic analysis of the full-length ORF2 of 86 HAstV strains selected from the databases, a detailed classification of HAstVs into lineages was established. Three main lineages could be defined in HAstV-1, four in HAstV-2, two in HAstV-…
Disparity between Inter-Patient Molecular Heterogeneity and Repertoires of Target Drugs Used for Different Types of Cancer in Clinical Oncology
2020
Inter-patient molecular heterogeneity is the major declared driver of an expanding variety of anticancer drugs and of personalizing their prescriptions. Here, we compared inter-patient molecular heterogeneities of tumors and repertoires of drugs or their molecular targets currently in use in clinical oncology. We estimated molecular heterogeneity using genomic (whole exome sequencing) and transcriptomic (RNA sequencing) data for 4890 tumors taken from The Cancer Genome Atlas database. For thirteen major cancer types, we compared heterogeneities at the levels of mutations and gene expression with the repertoires of targeted therapeutics and their molecular targets accepted by the current guideli…
Exploring Multiobjective Optimization for Multiview Clustering
2018
We present a new multiview clustering approach based on multiobjective optimization. In contrast to existing clustering algorithms based on multiobjective optimization, it is generally applicable to data represented by two or more views and does not require specifying the number of clusters a priori. The approach builds upon the search capability of a multiobjective simulated annealing based technique, AMOSA, as the underlying optimization technique. In the first version of the proposed approach, an internal cluster validity index is used to assess the quality of different partitionings obtained using different views. A new way of checking the compatibility of these different partitioning…