Search results for "Correlation clustering"

showing 10 items of 28 documents

Efficient and Accurate OTU Clustering with GPU-Based Sequence Alignment and Dynamic Dendrogram Cutting.

2015

De novo clustering is a popular technique to perform taxonomic profiling of a microbial community by grouping 16S rRNA amplicon reads into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. The key idea used in CRiSPy to improve clustering accuracy is the application of an anomaly detection technique to obtain a dynamic distance cutoff instead of using the de facto value of 97 percent sequence similarity as in most existing OTU clustering pipelines. This technique works by detecting an abrupt change in the merging heights of a dendrogram. To produce the output dendrograms, CRiSPy employs the OTU hierarchical clusterin…

Computer science; Correlation clustering; Single-linkage clustering; Molecular Sequence Data; Machine learning; computer.software_genre; Pattern Recognition Automated; CURE data clustering algorithm; RNA Ribosomal 16S; Genetics; Computer Graphics; Cluster analysis; Base Sequence; business.industry; Applied Mathematics; Dendrogram; High-Throughput Nucleotide Sequencing; Pattern recognition; Signal Processing Computer-Assisted; Equipment Design; Hierarchical clustering; Equipment Failure Analysis; RNA Bacterial; Canopy clustering algorithm; Artificial intelligence; Hierarchical clustering of networks; business; computer; Sequence Alignment; Algorithms; Biotechnology; IEEE/ACM transactions on computational biology and bioinformatics
researchProduct
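
The dynamic cutoff described in the abstract above (detecting an abrupt change in the merging heights of a dendrogram) can be illustrated with a minimal sketch. This is not CRiSPy's actual anomaly detection technique, which the truncated abstract does not specify. As a stated assumption, the "abrupt change" is approximated here by the largest gap between consecutive merge heights. The function names and the sample heights are hypothetical.

```python
def dynamic_cutoff(merge_heights):
    """Return a distance cutoff placed in the largest gap between
    consecutive (sorted) dendrogram merge heights.

    Assumption: "abrupt change" is approximated by the largest jump;
    a real pipeline may use a different anomaly detector.
    """
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("need at least two merge heights")
    # Gap between each merge and the next one up the dendrogram.
    gaps = [b - a for a, b in zip(heights, heights[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Cut halfway through the largest gap.
    return (heights[i] + heights[i + 1]) / 2


def count_clusters(merge_heights, cutoff):
    """Number of clusters left after undoing every merge above the cutoff."""
    # A full dendrogram on n leaves has n - 1 merges.
    n_leaves = len(merge_heights) + 1
    merges_kept = sum(h <= cutoff for h in merge_heights)
    return n_leaves - merges_kept


# Hypothetical merge heights (illustrative values, not from the paper).
heights = [0.01, 0.02, 0.02, 0.03, 0.25, 0.30]
cutoff = dynamic_cutoff(heights)
n_clusters = count_clusters(heights, cutoff)
```

With these illustrative heights the cut falls between 0.03 and 0.25, so the four low merges are kept and the two high ones are undone. A fixed threshold, such as the 97 percent similarity value mentioned in the abstract, would ignore where that jump occurs.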

Clustering categorical data: A stability analysis framework

2011

Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but k-means is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which creates the difficulty of selecting the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect inherent in the calculation …

Computer science; business.industry; Single-linkage clustering; Correlation clustering; Constrained clustering; computer.software_genre; Machine learning; Determining the number of clusters in a data set; Data stream clustering; CURE data clustering algorithm; Consensus clustering; Data mining; Artificial intelligence; Cluster analysis; business; computer; 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Mammographic images segmentation based on chaotic map clustering algorithm

2013

Background: This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels, resulting in a medically meaningful partition of the mammography. Methods: The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the associated distance between feature space points, is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads…

Cooperative behavior; Clustering algorithms; Computer science; Feature vector; Correlation clustering; Physics::Medical Physics; Mass lesions; Microcalcifications; Image processing; Breast Neoplasms; Digital image; Segmentation; Breast cancer; Image Processing Computer-Assisted; Cluster analysis; Humans; Radiology Nuclear Medicine and imaging; Computer vision; Features; Pixel; business.industry; Segmentation-based object categorization; Calcinosis; Settore FIS/07 - Fisica Applicata (Beni Culturali Ambientali Biol.e Medicin); Radiographic Image Enhancement; Chaotic maps; Computer Science::Computer Vision and Pattern Recognition; Female; Artificial intelligence; business; Algorithms; Mammography; Research Article

Distance-constrained data clustering by combined k-means algorithms and opinion dynamics filters

2014

Data clustering algorithms represent mechanisms for partitioning huge arrays of multidimensional data into groups with small in-group and large out-group distances. Most of the existing algorithms fail when a lower bound for the distance among cluster centroids is specified, while this type of constraint can be of help in obtaining a better clustering. Traditional approaches require that the desired number of clusters is specified a priori, which requires either a subjective decision or global meta-information knowledge that is not easily obtainable. In this paper, an extension of the standard data clustering problem is addressed, including additional constraints on the cluster centroid di…

Fuzzy clustering; Correlation clustering; Single-linkage clustering; Constrained clustering; computer.software_genre; Determining the number of clusters in a data set; Settore ING-INF/04 - Automatica; Data clustering; k–means; Opinion dynamics; Hegselmann–Krause model; CURE data clustering algorithm; Data mining; Cluster analysis; Algorithm; computer; k-medians clustering; Mathematics; 22nd Mediterranean Conference on Control and Automation

Scalable Clustering by Iterative Partitioning and Point Attractor Representation

2016

Clustering very large datasets while preserving cluster quality remains a challenging data-mining task to date. In this paper, we propose an effective scalable clustering algorithm for large datasets that builds upon the concept of synchronization. Inherited from the powerful concept of synchronization, the proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), is capable of handling very large datasets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the…

Fuzzy clustering; General Computer Science; Computer science; Single-linkage clustering; Correlation clustering; Constrained clustering; 02 engineering and technology; computer.software_genre; ComputingMethodologies_PATTERNRECOGNITION; Data stream clustering; CURE data clustering algorithm; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Canopy clustering algorithm; 020201 artificial intelligence & image processing; Data mining; Cluster analysis; computer; ACM Transactions on Knowledge Discovery from Data

Gravitational weighted fuzzy c-means with application on multispectral image segmentation

2014

This paper presents a novel clustering approach based on the classic Fuzzy c-means algorithm. The approach is inspired by the concept of interaction between objects in physics. Each data point is regarded as a particle. A specific weight is associated with each data particle depending on its interaction with other particles. This interaction is induced by attraction forces between pairs of particles and the escape velocity from other particles. Classification experiments using two data sets from the UCI repository show that the proposed approach outperforms other clustering algorithms. In addition, results demonstrate the effectiveness of the proposed scheme for segmentation …

Fuzzy clustering; Segmentation-based object categorization; business.industry; Correlation clustering; Scale-space segmentation; Pattern recognition; Segmentation; Image segmentation; Artificial intelligence; Cluster analysis; business; Fuzzy logic; Mathematics; 2014 4th International Conference on Image Processing Theory, Tools and Applications (IPTA)

A Novel Clustering Algorithm based on a Non-parametric "Anti-Bayesian" Paradigm

2015

The problem of clustering, or unsupervised classification, has been solved by a myriad of techniques, all of which depend, either directly or implicitly, on the Bayesian principle of optimal classification. To be more specific, within a Bayesian paradigm, if one is to compare the testing sample with only a single point in the feature space from each class, the optimal Bayesian strategy would be to achieve this based on the distance from the corresponding means or central points in the respective distributions. When this principle is applied in clustering, one would assign an unassigned sample into the cluster whose mean is the closest, and this can be done in either a bottom-up or a top-dow…

Fuzzy clustering; business.industry; Computer science; Correlation clustering; Constrained clustering; Pattern recognition; computer.software_genre; Data stream clustering; CURE data clustering algorithm; Canopy clustering algorithm; Affinity propagation; Artificial intelligence; Data mining; business; Cluster analysis; computer

Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

2017

Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…

Fuzzy clustering; lcsh:T55.4-60.8; Computer science; Single-linkage clustering; Correlation clustering; 02 engineering and technology; computer.software_genre; lcsh:QA75.5-76.95; Theoretical Computer Science; prototype-based clustering; CURE data clustering algorithm; 020204 information systems; Consensus clustering; algoritmit; 0202 electrical engineering electronic engineering information engineering; lcsh:Industrial engineering. Management engineering; Cluster analysis; k-medians clustering; ta113; Numerical Analysis; business.industry; Pattern recognition; Determining the number of clusters in a data set; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; robust statistics; 020201 artificial intelligence & image processing; lcsh:Electronic computers. Computer science; Artificial intelligence; Data mining; tiedonlouhinta; business; clustering validation index; computer; Algorithms

Feature Ranking of Large, Robust, and Weighted Clustering Result

2017

A clustering result needs to be interpreted and evaluated for knowledge discovery. When clustered data represents a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance the automatic knowledge discovery from a robust clustering result on the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from sample to population level with two different approaches. Application of these enlargements to both the input variables used in clustering and to metadata provides a…

Kruskal-Wallis test; Computer science; Correlation clustering; Population; 02 engineering and technology; computer.software_genre; Machine learning; 01 natural sciences; Ranking (information retrieval); 010104 statistics & probability; Knowledge extraction; CURE data clustering algorithm; population analysis; Ranking SVM; 0202 electrical engineering electronic engineering information engineering; Test statistic; 0101 mathematics; educational knowledge discovery; education; Cluster analysis; education.field_of_study; business.industry; Ranking; 020201 artificial intelligence & image processing; Data mining; Artificial intelligence; robust clustering; business; computer

A Clustering Approach to texture Classification

1988

This paper studies a clustering technique that segments an image into “homogeneous” regions. The homogeneity of each region is evaluated by means of a “proximity function” computed between the pixels. The main result of this approach is that no histogramming is required to perform segmentation. Possibilistic and probabilistic approaches are also combined to evaluate the significance of the computed regions.

Pixel; Computer science; business.industry; Feature vector; Homogeneity (statistics); Correlation clustering; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Probabilistic logic; Pattern recognition; Image texture; Computer Science::Computer Vision and Pattern Recognition; Segmentation; Artificial intelligence; Cluster analysis; business