Search results for "cluster analysis."

showing 10 items of 805 documents

Detection of spatial disease clusters with LISA functions.

2011

Detection of disease clusters is an important tool in epidemiology that can help to identify risk factors associated with the disease and in understanding its etiology. In this article we propose a method for the detection of spatial clusters where the locations of a set of cases and a set of controls are available. The method is based on local indicators of spatial association functions (LISA functions), particularly on the development of a local version of the product density, which is a second-order characteristic of spatial point processes. The behavior of the method is evaluated and compared with Kulldorff's spatial scan statistic by means of a simulation study. It is shown that the LI…

Statistics and ProbabilityAdultMaleDisease clustersEpidemiologyScan statisticIrregular shapePoint processDisease OutbreaksSet (abstract data type)StatisticsCluster AnalysisHumansComputer SimulationSensitivity (control systems)MathematicsAgedAged 80 and overbusiness.industryPattern recognitionMiddle AgedSpainData Interpretation StatisticalSpatial clusteringFemaleKidney DiseasesArtificial intelligencebusinessEpidemiologic MethodsType I and type II errorsStatistics in medicine
researchProduct

Using mathematical morphology for unsupervised classification of functional data

2011

This paper is concerned with the unsupervised classification of functional data by using mathematical morphology. Different morphological operators are used to extract relevant structures of the functions (considered as sets through their subgraph representations). These operators can be considered as preprocessing tools whose outputs are also functional data. We explore some dissimilarity measures and clustering methods for the classification of the transformed data. Our approach is illustrated through a detailed analysis of two data sets. These techniques, which have mainly been used in image processing, provide a flexible and robust toolbox for improving the results in unsupervised funct…

Statistics and ProbabilityApplied MathematicsData classificationImage processingMathematical morphologycomputer.software_genreToolboxComputingMethodologies_PATTERNRECOGNITIONModeling and SimulationPreprocessorData miningStatistics Probability and UncertaintyCluster analysisMorphological operatorscomputerMathematicsJournal of Statistical Computation and Simulation
researchProduct

Cluster-Localized Sparse Logistic Regression for SNP Data

2012

The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, th…

Statistics and ProbabilityBoosting (machine learning)Computer scienceMultivariable calculusComputational BiologyHigh-Throughput Nucleotide SequencingFeature selectionRegression analysisModels TheoreticalLogistic regressioncomputer.software_genrePolymorphism Single NucleotideRegressionComputational MathematicsLogistic ModelsData Interpretation StatisticalGeneticsCluster AnalysisHumansData miningCluster analysisMolecular BiologyUnit-weighted regressioncomputerGenome-Wide Association StudyStatistical Applications in Genetics and Molecular Biology
researchProduct

A fast and recursive algorithm for clustering large datasets with k-medians

2012

Clustering with fast algorithms large samples of high dimensional data is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967) who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. A particular attention is paid to the averaged versions, which…

Statistics and ProbabilityClustering high-dimensional dataFOS: Computer and information sciencesMathematical optimizationhigh dimensional dataMachine Learning (stat.ML)02 engineering and technologyStochastic approximation01 natural sciencesStatistics - Computation010104 statistics & probabilityk-medoidsStatistics - Machine Learning[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]stochastic approximation0202 electrical engineering electronic engineering information engineeringComputational statisticsrecursive estimatorsAlmost surely[ MATH.MATH-ST ] Mathematics [math]/Statistics [math.ST]0101 mathematicsCluster analysisComputation (stat.CO)Mathematicsaveragingk-medoidsRobbins MonroApplied MathematicsEstimator[STAT.TH]Statistics [stat]/Statistics Theory [stat.TH]stochastic gradient[ STAT.TH ] Statistics [stat]/Statistics Theory [stat.TH]MedoidComputational MathematicsComputational Theory and Mathematicsonline clustering020201 artificial intelligence & image processingpartitioning around medoidsAlgorithm
researchProduct

An interest rates cluster analysis

2004

An empirical analysis of interest rates in money and capital markets is performed. We investigate a set of 34 different weekly interest rate time series during a time period of 16 years between 1982 and 1997. Our study is focused on the collective behavior of the stochastic fluctuations of these time-series which is investigated by using a clustering linkage procedure. Without any a priori assumption, we individuate a meaningful separation in 6 main clusters organized in a hierarchical structure.

Statistics and ProbabilityCollective behaviormedia_common.quotation_subjectFOS: Physical sciencesLinkage (mechanical)computer.software_genrelaw.inventionFOS: Economics and businesslawEconometricsCluster (physics)Cluster analysisCondensed Matter - Statistical Mechanicsmedia_commonStatistical Finance (q-fin.ST)Statistical Mechanics (cond-mat.stat-mech)EconophysicsSeries (mathematics)Quantitative Finance - Statistical FinanceCondensed Matter PhysicsInterest rateCondensed Matter - Other Condensed MatterData miningCapital marketcomputerOther Condensed Matter (cond-mat.other)
researchProduct

Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods

2014

Abstract Motivation: Protein–protein interaction (PPI) networks are powerful models to represent the pairwise protein interactions of the organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that perform together specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of yet uncharacterized proteins. Results: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and …

Statistics and ProbabilityComputer sciencePopulationPopulation basedMachine learningcomputer.software_genreBiochemistryProtein protein interaction networkgenetic algorithmsProtein–protein interactionBioinformatics Clustering Biological NetworksPPI networkscomplex detectionProtein Interaction MappingAnimalsCluster AnalysisHumanseducationCluster analysisMolecular BiologyTopology (chemistry)Class (computer programming)education.field_of_studybusiness.industryfood and beveragesProteinsComputer Science ApplicationsComputational MathematicsComputational Theory and MathematicsArtificial intelligenceData miningbusinessFocus (optics)computerAlgorithms
researchProduct

Anthropometry: An R Package for Analysis of Anthropometric Data

2017

The development of powerful new 3D scanning techniques has enabled the generation of large up-to-date anthropometric databases which provide highly valued data to improve the ergonomic design of products adapted to the user population. As a consequence, Ergonomics and Anthropometry are two increasingly quantitative fields, so advanced statistical methodologies and modern software tools are required to get the maximum benefit from anthropometric data. This paper presents a new R package, called Anthropometry, which is available on the Comprehensive R Archive Network. It brings together some statistical methodologies concerning clustering, statistical shape analysis, statistical archetypal an…

Statistics and ProbabilityComputer sciencePopulationstatistical shape analysis02 engineering and technologycomputer.software_genre01 natural sciences010104 statistics & probabilitySoftware0202 electrical engineering electronic engineering information engineeringR; anthropometric data; clustering; statistical shape analysis; archetypal analysis; data depth0101 mathematicsarchetypal analysisCluster analysiseducationlcsh:Statisticslcsh:HA1-4737education.field_of_studyAnthropometric databusiness.industryStatistical shape analysisRHuman factors and ergonomicsAnthropometryanthropometric dataVignette020201 artificial intelligence & image processingData miningStatistics Probability and Uncertaintydata depthbusinesscomputerSoftwareclusteringJournal of Statistical Software
researchProduct

DySC: software for greedy clustering of 16S rRNA reads.

2012

Abstract Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher quality clusters than UCLUST and CD-HIT at a comparable runtime. Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under GNU GPL license. Contact:  bertil.schmidt@uni-mainz.de Sup…

Statistics and ProbabilityComputer sciencebusiness.industrySequence Analysis RNA16S ribosomal RNAcomputer.software_genreBiochemistryComputer Science ApplicationsComputational MathematicsSoftwareComputational Theory and MathematicsRNA Ribosomal 16SCluster AnalysisMetagenomeData miningCluster analysisbusinessMolecular BiologycomputerSoftwareBioinformatics (Oxford, England)
researchProduct

Community detection algorithm evaluation with ground-truth data

2018

International audience; Community structure is of paramount importance for the understanding of complex networks. Consequently, there is a tremendous effort in order to develop efficient community detection algorithms. Unfortunately, the issue of a fair assessment of these algorithms is a thriving open question. If the ground-truth community structure is available, various clustering-based metrics are used in order to compare it versus the one discovered by these algorithms. However, these metrics defined at the node level are fairly insensitive to the variation of the overall community structure. To overcome these limitations, we propose to exploit the topological features of the ‘communit…

Statistics and ProbabilityComputer science‘Community-graph’Community structureVariation (game tree)[INFO.INFO-RO]Computer Science [cs]/Operations Research [cs.RO]Complex networkCondensed Matter Physics01 natural sciencesGraph010305 fluids & plasmasCommunity structureSet (abstract data type)0103 physical sciencesNetwork analysis010306 general physicsCluster analysisAlgorithmNetwork analysis
researchProduct

Migration and students' performance: detecting geographical differences following a curves clustering approach

2020

Students’ migration mobility is the new form of migration: students migrate to improve their skills and become more valued for the job market. The data regard the migration of Italian Bachelors who enrolled at Master Degree level, moving typically from poor to rich areas. This paper investigates the migration and other possible determinants on the Master Degree students’ performance. The Clustering of Effects approach for Quantile Regression Coefficients Modelling has been used to cluster the effects of some variables on the students’ performance for three Italian macro-areas. Results show evidence of similarity between Southern and Centre students, with respect to the Northern ones.

Statistics and ProbabilityComputingMilieux_THECOMPUTINGPROFESSIONApplication NotesComputer scienceClustering of curveeducationJob marketQuantile regressionCensored and truncated dataQuantile regressionComputingMilieux_COMPUTERSANDEDUCATIONEconometricsSettore SECS-S/05 - Statistica SocialeStatistics Probability and UncertaintySettore SECS-S/01 - StatisticaCluster analysisStudents’performanceJournal of Applied Statistics
researchProduct