Search results for "CLUSTER ANALYSIS"

showing 10 items of 848 documents

An interest rates cluster analysis

2004

An empirical analysis of interest rates in money and capital markets is performed. We investigate a set of 34 different weekly interest rate time series over a 16-year period between 1982 and 1997. Our study focuses on the collective behavior of the stochastic fluctuations of these time series, which is investigated using a clustering linkage procedure. Without any a priori assumption, we identify a meaningful separation into 6 main clusters organized in a hierarchical structure.

Statistics and Probability; Collective behavior; media_common.quotation_subject; FOS: Physical sciences; Linkage (mechanical); computer.software_genre; law.invention; FOS: Economics and business; law; Econometrics; Cluster (physics); Cluster analysis; Condensed Matter - Statistical Mechanics; media_common; Statistical Finance (q-fin.ST); Statistical Mechanics (cond-mat.stat-mech); Econophysics; Series (mathematics); Quantitative Finance - Statistical Finance; Condensed Matter Physics; Interest rate; Condensed Matter - Other Condensed Matter; Data mining; Capital market; computer; Other Condensed Matter (cond-mat.other)
researchProduct

Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods

2014

Motivation: Protein–protein interaction (PPI) networks are powerful models for representing the pairwise protein interactions of organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that together perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of yet uncharacterized proteins. Results: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and …

Statistics and Probability; Computer science; Population; Population based; Machine learning; computer.software_genre; Biochemistry; Protein protein interaction network; genetic algorithms; Protein–protein interaction; Bioinformatics Clustering Biological Networks; PPI networks; complex detection; Protein Interaction Mapping; Animals; Cluster Analysis; Humans; education; Cluster analysis; Molecular Biology; Topology (chemistry); Class (computer programming); education.field_of_study; business.industry; food and beverages; Proteins; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Artificial intelligence; Data mining; business; Focus (optics); computer; Algorithms
researchProduct

Anthropometry: An R Package for Analysis of Anthropometric Data

2017

The development of powerful new 3D scanning techniques has enabled the generation of large up-to-date anthropometric databases which provide highly valued data to improve the ergonomic design of products adapted to the user population. As a consequence, Ergonomics and Anthropometry are two increasingly quantitative fields, so advanced statistical methodologies and modern software tools are required to get the maximum benefit from anthropometric data. This paper presents a new R package, called Anthropometry, which is available on the Comprehensive R Archive Network. It brings together some statistical methodologies concerning clustering, statistical shape analysis, statistical archetypal an…

Statistics and Probability; Computer science; Population; statistical shape analysis; 02 engineering and technology; computer.software_genre; 01 natural sciences; 010104 statistics & probability; Software; 0202 electrical engineering electronic engineering information engineering; R; anthropometric data; clustering; archetypal analysis; data depth; 0101 mathematics; Cluster analysis; education; lcsh:Statistics; lcsh:HA1-4737; education.field_of_study; Anthropometric data; business.industry; Statistical shape analysis; Human factors and ergonomics; Anthropometry; Vignette; 020201 artificial intelligence & image processing; Data mining; Statistics Probability and Uncertainty; business; computer; Journal of Statistical Software
researchProduct

DySC: software for greedy clustering of 16S rRNA reads.

2012

Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher quality clusters than UCLUST and CD-HIT at a comparable runtime. Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under GNU GPL license. Contact: bertil.schmidt@uni-mainz.de Sup…

Statistics and Probability; Computer science; business.industry; Sequence Analysis RNA; 16S ribosomal RNA; computer.software_genre; Biochemistry; Computer Science Applications; Computational Mathematics; Software; Computational Theory and Mathematics; RNA Ribosomal 16S; Cluster Analysis; Metagenome; Data mining; Cluster analysis; business; Molecular Biology; computer; Bioinformatics (Oxford, England)
researchProduct

Community detection algorithm evaluation with ground-truth data

2018

Community structure is of paramount importance for the understanding of complex networks. Consequently, there is a tremendous effort to develop efficient community detection algorithms. Unfortunately, the fair assessment of these algorithms remains an open question. If the ground-truth community structure is available, various clustering-based metrics are used to compare it with the one discovered by these algorithms. However, these metrics, defined at the node level, are fairly insensitive to variation in the overall community structure. To overcome these limitations, we propose to exploit the topological features of the ‘communit…

Statistics and Probability; Computer science; ‘Community-graph’; Community structure; Variation (game tree); [INFO.INFO-RO]Computer Science [cs]/Operations Research [cs.RO]; Complex network; Condensed Matter Physics; 01 natural sciences; Graph; 010305 fluids & plasmas; Set (abstract data type); 0103 physical sciences; Network analysis; 010306 general physics; Cluster analysis; Algorithm
researchProduct

Migration and students' performance: detecting geographical differences following a curves clustering approach

2020

Students’ migration mobility is the new form of migration: students migrate to improve their skills and become more valued in the job market. The data concern the migration of Italian Bachelor's graduates who enrolled at Master's degree level, moving typically from poor to rich areas. This paper investigates the effect of migration and other possible determinants on Master's degree students’ performance. The Clustering of Effects approach for Quantile Regression Coefficients Modelling has been used to cluster the effects of some variables on the students’ performance for three Italian macro-areas. Results show evidence of similarity between Southern and Centre students, compared with the Northern ones.

Statistics and Probability; ComputingMilieux_THECOMPUTINGPROFESSION; Application Notes; Computer science; Clustering of curve; education; Job market; Quantile regression; Censored and truncated data; ComputingMilieux_COMPUTERSANDEDUCATION; Econometrics; Settore SECS-S/05 - Statistica Sociale; Statistics Probability and Uncertainty; Settore SECS-S/01 - Statistica; Cluster analysis; Students’ performance; Journal of Applied Statistics
researchProduct

MCRL: using a reference library to compress a metagenome into a non-redundant list of sequences, considering viruses as a case study

2019

Motivation: Metagenomes offer a glimpse into the total genomic diversity contained within a sample. Currently, however, there is no straightforward way to obtain a non-redundant list of all putative homologs of a set of reference sequences present in a metagenome. Results: To address this problem, we developed a novel clustering approach called ‘metagenomic clustering by reference library’ (MCRL), where a reference library containing a set of reference genes is clustered with respect to an assembled metagenome. According to our proposed approach, reference genes homologous to similar sets of metagenomic sequences, termed ‘signatures’, are iteratively clustered in a greedy fashion, re…

Statistics and Probability; Contig; Computer science; Robustness (evolution); Computational biology; Original Papers; Biochemistry; Computer Science Applications; Set (abstract data type); Computational Mathematics; Computational Theory and Mathematics; Metagenomics; Reference genes; Gene family; Human virome; Cluster analysis; Molecular Biology; Bioinformatics
researchProduct

Ranking Scientific Journals Via Latent Class Models for Polytomous Item Response Data

2015

Summary: We propose a model-based strategy for ranking scientific journals starting from a set of observed bibliometric indicators that represent imperfect measures of the unobserved ‘value’ of a journal. After discretizing the available indicators, we estimate an extended latent class model for polytomous item response data and use the estimated model to cluster journals. We illustrate our approach by using the data from the Italian research evaluation exercise that was carried out for the period 2004–2010, focusing on the set of journals that are considered relevant for the subarea statistics and financial mathematics. Using four bibliometric indicators (IF, IF5, AIS and the h-index), some…

Statistics and Probability; Economics and Econometric; Economics and Econometrics; Class (set theory); Research evaluation; Clustering; Set (abstract data type); Valutazione della Qualità delle Ricerca; Covariate; Statistics; Econometrics; Finite mixture models; Cluster analysis; Finite mixture model; Mathematics; Graded response model; Mathematical finance; Item response theory models; Item response theory model; Probability and statistics; Latent class model; Ranking; Statistics Probability and Uncertainty; Settore SECS-S/01 - Statistica; Social Sciences (miscellaneous); Journal of the Royal Statistical Society Series A: Statistics in Society
researchProduct

Clustering of spatial point patterns

2006

Spatial point patterns arise as the natural sampling information in many problems. An ophthalmologic problem gave rise to the problem of detecting clusters of point patterns. A set of human corneal endothelium images is given. Each image is described by using a point pattern, the cell centroids. The main problem is to find groups of images corresponding with groups of spatial point patterns. This is interesting from a descriptive point of view and for clinical purposes. A new image can be compared with prototypes of each group and finally evaluated by the physician. Usual descriptors of spatial point patterns such as the empty-space function, the nearest distribution function or Ripley's K-…

Statistics and Probability; K-function; business.industry; Applied Mathematics; Centroid; Pattern recognition; Function (mathematics); Point process; Computational Mathematics; Computational Theory and Mathematics; Survival function; Statistics; Point (geometry); Artificial intelligence; Point estimation; Cluster analysis; business; Mathematics; Computational Statistics & Data Analysis
researchProduct

Sparse kernel methods for high-dimensional survival data

2008

Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be ‘kernelized’. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, dependin…

Statistics and Probability; Lung Neoplasms; Lymphoma; Computer science; computer.software_genre; Computing Methodologies; Biochemistry; Pattern Recognition Automated; Artificial Intelligence; Margin (machine learning); Covariate; Cluster Analysis; Humans; Computer Simulation; Fraction (mathematics); Molecular Biology; Proportional Hazards Models; Models Statistical; Training set; Proportional hazards model; Gene Expression Profiling; Computational Biology; Computer Science Applications; Support vector machine; Computational Mathematics; Kernel method; Computational Theory and Mathematics; Regression Analysis; Data mining; computer; Algorithms; Software; Bioinformatics
researchProduct