Search results for "Clustering"

showing 10 items of 446 documents

Detection of spatial disease clusters with LISA functions.

2011

Detection of disease clusters is an important tool in epidemiology that can help to identify risk factors associated with the disease and to understand its etiology. In this article we propose a method for the detection of spatial clusters where the locations of a set of cases and a set of controls are available. The method is based on local indicators of spatial association functions (LISA functions), particularly on the development of a local version of the product density, which is a second-order characteristic of spatial point processes. The behavior of the method is evaluated and compared with Kulldorff's spatial scan statistic by means of a simulation study. It is shown that the LI…

Statistics and Probability, Adult, Male, Disease clusters, Epidemiology, Scan statistic, Irregular shape, Point process, Disease Outbreaks, Set (abstract data type), Statistics, Cluster Analysis, Humans, Computer Simulation, Sensitivity (control systems), Mathematics, Aged, Aged 80 and over, business.industry, Pattern recognition, Middle Aged, Spain, Data Interpretation Statistical, Spatial clustering, Female, Kidney Diseases, Artificial intelligence, business, Epidemiologic Methods, Type I and type II errors, Statistics in medicine
researchProduct

Sample size planning for survival prediction with focus on high-dimensional data

2011

Sample size planning should reflect the primary objective of a trial. If the primary objective is prediction, the sample size determination should focus on prediction accuracy instead of power. We present formulas for the determination of training set sample size for survival prediction. Sample size is chosen to control the difference between optimal and expected prediction error. Prediction is carried out by Cox proportional hazards models. The general approach considers censoring as well as low-dimensional and high-dimensional explanatory variables. For dimension reduction in the high-dimensional setting, a variable selection step is inserted. If not all informative variables are included…

Statistics and Probability, Clustering high-dimensional data, Clinical Trials as Topic, Lung Neoplasms, Models Statistical, Kaplan-Meier Estimate, Epidemiology, Proportional hazards model, Dimensionality reduction, Gene Expression, Feature selection, Kaplan-Meier Estimate, Biostatistics, Prognosis, Brier score, Sample size determination, Carcinoma Non-Small-Cell Lung, Sample Size, Censoring (clinical trials), Statistics, Humans, Proportional Hazards Models, Mathematics, Statistics in Medicine
researchProduct

Sparse relative risk regression models

2020

Summary: Clinical studies in which patients are routinely screened for many genomic features are becoming more common. In principle, this holds the promise of being able to find genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often exceeding the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios…

Statistics and Probability, Clustering high-dimensional data, Computer science, dgLARS, Inference, Scale (descriptive set theory), Biostatistics, Machine learning, computer.software_genre, Risk Assessment, 01 natural sciences, Regularization (mathematics), Relative risk regression model, 010104 statistics & probability, 03 medical and health sciences, Neoplasms, Covariate, Humans, Computer Simulation, 0101 mathematics, Online Only Articles, Survival analysis, 030304 developmental biology, 0303 health sciences, Models Statistical, business.industry, Least-angle regression, Regression analysis, General Medicine, Survival Analysis, High-dimensional data, Gene expression data, Regression Analysis, Artificial intelligence, Statistics Probability and Uncertainty, Settore SECS-S/01 - Statistica, business, Sparsity, computer, Biostatistics
researchProduct

A fast and recursive algorithm for clustering large datasets with k-medians

2012

Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…
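
The recursive update described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `online_k_medians`, the step-size schedule `c / n**alpha`, and its default values are assumptions chosen for illustration. Each arriving point moves its nearest center a small distance toward itself (a stochastic gradient step on the k-medians loss), and a running mean of the iterates gives an averaged version.

```python
import math
import random


def online_k_medians(stream, init_centers, c=1.0, alpha=0.75):
    """Recursive stochastic-gradient k-medians with averaged iterates.

    stream: iterable of points (sequences of floats), processed one at a time.
    init_centers: list of k starting centers.
    c, alpha: step size a_n = c / n**alpha (hypothetical defaults).
    Returns (centers, averaged_centers).
    """
    centers = [list(p) for p in init_centers]
    averaged = [list(p) for p in init_centers]
    for n, z in enumerate(stream, start=1):
        # Find the center nearest to the new point.
        dists = [math.dist(z, ctr) for ctr in centers]
        r = min(range(len(centers)), key=dists.__getitem__)
        if dists[r] > 0.0:
            step = c / n ** alpha
            # The gradient of ||z - theta|| in theta is a unit vector, so the
            # nearest center moves a distance `step` toward z.
            centers[r] = [t + step * (x - t) / dists[r]
                          for t, x in zip(centers[r], z)]
        # Running mean of all iterates so far (the averaged version).
        for j in range(len(centers)):
            averaged[j] = [a + (t - a) / (n + 1)
                           for a, t in zip(averaged[j], centers[j])]
    return centers, averaged


# Usage on synthetic one-dimensional data: two groups near 0 and near 10.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0) + (10.0 if rng.random() < 0.5 else 0.0),)
        for _ in range(5000)]
centers, averaged = online_k_medians(data, [(1.0,), (9.0,)])
```

Because every point is seen once and only one center is touched per step, the cost per observation is linear in the number of centers, which is what makes this kind of scheme attractive for data arriving sequentially.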

Statistics and Probability, Clustering high-dimensional data, FOS: Computer and information sciences, Mathematical optimization, high dimensional data, Machine Learning (stat.ML), 02 engineering and technology, Stochastic approximation, 01 natural sciences, Statistics - Computation, 010104 statistics & probability, k-medoids, Statistics - Machine Learning, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], stochastic approximation, 0202 electrical engineering electronic engineering information engineering, Computational statistics, recursive estimators, Almost surely, [ MATH.MATH-ST ] Mathematics [math]/Statistics [math.ST], 0101 mathematics, Cluster analysis, Computation (stat.CO), Mathematics, averaging, k-medoids, Robbins Monro, Applied Mathematics, Estimator, [STAT.TH]Statistics [stat]/Statistics Theory [stat.TH], stochastic gradient, [ STAT.TH ] Statistics [stat]/Statistics Theory [stat.TH], Medoid, Computational Mathematics, Computational Theory and Mathematics, online clustering, 020201 artificial intelligence & image processing, partitioning around medoids, Algorithm
researchProduct

Modeling the coupled return-spread high frequency dynamics of large tick assets

2015

Large tick assets, i.e. assets where a one-tick movement is a significant fraction of the price and the bid-ask spread is almost always equal to one tick, display dynamics in which price changes and spread are strongly coupled. We introduce a Markov-switching modeling approach for price change, where the latent Markov process is the transition between spreads. We then use a finite Markov mixture of logit regressions on past squared returns to describe the dependence of the probability of price changes. The model can thus be seen as a Double Chain Markov Model. We show that the model describes the shape of the return distribution at different time aggregations, volatility clustering, and the anomalo…

Statistics and Probability, Computer Science::Computer Science and Game Theory, Volatility clustering, Quantitative Finance - Trading and Market Microstructure, Markov chain, Logit, Markov process, Statistical and Nonlinear Physics, Markov model, models of financial markets nonlinear dynamics stochastic processes, Trading and Market Microstructure (q-fin.TR), FOS: Economics and business, symbols.namesake, symbols, Econometrics, Kurtosis, Fraction (mathematics), Almost surely, Statistics Probability and Uncertainty, 60J20, Mathematics
researchProduct

Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods

2014

Motivation: Protein–protein interaction (PPI) networks are powerful models to represent the pairwise protein interactions of organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that together perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of as yet uncharacterized proteins. Results: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and …

Statistics and Probability, Computer science, Population, Population based, Machine learning, computer.software_genre, Biochemistry, Protein protein interaction network, genetic algorithms, Protein–protein interaction, Bioinformatics Clustering Biological Networks, PPI networks, complex detection, Protein Interaction Mapping, Animals, Cluster Analysis, Humans, education, Cluster analysis, Molecular Biology, Topology (chemistry), Class (computer programming), education.field_of_study, business.industry, food and beverages, Proteins, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Artificial intelligence, Data mining, business, Focus (optics), computer, Algorithms
researchProduct

Anthropometry: An R Package for Analysis of Anthropometric Data

2017

The development of powerful new 3D scanning techniques has enabled the generation of large up-to-date anthropometric databases which provide highly valued data to improve the ergonomic design of products adapted to the user population. As a consequence, Ergonomics and Anthropometry are two increasingly quantitative fields, so advanced statistical methodologies and modern software tools are required to get the maximum benefit from anthropometric data. This paper presents a new R package, called Anthropometry, which is available on the Comprehensive R Archive Network. It brings together some statistical methodologies concerning clustering, statistical shape analysis, statistical archetypal an…

Statistics and Probability, Computer science, Population, statistical shape analysis, 02 engineering and technology, computer.software_genre, 01 natural sciences, 010104 statistics & probability, Software, 0202 electrical engineering electronic engineering information engineering, R; anthropometric data; clustering; statistical shape analysis; archetypal analysis; data depth, 0101 mathematics, archetypal analysis, Cluster analysis, education, lcsh:Statistics, lcsh:HA1-4737, education.field_of_study, Anthropometric data, business.industry, Statistical shape analysis, R, Human factors and ergonomics, Anthropometry, anthropometric data, Vignette, 020201 artificial intelligence & image processing, Data mining, Statistics Probability and Uncertainty, data depth, business, computer, Software, clustering, Journal of Statistical Software
researchProduct

Migration and students' performance: detecting geographical differences following a curves clustering approach

2020

Student mobility is the new form of migration: students migrate to improve their skills and become more valuable on the job market. The data concern the migration of Italian Bachelor's graduates who enrolled at Master's degree level, typically moving from poor to rich areas. This paper investigates the effect of migration and other possible determinants on Master's degree students’ performance. The Clustering of Effects approach for Quantile Regression Coefficients Modelling is used to cluster the effects of some variables on students’ performance for three Italian macro-areas. Results show evidence of similarity between Southern and Centre students, in contrast to Northern ones.

Statistics and Probability, ComputingMilieux_THECOMPUTINGPROFESSION, Application Notes, Computer science, Clustering of curve, education, Job market, Quantile regression, Censored and truncated data, Quantile regression, ComputingMilieux_COMPUTERSANDEDUCATION, Econometrics, Settore SECS-S/05 - Statistica Sociale, Statistics Probability and Uncertainty, Settore SECS-S/01 - Statistica, Cluster analysis, Students’ performance, Journal of Applied Statistics
researchProduct

Ranking Scientific Journals Via Latent Class Models for Polytomous Item Response Data

2015

Summary: We propose a model-based strategy for ranking scientific journals starting from a set of observed bibliometric indicators that represent imperfect measures of the unobserved ‘value’ of a journal. After discretizing the available indicators, we estimate an extended latent class model for polytomous item response data and use the estimated model to cluster journals. We illustrate our approach by using the data from the Italian research evaluation exercise that was carried out for the period 2004–2010, focusing on the set of journals that are considered relevant for the subarea statistics and financial mathematics. Using four bibliometric indicators (IF, IF5, AIS and the h-index), some…

Statistics and Probability, Economics and Econometric, Economics and Econometrics, Class (set theory), Research evaluation, Clustering, Set (abstract data type), Valutazione della Qualità delle Ricerca, Covariate, Statistics, Econometrics, Finite mixture models, Cluster analysis, Finite mixture model, Mathematics, Graded response model, Mathematical finance, Item response theory models, Item response theory model, Probability and statistics, Latent class model, Ranking, Statistics Probability and Uncertainty, Settore SECS-S/01 - Statistica, Valutazione della Qualità delle Ricerca; Clustering; Finite mixture models; Graded response model; Item response theory models; Research evaluation, Social Sciences (miscellaneous), Journal of the Royal Statistical Society Series A: Statistics in Society
researchProduct

Bayesian Markov switching models for the early detection of influenza epidemics

2008

The early detection of outbreaks of diseases is one of the most challenging objectives of epidemiological surveillance systems. In this paper, a Markov switching model is introduced to determine the epidemic and non-epidemic periods from influenza surveillance data: the process of differenced incidence rates is modelled either with a first-order autoregressive process or with a Gaussian white-noise process depending on whether the system is in an epidemic or in a non-epidemic phase. The transition between phases of the disease is modelled as a Markovian process. Bayesian inference is carried out on the former model to detect influenza epidemics at the very moment of their onset. Moreover, t…
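
The two-regime structure described in this abstract can be sketched as a simple forward filter. This is only an illustration under stated assumptions, not the paper's Bayesian procedure: all parameters are treated as known and fixed, and the function name `epidemic_filter` and every default value (transition probabilities, autoregressive coefficient `rho`, standard deviations, prior) are hypothetical. Differenced incidence rates are scored as Gaussian white noise in the non-epidemic state and as a first-order autoregressive process in the epidemic state, with Markovian switching between the two.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def epidemic_filter(y, p_stay_non=0.95, p_stay_epi=0.90, rho=0.7,
                    sd_non=1.0, sd_epi=3.0, prior_epi=0.1):
    """Filtered probability of the epidemic state given data up to each time.

    y: sequence of differenced incidence rates.
    All keyword parameters are illustrative, fixed values (assumptions).
    """
    # trans[i][j] = P(state_t = j | state_{t-1} = i); 0 = non-epidemic, 1 = epidemic.
    trans = [[p_stay_non, 1.0 - p_stay_non],
             [1.0 - p_stay_epi, p_stay_epi]]
    post = [1.0 - prior_epi, prior_epi]
    prev = 0.0  # assumed value of the series before the first observation
    probs = []
    for t, obs in enumerate(y):
        if t == 0:
            pred = post
        else:
            # One-step prediction of the hidden state.
            pred = [post[0] * trans[0][j] + post[1] * trans[1][j] for j in (0, 1)]
        # White noise in the non-epidemic state, AR(1) in the epidemic state.
        like = [normal_pdf(obs, 0.0, sd_non), normal_pdf(obs, rho * prev, sd_epi)]
        joint = [pred[j] * like[j] for j in (0, 1)]
        total = joint[0] + joint[1]
        # If both likelihoods underflow to zero, fall back to the prediction.
        post = [joint[0] / total, joint[1] / total] if total > 0.0 else pred
        probs.append(post[1])
        prev = obs
    return probs


# Usage: small fluctuations followed by a sharp rise in the differenced series.
probs = epidemic_filter([0.1, -0.2, 0.3, 6.0, 8.0, 7.5])
```

In a fully Bayesian treatment the fixed parameters above would themselves be given priors and estimated, so this filter should be read only as a picture of how the two regimes compete to explain each new observation.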

Statistics and Probability, Epidemiology, Computer science, Bayesian probability, Markov process, Bayesian inference, Disease Outbreaks, symbols.namesake, Bayes' theorem, Statistics, Influenza Human, Econometrics, Humans, Hidden Markov model, Models Statistical, Markov chain, Incidence, Bayes Theorem, Markov Chains, Moment (mathematics), Autoregressive model, Spain, Space-Time Clustering, symbols, Regression Analysis, Sentinel Surveillance
researchProduct