Search results for "Machine learning"

showing 10 items of 1464 documents

Cluster-Localized Sparse Logistic Regression for SNP Data

2012

The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, th…

Statistics and Probability; Boosting (machine learning); Computer science; Multivariable calculus; Computational Biology; High-Throughput Nucleotide Sequencing; Feature selection; Regression analysis; Models Theoretical; Logistic regression; computer.software_genre; Polymorphism Single Nucleotide; Regression; Computational Mathematics; Logistic Models; Data Interpretation Statistical; Genetics; Cluster Analysis; Humans; Data mining; Cluster analysis; Molecular Biology; Unit-weighted regression; computer; Genome-Wide Association Study; Statistical Applications in Genetics and Molecular Biology

researchProduct

Sparse relative risk regression models

2020

Summary: Clinical studies where patients are routinely screened for many genomic features are becoming more routine. In principle, this holds the promise of being able to find genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, often outstripping the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios…

Statistics and Probability; Clustering high-dimensional data; Computer science; dgLARS; Inference; Scale (descriptive set theory); Biostatistics; Machine learning; computer.software_genre; Risk Assessment; 01 natural sciences; Regularization (mathematics); Relative risk regression model; 010104 statistics & probability; 03 medical and health sciences; Neoplasms; Covariate; Humans; Computer Simulation; 0101 mathematics; Online Only Articles; Survival analysis; 030304 developmental biology; 0303 health sciences; Models Statistical; business.industry; Least-angle regression; Regression analysis; General Medicine; Survival Analysis; High-dimensional data; Gene expression data; Regression Analysis; Artificial intelligence; Statistics Probability and Uncertainty; Settore SECS-S/01 - Statistica; business; Sparsity; computer; Biostatistics


A fast and recursive algorithm for clustering large datasets with k-medians

2012

Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…

Statistics and Probability; Clustering high-dimensional data; FOS: Computer and information sciences; Mathematical optimization; high dimensional data; Machine Learning (stat.ML); 02 engineering and technology; Stochastic approximation; 01 natural sciences; Statistics - Computation; 010104 statistics & probability; k-medoids; Statistics - Machine Learning; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]; stochastic approximation; 0202 electrical engineering electronic engineering information engineering; Computational statistics; recursive estimators; Almost surely; [ MATH.MATH-ST ] Mathematics [math]/Statistics [math.ST]; 0101 mathematics; Cluster analysis; Computation (stat.CO); Mathematics; averaging; k-medoids; Robbins Monro; Applied Mathematics; Estimator; [STAT.TH]Statistics [stat]/Statistics Theory [stat.TH]; stochastic gradient; [ STAT.TH ] Statistics [stat]/Statistics Theory [stat.TH]; Medoid; Computational Mathematics; Computational Theory and Mathematics; online clustering; 020201 artificial intelligence & image processing; partitioning around medoids; Algorithm


Correlated randomness and switching phenomena

2010

One challenge of biology, medicine, and economics is that the systems treated by these serious scientific disciplines have no perfect metronome in time and no perfect spatial architecture—crystalline or otherwise. Nonetheless, as if by magic, out of nothing but randomness one finds remarkably fine-tuned processes in time and remarkably fine-tuned structures in space. Further, many of these processes and structures have the remarkable feature of “switching” from one behavior to another as if by magic. The past century has, philosophically, been concerned with placing aside the human tendency to see the universe as a fine-tuned machine. Here we will address the challenge of uncovering how, th…

Statistics and Probability; Cognitive science; Theoretical physics; Aside; Nothing; Phenomenon; Feature (machine learning); Magic (programming); Space (commercial competition); Condensed Matter Physics; Tipping point (sociology); Randomness; Mathematics; Physica A: Statistical Mechanics and its Applications


Blind Source Separation Based on Joint Diagonalization in R: The Packages JADE and BSSasymp

2017

Blind source separation (BSS) is a well-known signal processing tool which is used to solve practical data analysis problems in various fields of science. In BSS, we assume that the observed data consists of linear mixtures of latent variables. The mixing system and the distributions of the latent variables are unknown. The aim is to find an estimate of an unmixing matrix which then transforms the observed data back to latent sources. In this paper we present the R packages JADE and BSSasymp. The package JADE offers several BSS methods which are based on joint diagonalization. Package BSSasymp contains functions for computing the asymptotic covariance matrices as well as their data-based es…

Statistics and Probability; Computer science; JADE (programming language); 02 engineering and technology; Latent variable; Machine learning; computer.software_genre; 01 natural sciences; Blind signal separation; 010104 statistics & probability; Matrix (mathematics); nonstationary source separation; Mixing (mathematics); 0202 electrical engineering electronic engineering information engineering; second order source separation; 0101 mathematics; lcsh:Statistics; lcsh:HA1-4737; computer.programming_language; ta113; Signal processing; ta112; matematiikka; multivariate time series; mathematics; business.industry; Estimator; 020206 networking & telecommunications; riippumattomien komponenttien analyysi; independent component analysis; multivariate time series; nonstationary source separation; performance indices; second order source separation; Independent component analysis; performance indices; statistics; independent component analysis; Artificial intelligence; Statistics Probability and Uncertainty; business; computer; Algorithm; Software; Journal of Statistical Software


Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods

2014

Motivation: Protein–protein interaction (PPI) networks are powerful models to represent the pairwise protein interactions of organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that together perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of yet uncharacterized proteins. Results: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and …

Statistics and Probability; Computer science; Population; Population based; Machine learning; computer.software_genre; Biochemistry; Protein protein interaction network; genetic algorithms; Protein–protein interaction; Bioinformatics Clustering Biological Networks; PPI networks; complex detection; Protein Interaction Mapping; Animals; Cluster Analysis; Humans; education; Cluster analysis; Molecular Biology; Topology (chemistry); Class (computer programming); education.field_of_study; business.industry; food and beverages; Proteins; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Artificial intelligence; Data mining; business; Focus (optics); computer; Algorithms


Overall Objective Priors

2015

In multi-parameter models, reference priors typically depend on the parameter or quantity of interest, and it is well known that this is necessary to produce objective posterior distributions with optimal properties. There are, however, many situations where one is simultaneously interested in all the parameters of the model or, more realistically, in functions of them that include aspects such as prediction, and it would then be useful to have a single objective prior that could safely be used to produce reasonable posterior inferences for all the quantities of interest. In this paper, we consider three methods for selecting a single objective prior and study, in a variety of problems incl…

Statistics and Probability; Computer science; business.industry; Applied Mathematics; Mathematics - Statistics Theory; Statistics Theory (math.ST); Joint Reference Prior; Reference Analysis; Machine learning; computer.software_genre; Logarithmic Divergence; Objective Priors; Variety (cybernetics); Single objective; Multinomial Model; Prior probability; FOS: Mathematics; Multinomial distribution; Multinomial model; Artificial intelligence; business; computer; Reference analysis; Bayesian Analysis


Sequential Monte Carlo methods in Bayesian joint models for longitudinal and time-to-event data

2020

The statistical analysis of the information generated by medical follow-up is a very important challenge in the field of personalized medicine. As the evolutionary course of a patient's disease progresses, his/her medical follow-up generates more and more information that should be processed immediately in order to review and update his/her prognosis and treatment. Hence, we focus on this update process through sequential inference methods for joint models of longitudinal and time-to-event data from a Bayesian perspective. More specifically, we propose the use of sequential Monte Carlo (SMC) methods for static parameter joint models with the intention of reducing computational time in each…

Statistics and Probability; Computer science; business.industry; Bayesian probability; Sequential monte carlo methods; Machine learning; computer.software_genre; 01 natural sciences; Field (computer science); 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Event data; 030220 oncology & carcinogenesis; Statistical analysis; Personalized medicine; Artificial intelligence; 0101 mathematics; Statistics Probability and Uncertainty; business; Joint (audio engineering); Cartography; computer; Statistical Modelling


Binary distributions of concentric rings

2014

We introduce families of jointly symmetric, binary distributions that are generated over directed star graphs whose nodes represent variables and whose edges indicate positive dependences. The families are parametrized in terms of a single parameter. It is an outstanding feature of these distributions that joint probabilities relate to evenly spaced concentric rings. Kronecker product characterizations make them computationally attractive for a large number of variables. We study the behavior of different measures of dependence and derive maximum likelihood estimates when all nodes are observed and when the inner node is hidden.

Statistics and Probability; Contingency table; Kronecker product; Discrete mathematics; Numerical Analysis; Binary number; Star (graph theory); Combinatorics; symbols.namesake; Conditional independence; Joint probability distribution; symbols; Feature (machine learning); Node (circuits); Statistics Probability and Uncertainty; Mathematics; Journal of Multivariate Analysis


Bayesian survival analysis with BUGS

2020

Survival analysis is one of the most important fields of statistics in medicine and biological sciences. In addition, the computational advances in the last decades have favored the use of Bayesian methods in this context, providing a flexible and powerful alternative to the traditional frequentist approach. The objective of this article is to summarize some of the most popular Bayesian survival models, such as accelerated failure time, proportional hazards, mixture cure, competing risks, multi-state, frailty, and joint models of longitudinal and survival data. Moreover, an implementation of each presented model is provided using a BUGS syntax that can be run with JAGS from the R programmin…

Statistics and Probability; FOS: Computer and information sciences; Epidemiology; Computer science; Bayesian probability; Context (language use); Accelerated failure time model; Machine learning; computer.software_genre; Bayesian inference; 01 natural sciences; Statistics - Applications; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Frequentist inference; Humans; Applications (stat.AP); 030212 general & internal medicine; 0101 mathematics; Models Statistical; Syntax (programming languages); business.industry; R Programming Language; Bayes Theorem; Survival Analysis; Medical statistics; Artificial intelligence; business; computer
