Search results for "feature"

Showing 10 of 4,091 documents

Second-order diagnostics for space-time point processes with application to seismic events

2008

A diagnostic method for space-time point processes is introduced and used to interpret and assess the goodness of fit of particular models to real data, such as seismic data. The proposed method is founded on the definition of a weighted process and makes it possible to detect second-order features of the data, such as long-range dependence and fractal behavior, that are not accounted for by the fitted model. Applications to earthquake data are provided. Copyright © 2008 John Wiley & Sons, Ltd.

Keywords: Statistics and Probability; Diagnostic methods; Computer science; Ecological Modeling; Space time; Process (computing); Residual; Point process; Fractal; Goodness of fit; Order (business); Econometrics; Settore SECS-S/01 - Statistica; Algorithm; point processes; residual analysis; second-order features; ETAS model; seismic process
Published in: Environmetrics
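The weighted-process idea can be illustrated, under assumptions, with the generic inhomogeneous space-time K-function, in which each pair of events is weighted by the inverse of the fitted intensity at both points. This is a minimal sketch of a standard weighted second-order statistic, not necessarily the estimator defined in the paper: the function names, the missing edge correction, and the user-supplied `intensity` callable (a fixed intensity function, not an ETAS conditional intensity) are all assumptions. If the data behave like a Poisson process with the fitted intensity, the estimate should be close to 2πr²t (edge effects bias it downward); values systematically above that suggest clustering the model does not capture.

```python
import math
from typing import Callable, Sequence, Tuple

Event = Tuple[float, float, float]  # (x, y, t) of one event


def st_inhomogeneous_k(
    events: Sequence[Event],
    intensity: Callable[[float, float, float], float],
    area: float,
    duration: float,
    r: float,
    t: float,
) -> float:
    """Weighted space-time K-function estimate without edge correction.

    Every ordered pair (i, j), i != j, whose spatial distance is <= r and whose
    time lag |t_i - t_j| is <= t contributes 1 / (lambda_i * lambda_j), where
    lambda is the fitted intensity evaluated at each event.
    """
    lam = [intensity(x, y, s) for x, y, s in events]
    total = 0.0
    for i, (xi, yi, ti) in enumerate(events):
        for j, (xj, yj, tj) in enumerate(events):
            if i != j and math.hypot(xi - xj, yi - yj) <= r and abs(ti - tj) <= t:
                total += 1.0 / (lam[i] * lam[j])
    return total / (area * duration)


def poisson_reference(r: float, t: float) -> float:
    """Value expected under a Poisson process with the fitted intensity (ignoring edges)."""
    return 2.0 * math.pi * r ** 2 * t
```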

Tailoring sparse multivariable regression techniques for prognostic single-nucleotide polymorphism signatures.

2011

When seeking prognostic information for patients, modern technologies provide a huge number of genomic measurements as a starting point. For single-nucleotide polymorphisms (SNPs), there may be more than one million covariates that need to be considered simultaneously with respect to a clinical endpoint. Although the underlying biological problem cannot be solved on the basis of clinical cohorts of only modest size, some important SNPs might still be identified. Sparse multivariable regression techniques have recently become available for automatically identifying prognostic molecular signatures that comprise relatively few covariates and provide reasonable prediction performance. For illus…

Keywords: Statistics and Probability; Epidemiology; Computer science; Feature selection; Biostatistics; Polymorphism, Single Nucleotide; Lasso (statistics); Gene Frequency; Resampling; Covariate; Humans; Likelihood Functions; Models, Statistical; Multivariable calculus; Regression analysis; Genomics; Prognosis; Regression; Minor allele frequency; Leukemia, Myeloid, Acute; Multivariate Analysis; Data mining; Algorithms
Published in: Statistics in Medicine

Methods and Tools for Bayesian Variable Selection and Model Averaging in Normal Linear Regression

2018

In this paper, we briefly review the main methodological aspects concerned with the application of the Bayesian approach to model choice and model averaging in the context of variable selection in regression models. This includes prior elicitation, summaries of the posterior distribution and computational strategies. We then examine and compare various publicly available R-packages, summarizing and explaining the differences between packages and giving recommendations for applied users. We find that all packages reviewed (can) lead to very similar results, but there are potentially important differences in flexibility and efficiency of the packages.

Keywords: Statistics and Probability; General linear model; Proper linear model; Computer science; Posterior probability; Regression analysis; Feature selection; Machine learning; Bayesian multivariate linear regression; Linear regression; Econometrics; Artificial intelligence; Statistics, Probability and Uncertainty; Bayesian linear regression; 01 natural sciences; 0101 mathematics; 010104 statistics & probability; 05 social sciences; 0502 economics and business; 050207 economics
Published in: International Statistical Review
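Bayesian model averaging in the normal linear model can be sketched, for a handful of predictors, by enumerating all 2^k models and averaging model indicators with posterior model probabilities. This sketch assumes Zellner's g-prior with g = n and a uniform prior over models, using the standard closed-form Bayes factor against the intercept-only model; it is not taken from any of the reviewed R packages, and full enumeration is only feasible for small k (larger problems need stochastic search or MCMC over the model space).

```python
import itertools
from typing import Optional

import numpy as np


def log_bf_gprior(r2: float, n: int, p: int, g: float) -> float:
    """Log Bayes factor of a p-predictor model against the intercept-only model
    under Zellner's g-prior (standard closed form)."""
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of determination of an OLS fit with intercept."""
    yc = y - y.mean()
    if X.shape[1] == 0:
        return 0.0
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    return 1.0 - float(resid @ resid) / float(yc @ yc)


def posterior_inclusion_probs(X: np.ndarray, y: np.ndarray,
                              g: Optional[float] = None) -> np.ndarray:
    """Posterior inclusion probability of each predictor by full enumeration."""
    n, k = X.shape
    g = float(n) if g is None else g  # unit-information choice g = n (assumption)
    masks, log_bfs = [], []
    for mask in itertools.product([0, 1], repeat=k):
        cols = np.array([j for j in range(k) if mask[j]], dtype=int)
        masks.append(mask)
        log_bfs.append(log_bf_gprior(r_squared(X[:, cols], y), n, len(cols), g))
    log_bfs = np.array(log_bfs)
    weights = np.exp(log_bfs - log_bfs.max())  # uniform model prior: posterior ∝ BF
    weights /= weights.sum()
    return weights @ np.array(masks, dtype=float)  # P(predictor j in model | y)
```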

dglars: An R Package to Estimate Sparse Generalized Linear Models

2014

dglars is a publicly available R package that implements the method proposed in Augugliaro, Mineo, and Wit (2013), developed to study the sparse structure of a generalized linear model. This method, called dgLARS, is based on a differential geometrical extension of the least angle regression method proposed in Efron, Hastie, Johnstone, and Tibshirani (2004). The core of the dglars package consists of two algorithms implemented in Fortran 90 to efficiently compute the solution curve: a predictor-corrector algorithm, proposed in Augugliaro et al. (2013), and a cyclic coordinate descent algorithm, proposed in Augugliaro, Mineo, and Wit (2012). The latter algorithm, as shown here, is significan…

Keywords: Statistics and Probability; Generalized linear models; Mathematical optimization; Fortran; cyclic coordinate descent algorithm; predictor-corrector algorithm; dgLARS; Feature selection; variable selection; Dantzig selector; likelihood; Least-angle regression; sparse models; Differential geometry; Differential (infinitesimal); Extension (predicate logic); Expression (computer science); expression; tissues; breast-cancer risk; marker; shrinkage; haplotypes; Mathematics; Statistics, Probability and Uncertainty; Settore SECS-S/01 - Statistica; lcsh:Statistics; lcsh:HA1-4737; Algorithm; Software
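The cyclic update pattern behind coordinate descent can be illustrated with the textbook Gaussian lasso: each sweep updates one coefficient at a time by soft-thresholding its correlation with the partial residual. This is a swapped-in example for illustration only; it is not dgLARS, not the cyclic coordinate descent algorithm of Augugliaro, Mineo, and Wit (2012), and not the package's Fortran 90 code. It assumes centred columns (none identically zero) and a centred response.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    """Solution of the one-dimensional lasso problem: sign(z) * max(|z| - gamma, 0)."""
    return float(np.sign(z)) * max(abs(z) - gamma, 0.0)


def lasso_cyclic_cd(X: np.ndarray, y: np.ndarray, lam: float,
                    max_sweeps: int = 1000, tol: float = 1e-10) -> np.ndarray:
    """Minimise (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are already centred (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # x_j' x_j / n for each column
    resid = y.astype(float).copy()      # current residual y - X b (b starts at 0)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):              # one sweep updates each coordinate in turn
            old = b[j]
            # correlation of x_j with the partial residual that excludes x_j
            rho = float(X[:, j] @ resid) / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)  # keep the residual in sync with b
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b
```

With `lam = 0` the sweeps converge to the least-squares fit; increasing `lam` sets more coefficients exactly to zero, which is what makes the resulting model sparse.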

Probabilistic Quantification of Hazards: A Methodology Using Small Ensembles of Physics-Based Simulations and Statistical Surrogates

2015

This paper presents a novel approach to assessing the hazard threat to a locale due to a large volcanic avalanche. The methodology combines: (i) mathematical modeling of volcanic mass flows; (ii) field data of avalanche frequency, volume, and runout; (iii) large-scale numerical simulations of flow events; (iv) use of statistical methods to minimize computational costs, and to capture unlikely events; (v) calculation of the probability of a catastrophic flow event over the next T years at a location of interest; and (vi) innovative computational methodology to implement these methods. This unified presentation collects elements that have been separately developed, and incorporates new contri…

Keywords: Statistics and Probability; Hazard (logic); Volcanic hazards; Control and Optimization; Process (engineering); Probabilistic logic; Hazard analysis; Flow (mathematics); Volcano; Modeling and Simulation; Econometrics; Discrete Mathematics and Combinatorics; Environmental science; Data mining; Event (probability theory)
Published in: International Journal for Uncertainty Quantification
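As a rough illustration of step (v), under assumptions not taken from the paper: suppose large flows arrive as a homogeneous Poisson process with rate λ per year, and each one independently reaches the location of interest with probability p, estimated here naively as the fraction of ensemble runs exceeding a critical flow depth (in place of the statistical surrogates the paper uses). By Poisson thinning, damaging flows reach the site at rate λp, so the probability of at least one within T years is 1 - exp(-λpT). Function names and the depth-threshold criterion are hypothetical.

```python
import math
from typing import Sequence


def exceedance_fraction(max_depths: Sequence[float], threshold: float) -> float:
    """Fraction of simulated flows whose maximum depth at the site exceeds threshold."""
    return sum(d > threshold for d in max_depths) / len(max_depths)


def prob_hit_within(rate_per_year: float, p_hit: float, years: float) -> float:
    """P(at least one damaging flow at the site within `years`), assuming flows
    arrive as a homogeneous Poisson process and each reaches the site
    independently with probability p_hit (Poisson thinning)."""
    return 1.0 - math.exp(-rate_per_year * p_hit * years)
```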

Coupled variable selection for regression modeling of complex treatment patterns in a clinical cancer registry.

2013

For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivat…

Keywords: Statistics and Probability; Male; Niacinamide; Boosting (machine learning); Carcinoma, Hepatocellular; Epidemiology; Computer science; Score; Feature selection; Antineoplastic Agents; Decision Support Techniques; Neoplasms; Covariate; Humans; Registries; Aged; Proportional Hazards Models; Phenylurea Compounds; Liver Neoplasms; Regression analysis; Confounding Factors, Epidemiologic; Middle Aged; Sorafenib; Prognosis; Regression; Cancer registry; Data Interpretation, Statistical; Data mining
Published in: Statistics in Medicine

Criteria for Bayesian model choice with application to variable selection

2012

In objective Bayesian model selection, no single criterion has emerged as dominant in defining objective prior distributions. Indeed, many criteria have been proposed separately and used to motivate differing prior choices. We first formalize the most general and compelling of the various criteria that have been suggested, together with a new criterion. We then illustrate the potential of these criteria in determining objective model selection priors by considering their application to the problem of variable selection in normal linear models. This results in a new objective model selection prior with a number of compelling properties.

Keywords: Statistics and Probability; Mathematical optimization; Model selection; g-prior; Linear model; Mathematics - Statistics Theory; Statistics Theory (math.ST); Feature selection; Bayesian inference; Objective model; Prior probability; FOS: Mathematics; Statistics, Probability and Uncertainty; objective Bayes; Selection (genetic algorithm); variable selection; Mathematics; MSC: 62C10, 62J05, 62J15
Published in: The Annals of Statistics

On stability issues in deriving multivariable regression models

2014

In many areas of science where empirical data are analyzed, a common task is to identify important variables that influence an outcome. Most often this is done by using a variable selection strategy in the context of a multivariable regression model. Using a study on ozone effects in children (n = 496, 24 covariates), we will discuss aspects relevant for deriving a suitable model. With an emphasis on model stability, we will explore and illustrate differences between predictive models and explanatory models, the key role of stopping criteria, and the value of bootstrap resampling (with and without replacement). Bootstrap resampling will be used to assess variable selection stability, to d…

Keywords: Statistics and Probability; Multivariable calculus; Stability (learning theory); Context (language use); Regression analysis; Feature selection; General Medicine; Variance (accounting); Statistics; Covariate; Econometrics; Statistics, Probability and Uncertainty; Selection (genetic algorithm); Mathematics
Published in: Biometrical Journal
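Variable selection stability is commonly summarised by bootstrap inclusion frequencies: how often each covariate is selected when the selection procedure is rerun on resampled data. In this sketch the selection step is a deliberately simple, hypothetical rule (keep covariates with |t| > 2 in the full least-squares fit), not one of the strategies examined in the paper, and the subsample fraction 0.632 used for resampling without replacement is an assumption.

```python
import numpy as np


def select_by_t(X: np.ndarray, y: np.ndarray, t_crit: float = 2.0) -> np.ndarray:
    """Hypothetical selection rule: keep predictors with |t| > t_crit in the full OLS fit."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = float(resid @ resid) / (n - p - 1)        # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    return np.abs(beta[1:] / se[1:]) > t_crit          # drop the intercept


def inclusion_frequencies(X: np.ndarray, y: np.ndarray, n_boot: int = 200,
                          replace: bool = True, frac: float = 0.632,
                          seed: int = 0) -> np.ndarray:
    """Share of resamples in which each predictor is selected.

    replace=True: ordinary bootstrap (n draws with replacement);
    replace=False: subsampling of frac * n observations without replacement.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = n if replace else int(frac * n)
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.choice(n, size=m, replace=replace)
        counts += select_by_t(X[idx], y[idx])
    return counts / n_boot
```

Covariates with frequencies near 1 are selected almost regardless of the sample; intermediate frequencies flag selections that depend on sampling variation.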

A Hooke's law-based approach to protein folding rate

2014

Kinetics is a key aspect of the renowned protein folding problem. Here, we propose a comprehensive approach to folding kinetics in which a polypeptide chain is assumed to behave as an elastic material described by Hooke's law. A novel parameter called the elastic-folding constant results from our model and is suggested to distinguish between proteins with two-state and multi-state folding pathways. A contact-free descriptor, named folding degree, is introduced as a suitable structural feature for studying protein-folding kinetics. This approach generalizes the observed correlations between various structural descriptors and the folding rate constant. Additionally, several comparisons am…

Keywords: Statistics and Probability; PROTDCAL; Structure analysis; General Biochemistry, Genetics and Molecular Biology; Protein Structure, Secondary; Amino acid sequence; Protein structure; Energetics; Feature (machine learning); Statistical physics; Protein folding; Theoretical model; Protein secondary structure; Reaction kinetics; General Immunology and Microbiology; Chemical model; Applied Mathematics; Protein; Hooke's law; Modeling; Proteins; General Medicine; DNA; Computer simulation; Elasticity; Folding degree; Folding (chemistry); Chemistry; Kinetics; Models, Chemical; Modeling and Simulation; Peptide; Elastic folding constant; Physical chemistry; Thermodynamics; Downhill folding; Polypeptide; General Agricultural and Biological Sciences; Constant (mathematics); Folding kinetics

Functional Principal Component Analysis for the explorative analysis of multisite-multivariate air pollution time series with long gaps

2013

Knowledge of urban air quality is the first step in addressing air pollution issues. Over the last few decades, many cities have come to rely on a network of monitoring stations recording concentrations of the main pollutants. This paper focuses on functional principal component analysis (FPCA) to investigate multiple-pollutant datasets measured over time at multiple sites within a given urban area. Our purpose is to extend what has been proposed in the literature to data that are multisite and multivariate at the same time. The approach proves effective in highlighting some relevant statistical features of the time series, giving the opportunity to identify significant pollutants and…

Keywords: Statistics and Probability; Pollutant; Functional principal component analysis; Multivariate statistics; Series (mathematics); Computer science; Air pollution; Functional data analysis; Urban area; Air quality; Three mode FPCA; EOF; Data mining; Statistics, Probability and Uncertainty; Settore SECS-S/01 - Statistica; Air quality index
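In its simplest discretised form, FPCA amounts to a principal component analysis of curves sampled on a common grid: centre the curves, take the singular value decomposition, and read the eigenfunctions off the right singular vectors. This minimal sketch assumes complete, equally spaced observations and ignores quadrature weights; it does not handle the long gaps or the multisite-multivariate (three-mode) structure that the paper addresses.

```python
import numpy as np


def fpca(curves: np.ndarray, n_components: int = 2):
    """Discretised FPCA.

    Rows of `curves` are curves observed on a common, equally spaced grid with
    no missing values. Returns the mean curve, the leading eigenfunctions (as
    rows), the component scores of each curve, and the share of total variance
    explained by each returned component.
    """
    mean = curves.mean(axis=0)
    centred = curves - mean
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)       # variance share of every component
    components = vt[:n_components]            # discretised eigenfunctions
    scores = centred @ components.T           # projection of each curve
    return mean, components, scores, explained[:n_components]
```

Each curve is then approximated by the mean plus a score-weighted sum of a few eigenfunctions, and the curves with extreme scores are natural candidates for closer inspection.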