Search results for "Feature selection"

Showing 10 of 139 documents

Criteria for Bayesian model choice with application to variable selection

2012

In objective Bayesian model selection, no single criterion has emerged as dominant in defining objective prior distributions. Indeed, many criteria have been separately proposed and used to justify differing prior choices. We first formalize the most general and compelling of the various criteria that have been suggested, together with a new criterion. We then illustrate the potential of these criteria in determining objective model selection priors by considering their application to the problem of variable selection in normal linear models. This results in a new model selection objective prior with a number of compelling properties.
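For reference, a standard objective-Bayes baseline for this problem is the closed-form Bayes factor under Zellner's g-prior, which ranks variable subsets using only each model's size and R². A minimal sketch with the conventional unit-information choice g = n; the data and names are illustrative, and this is not the paper's newly proposed prior:

```python
import numpy as np
from itertools import combinations

def g_prior_log_bf(X, y, g):
    """Log Bayes factor of a normal linear model (with intercept) against the
    intercept-only null under Zellner's g-prior:
    (n - p - 1)/2 * log(1 + g) - (n - 1)/2 * log(1 + g * (1 - R^2))."""
    n, p = X.shape
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)
    return 0.5 * (n - p - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # only variables 0 and 1 matter

# Enumerate all non-empty subsets and rank them by Bayes factor (g = n).
scores = {S: g_prior_log_bf(X[:, list(S)], y, g=n)
          for k in range(1, 5) for S in combinations(range(4), k)}
best = max(scores, key=scores.get)
print(best)  # should contain variables 0 and 1
```

The g-prior Bayes factor trades fit (through R²) against dimension (through the (1+g) penalty), which is why the truly relevant variables dominate the ranking here.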

Statistics and Probability; Mathematical optimization; 62C10; Model selection; g-prior; Linear model; Mathematics - Statistics Theory; Feature selection; Statistics Theory (math.ST); Bayesian inference; Objective model; 62J05; Prior probability; 62J15; FOS: Mathematics; Statistics Probability and Uncertainty; objective Bayes; Selection (genetic algorithm); variable selection; Mathematics; The Annals of Statistics

On stability issues in deriving multivariable regression models

2014

In many areas of science where empirical data are analyzed, a task is often to identify important variables with influence on an outcome. Most often this is done by using a variable selection strategy in the context of a multivariable regression model. Using a study on ozone effects in children (n = 496, 24 covariates), we will discuss aspects relevant for deriving a suitable model. With an emphasis on model stability, we will explore and illustrate differences between predictive models and explanatory models, the key role of stopping criteria, and the value of bootstrap resampling (with and without replacement). Bootstrap resampling will be used to assess variable selection stability, to d…
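The bootstrap-based stability assessment described above can be sketched generically: resample the data, rerun the selection procedure, and record how often each variable is chosen. In this illustration the selector is a cross-validated lasso as a stand-in (the paper works with stepwise procedures in multivariable regression), and all data are simulated:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def bootstrap_inclusion_frequencies(X, y, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected.
    The selector here is a cross-validated lasso (non-zero coefficient =
    selected); the same scheme works with any selection procedure."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        model = LassoCV(cv=5).fit(X[idx], y[idx])
        counts += (model.coef_ != 0)
    return counts / n_boot

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))
y = 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=120)
freq = bootstrap_inclusion_frequencies(X, y, n_boot=50)
print(np.round(freq, 2))  # variables 0 and 1 should appear in nearly all resamples
```

High inclusion frequencies indicate variables whose selection is stable under resampling; variables selected only occasionally are the ones the abstract flags as a stability concern.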

Statistics and Probability; Multivariable calculus; Stability (learning theory); Context (language use); Regression analysis; Feature selection; General Medicine; Variance (accounting); Statistics; Covariate; Econometrics; Statistics Probability and Uncertainty; Selection (genetic algorithm); Mathematics; Biometrical Journal

The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression.

2020

This paper focuses on hypothesis testing in lasso regression, when one wishes to judge the statistical significance of the regression coefficients in a regression equation involving many covariates. To obtain reliable p-values, we propose a new lasso-type estimator relying on the idea of induced smoothing, which yields an appropriate covariance matrix and Wald statistic relatively easily. Simulation experiments reveal that our approach performs well when contrasted with recent inferential tools in the lasso framework. Two real data analyses are presented to illustrate the proposed framework in practice.
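The induced-smoothing machinery itself is beyond a short sketch, but its end products, a coefficient estimate together with a covariance matrix, feed a standard Wald test. A minimal illustration with made-up numbers (not from the paper):

```python
import numpy as np
from scipy import stats

def wald_pvalues(beta_hat, cov):
    """Per-coefficient Wald statistics and two-sided p-values, given an
    estimate and its covariance matrix (as produced, e.g., by induced
    smoothing or a sandwich formula)."""
    se = np.sqrt(np.diag(cov))
    z = beta_hat / se
    return z, 2 * stats.norm.sf(np.abs(z))

# Illustrative numbers only: two coefficients, standard errors of 0.2 each.
beta_hat = np.array([0.80, 0.05])
cov = np.diag([0.04, 0.04])
z, p = wald_pvalues(beta_hat, cov)
print(np.round(z, 2), np.round(p, 4))
```

The first coefficient (z = 4) is clearly significant, while the second (z = 0.25) is not, which is exactly the kind of judgment the abstract's "reliable p-values" are meant to support.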

Statistics and Probability; Statistics::Theory; Induced smoothing; Epidemiology; Computer science; Feature selection; Wald test; 01 natural sciences; asthma research; Statistics::Machine Learning; 010104 statistics & probability; 03 medical and health sciences; Health Information Management; Lasso (statistics); Linear regression; sparse models; Statistics::Methodology; Computer Simulation; 0101 mathematics; sandwich formula; 030304 developmental biology; Statistical hypothesis testing; 0303 health sciences; Covariance matrix; lung function; Regression analysis; Statistics::Computation; sparse model; Research Design; Algorithm; Smoothing; variable selection; Statistical methods in medical research

Structure Learning in Nested Effects Models

2007

Nested Effects Models (NEMs) are a class of graphical models introduced to analyze the results of gene perturbation screens. NEMs explore noisy subset relations between the high-dimensional outputs of phenotyping studies, e.g., effects visible in gene expression profiles or as morphological features of the perturbed cell. In this paper we expand the statistical basis of NEMs in four directions. First, we derive a new formula for the likelihood function of a NEM, which generalizes previous results for binary data. Second, we prove model identifiability under mild assumptions. Third, we show that the new formulation of the likelihood allows model space to be traversed efficiently. Fourth, we…

Statistics and Probability; Traverse; Computer science; Molecular Networks (q-bio.MN); Genes MHC Class II; Perturbation (astronomy); Genes Insect; Feature selection; Quantitative Biology - Quantitative Methods; 03 medical and health sciences; 0302 clinical medicine; Genetics; Animals; heterocyclic compounds; Quantitative Biology - Molecular Networks; Graphical model; Molecular Biology; Quantitative Methods (q-bio.QM); Oligonucleotide Array Sequence Analysis; 030304 developmental biology; Likelihood Functions; 0303 health sciences; Nanoelectromechanical systems; Models Statistical; Models Genetic; Gene Expression Profiling; Genomics; Computational Mathematics; Drosophila melanogaster; Phenotype; FOS: Biological sciences; Binary data; Identifiability; RNA Interference; Likelihood function; Algorithm; Algorithms; 030217 neurology & neurosurgery

Urban monitoring using multi-temporal SAR and multi-spectral data

2006

In some key operational domains, the joint use of synthetic aperture radar (SAR) and multi-spectral sensors has been shown to be a powerful tool for Earth observation. In this paper, we analyze the potential of combining interferometric SAR and multi-spectral data for urban area characterization and monitoring. This study is carried out following a standard multi-source processing chain. First, a pre-processing stage is performed taking into account the underlying physics, geometry, and statistical models for the data from each sensor. Second, two different methodologies, one for supervised and another for unsupervised approaches, are followed to obtain features that optimize the urban rela…

Synthetic aperture radar; Earth observation; Feature selection; Statistical model; computer.software_genre; Data set; Data acquisition; Artificial Intelligence; Signal Processing; Standard algorithms; Computer Vision and Pattern Recognition; Data mining; computer; Software; Multi-source; Pattern Recognition Letters

Feature Selection for Ensembles of Simple Bayesian Classifiers

2002

A popular method for creating an accurate classifier from a set of training data is to train several classifiers, and then to combine their predictions. The ensembles of simple Bayesian classifiers have traditionally not been a focus of research. However, the simple Bayesian classifier has much broader applicability than previously thought. Besides its high classification accuracy, it also has advantages in terms of simplicity, learning speed, classification speed, storage space, and incrementality. One way to generate an ensemble of simple Bayesian classifiers is to use different feature subsets as in the random subspace method. In this paper we present a technique for building ensembles o…
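The random subspace construction mentioned above can be sketched directly: train each simple Bayesian classifier on a random feature subset and combine by majority vote. A hedged illustration using scikit-learn's GaussianNB as the "simple Bayesian classifier" on synthetic data (the paper's own refinement technique is not reproduced here):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def random_subspace_nb(X_train, y_train, n_members=15, subspace_frac=0.5, seed=0):
    """Ensemble of naive Bayes classifiers, each trained on a random feature
    subset (the random subspace method)."""
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    k = max(1, int(subspace_frac * p))
    members = []
    for _ in range(n_members):
        feats = rng.choice(p, size=k, replace=False)
        members.append((feats, GaussianNB().fit(X_train[:, feats], y_train)))
    return members

def predict_vote(members, X):
    """Majority vote over the ensemble members."""
    votes = np.stack([clf.predict(X[:, feats]) for feats, clf in members])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
members = random_subspace_nb(Xtr, ytr)
acc = (predict_vote(members, Xte) == yte).mean()
print(round(acc, 3))
```

Because each member sees a different feature subset, the members make partly independent errors, which is what lets the vote outperform many of the individual classifiers.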

Training set; Computer science; business.industry; Bayesian probability; Pattern recognition; Feature selection; Machine learning; computer.software_genre; Linear subspace; Random subspace method; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Iterative refinement; Artificial intelligence; business; computer; Classifier (UML); Cascading classifiers

Ensemble Feature Selection Based on the Contextual Merit

2001

Recent research has demonstrated the benefits of using ensembles of classifiers for classification problems. Ensembles constructed by machine learning methods manipulating the training set are used to create diverse sets of accurate classifiers. Different feature selection techniques based on applying different heuristics for generating base classifiers can be adjusted to specific domain characteristics. In this paper we consider and experiment with the contextual feature merit measure as a feature selection heuristic. We use the diversity of an ensemble as the evaluation function in our new algorithm with a refinement cycle. We have evaluated our algorithm on seven data sets from the UCI Repository. The experiment…
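The evaluation function above is ensemble diversity. One common way to quantify it, assumed here for illustration and not necessarily the paper's exact measure, is the mean pairwise disagreement between base classifiers' predictions:

```python
import numpy as np

def pairwise_disagreement(predictions):
    """Mean fraction of samples on which pairs of base classifiers disagree.
    `predictions`: array-like of shape (n_classifiers, n_samples)."""
    preds = np.asarray(predictions)
    m = preds.shape[0]
    total, pairs = 0.0, 0
    for i in range(m):
        for j in range(i + 1, m):
            total += np.mean(preds[i] != preds[j])
            pairs += 1
    return total / pairs

# Three base classifiers, five samples (illustrative labels).
preds = [[0, 1, 1, 0, 1],
         [0, 1, 0, 0, 1],
         [1, 1, 1, 0, 0]]
print(round(pairwise_disagreement(preds), 3))  # → 0.4
```

A refinement cycle like the one described can then keep candidate base classifiers only when they raise this score without hurting accuracy.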

Training set; Computer science; business.industry; Heuristic; Pattern recognition; Feature selection; Context (language use); Machine learning; computer.software_genre; Evaluation function; ComputingMethodologies_PATTERNRECOGNITION; Ensembles of classifiers; Feature (computer vision); Artificial intelligence; Heuristics; business; computer

2004

This paper presents the use of Support Vector Machines (SVMs) for prediction and analysis of antisense oligonucleotide (AO) efficacy. The collected database comprises 315 AO molecules, each described by 68 features, yielding a problem well-suited to SVMs. The task of feature selection is crucial given the presence of noisy or redundant features, and the well-known problem of the curse of dimensionality. We propose a two-stage strategy to develop an optimal model: (1) feature selection using correlation analysis, mutual information, and SVM-based recursive feature elimination (SVM-RFE), and (2) AO prediction using standard and profiled SVM formulations. A profiled SVM gives different weights to …
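The SVM-RFE stage named above has a direct counterpart in scikit-learn: recursive feature elimination driven by the weights of a linear SVM. A sketch on synthetic stand-in data with the paper's dimensions (315 molecules, 68 features are only mimicked here, not the actual AO dataset):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.datasets import make_classification

# Synthetic stand-in roughly matching the paper's shape: ~300 x 68.
X, y = make_classification(n_samples=300, n_features=68, n_informative=10,
                           n_redundant=10, random_state=0)

# SVM-RFE: repeatedly fit a linear SVM and drop the features with the
# smallest |w|, until the requested number of features remains.
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=10, step=5)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)
print(len(selected))  # → 10
```

In the paper's two-stage strategy, a subset like `selected` would then be passed to the downstream SVM predictor; correlation and mutual-information filters can be applied the same way before this step.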

Training set; Correlation coefficient; Mean squared error; Computer science; business.industry; Applied Mathematics; Feature selection; Mutual information; Machine learning; computer.software_genre; Biochemistry; Computer Science Applications; Support vector machine; Structural Biology; Feature (machine learning); Artificial intelligence; business; Molecular Biology; computer; Energy (signal processing); Curse of dimensionality; BMC Bioinformatics

Ensemble Feature Selection Based on Contextual Merit and Correlation Heuristics

2001

Recent research has proven the benefits of using ensembles of classifiers for classification problems. Ensembles of diverse and accurate base classifiers are constructed by machine learning methods that manipulate the training sets. One way to manipulate the training set is to use feature selection heuristics to generate the base classifiers. In this paper we examine two of them: correlation-based and contextual-merit-based heuristics. Both rely on quite similar assumptions concerning heterogeneous classification problems. Experiments are conducted on several data sets from the UCI Repository. We construct a fixed number of base classifiers over selected feature subsets and refine the ensemble iter…

Training set; business.industry; Computer science; Feature selection; Pattern recognition; Base (topology); Machine learning; computer.software_genre; Expert system; Random subspace method; ComputingMethodologies_PATTERNRECOGNITION; Ensembles of classifiers; Feature (machine learning); Artificial intelligence; business; Heuristics; computer; Cascading classifiers

Design and Prototyping of a Smart University Campus

2019

The authors propose a framework to support the “smart planning” of a university environment, intended as a “smart campus.” The main goal is to improve the management, storage, and mining of information coming from the university's areas and main players. The platform allows interaction with the system's main players, generating and displaying useful data in real time for a better user experience. The proposed framework also provides a chat assistant able to respond to user requests in real time. This will not only improve communication between the university and its students, but also allow investigation of their habits and needs. Moreover, information collected from the …

University campus; Engineering management; Settore INF/01 - Informatica; Computer science; smart campus feature selection