Search results for "feature selection"

showing 10 items of 139 documents

Feature Selection Approach based on Mutual Information and Partial Least Squares

2014

Feature selection can improve modeling accuracy and reduce model complexity, especially for high-dimensional spectral data. To address this problem, a feature selection approach based on mutual information (MI) and partial least squares (PLS) is proposed in this paper. MI values between the features and the response variable are calculated, and the threshold value used to select the final features is chosen optimally based on the PLS algorithm. The number of PLS latent variables and the MI threshold value are selected simultaneously according to the modeling performance. The experimental results based on the near-infrared spectrum show that the proposed approach has better perfor…

Variable (computer science); Threshold limit value; business.industry; Partial least squares regression; General Engineering; Pattern recognition; Feature selection; High dimensional; Artificial intelligence; Mutual information; Spectral data; business; Mathematics; Advanced Materials Research
researchProduct
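
The MI filtering step described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the equal-width histogram estimator, the `bins` parameter, and the function names are assumptions, and the PLS-based choice of the threshold is not reproduced because the abstract does not specify it.

```python
import math
from collections import Counter

def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices (assumed estimator)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """MI in nats between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def select_by_mi(columns, response, threshold, bins=4):
    """Keep the indices of feature columns whose MI with the response
    reaches the threshold; also return all MI scores."""
    y = discretize(response, bins)
    scores = [mutual_information(discretize(col, bins), y) for col in columns]
    selected = [j for j, score in enumerate(scores) if score >= threshold]
    return selected, scores
```

In a full pipeline, one would presumably repeat `select_by_mi` over a grid of thresholds and keep the threshold whose selected features give the best cross-validated model, which is the role the abstract assigns to PLS.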

Evaluation of the effect of chance correlations on variable selection using Partial Least Squares -Discriminant Analysis

2013

Variable subset selection is often mandatory in high-throughput metabolomics and proteomics. However, depending on the variable-to-sample ratio, variable selection is significantly susceptible to chance correlations. The evaluation of the predictive capabilities of PLSDA models estimated by cross-validation after feature selection provides overly optimistic results if the selection is performed on the entire set and no external validation set is available. In this work, a simulation of the statistical null hypothesis is proposed to test whether the discrimination capability of a PLSDA model after variable selection estimated by cross-validation is statistically higher than t…

Variable selection; ESTADISTICA E INVESTIGACION OPERATIVA; Feature selection; Chance correlations; Analytical Chemistry; Set (abstract data type); Resampling; Partial least squares regression; Statistics; Humans; Metabolomics; Least-Squares Analysis; Selection (genetic algorithm); Probability; Gaucher Disease; Models Statistical; Chemistry; Discriminant Analysis; Reproducibility of Results; Partial Least Squares-Discriminant Analysis (PLSDA); Linear discriminant analysis; Variable (computer science); Null hypothesis; Algorithms; Software
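
The idea of simulating the null hypothesis in the abstract above can be illustrated with a label-permutation test in which the whole pipeline (variable selection plus cross-validation) is rerun on shuffled labels. This is a sketch under stated assumptions: a nearest-class-mean classifier on a single selected feature stands in for PLSDA, and all function names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def class_means(X, y, j):
    """Means of feature j within class 0 and class 1."""
    m0 = mean([row[j] for row, label in zip(X, y) if label == 0])
    m1 = mean([row[j] for row, label in zip(X, y) if label == 1])
    return m0, m1

def best_feature(X, y):
    """Feature with the largest gap between the two class means."""
    def gap(j):
        m0, m1 = class_means(X, y, j)
        return abs(m1 - m0)
    return max(range(len(X[0])), key=gap)

def loo_accuracy(X, y):
    """Leave-one-out accuracy; selection is repeated inside every fold."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        j = best_feature(X_train, y_train)
        m0, m1 = class_means(X_train, y_train, j)
        predicted = 0 if abs(X[i][j] - m0) <= abs(X[i][j] - m1) else 1
        hits += predicted == y[i]
    return hits / len(X)

def permutation_p_value(score_fn, X, y, n_permutations=200, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)
        exceed += score_fn(X, shuffled) >= observed
    return observed, (exceed + 1) / (n_permutations + 1)
```

A small p-value suggests that the cross-validated performance is unlikely to arise from chance correlations alone; a large one suggests the selected variables may not carry real class information.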

Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability.

2016

Background: Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, and therefore adds noise to the encoded structural information and removes its interpretability. Both representations, folded as well as unprocessed fingerprints, are often used for (Q)SAR modeling. Results: We show that it can be prefer…

Virtual screening; Fingerprints; Feature selection; Research Article; (Q)SAR; Journal of cheminformatics
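
The bit collisions mentioned in the abstract above can be made concrete with a small sketch. It assumes that folding is done by taking each integer fragment identifier modulo the bit-string length, which is a common scheme but is not confirmed by the abstract; the fragment identifiers are made up for illustration.

```python
from collections import defaultdict

def fold_fingerprint(fragment_ids, n_bits):
    """Map each fragment id to a bit position via modulo folding (assumed scheme)
    and remember which fragments landed on each bit."""
    buckets = defaultdict(list)
    for fragment_id in fragment_ids:
        buckets[fragment_id % n_bits].append(fragment_id)
    return dict(buckets)

def collisions(buckets):
    """Bits that encode more than one fragment and are therefore ambiguous."""
    return {bit: ids for bit, ids in buckets.items() if len(ids) > 1}

# Hypothetical fragment ids folded into a 1024-bit string.
buckets = fold_fingerprint([3, 1027, 58, 2051], n_bits=1024)
print(sorted(buckets))      # [3, 58]
print(collisions(buckets))  # {3: [3, 1027, 2051]}
```

Once three different fragments share bit 3, a model weight on that bit can no longer be traced back to a single structural fragment, which is the loss of interpretability the abstract refers to.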

Feature selection: A multi-objective stochastic optimization approach

2020

The feature subset task can be cast as a multiobjective discrete optimization problem. In this work, we study the search algorithm component of a feature subset selection method. We propose an algorithm based on the threshold accepting method, extended to the multi-objective framework by an appropriate definition of the acceptance rule. The method is used in the task of identifying relevant subsets of features in a Web bot recognition problem, where automated software agents on the Web are identified by analyzing the stream of HTTP requests to a Web server.

Web server; Linear programming; threshold accepting; Computer science; Feature extraction; Feature selection; stochastic optimization; computer.software_genre; Multi-objective optimization; feature selection; multiobjective optimization; stochastic optimization; subset selection; threshold accepting; feature selection; subset selection; Feature (computer vision); Search algorithm; Stochastic optimization; multiobjective optimization; Data mining; computer
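
For orientation, the classic single-objective threshold accepting search that the abstract above builds on can be sketched as follows. The paper's multi-objective acceptance rule is not given in the abstract and is not reproduced here; the single scalar `cost` function, the threshold schedule, and the single-feature flip neighborhood are all assumptions.

```python
import random

def threshold_accepting(cost, n_features, thresholds, steps_per_threshold=50, seed=0):
    """Search over feature subsets; a move is accepted whenever it worsens
    the cost by no more than the current threshold."""
    rng = random.Random(seed)
    current = frozenset()
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for threshold in thresholds:  # typically a decreasing sequence
        for _ in range(steps_per_threshold):
            # Neighbor: flip the membership of one randomly chosen feature.
            candidate = current ^ {rng.randrange(n_features)}
            candidate_cost = cost(candidate)
            if candidate_cost - current_cost <= threshold:
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
    return best, best_cost

# Toy cost (hypothetical): penalize missing relevant features and subset size.
RELEVANT = {0, 2}

def toy_cost(subset):
    return len(RELEVANT - subset) + 0.1 * len(subset)

best, best_cost = threshold_accepting(toy_cost, n_features=6, thresholds=[0.5, 0.2, 0.0])
```

Unlike simulated annealing, the acceptance test is deterministic given the candidate: no acceptance probability is drawn, only the proposal is random.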

Identifying legitimate Web users and bots with different traffic profiles — an Information Bottleneck approach

2020

Recent studies reported that about half of Web users nowadays are intelligent agents (Web bots). Many bots are impersonators operating at a very high sophistication level, trying to emulate navigational behaviors of legitimate users (humans). Moreover, bot technology continues to evolve, which makes bot detection even harder. To deal with this problem, many advanced methods for differentiating bots from humans have been proposed, a large part of which relies on supervised machine learning techniques. In this paper, we propose a novel approach to identify various profiles of bots and humans which combines feature selection and unsupervised learning of HTTP-level traffic patterns to d…

Web user; Information Systems and Management; Computer science; Internet robot; Feature selection; 02 engineering and technology; Machine learning; computer.software_genre; Unsupervised learning; Session (web analytics); Management Information Systems; Intelligent agent; Artificial Intelligence; 020204 information systems; Machine learning; 0202 electrical engineering electronic engineering information engineering; Cluster analysis; Bot detection; business.industry; Information bottleneck method; Web bot; Server log; Hierarchical clustering; Unsupervised learning; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Software; Knowledge-Based Systems

A novel pilot study of automatic identification of EMF radiation effect on brain using computer vision and machine learning

2020

Electromagnetic field (EMF) radiation from mobile phones and cell towers affects the brains of humans and other organisms in many ways. Exposure to EMF could lead to neurological changes causing morphological or chemical changes in the brain and other internal organs. Cellular-level analysis to measure and identify the effect of mobile radiation is an expensive and lengthy process, as it requires preparing the cell suspension for the analysis. This paper presents a novel pilot study to identify changes in brain morphology under EMF exposure, using Drosophila melanogaster as a specimen. The brain is automatically segmented, obtaining microscopic images from which discriminatory geometri…

animal structures; Computer science; 0206 medical engineering; Biomedical Engineering; Health Informatics; Image processing; Feature selection; 02 engineering and technology; Machine learning; computer.software_genre; 03 medical and health sciences; Naive Bayes classifier; 0302 clinical medicine; Computer vision; Time complexity; Artificial neural network; business.industry; Brain morphometry; 020601 biomedical engineering; Random forest; Support vector machine; Signal Processing; Artificial intelligence; business; computer; 030217 neurology & neurosurgery; Biomedical Signal Processing and Control

Ensemble feature selection with the simple Bayesian classification

2003

A popular method for creating an accurate classifier from a set of training data is to build several classifiers, and then to combine their predictions. Ensembles of simple Bayesian classifiers have traditionally not been a focus of research. One way to generate an ensemble of accurate and diverse simple Bayesian classifiers is to use different feature subsets generated with the random subspace method. In this case, the ensemble consists of multiple classifiers constructed by randomly selecting feature subsets, that is, classifiers constructed in randomly chosen subspaces. In this paper, we present an algorithm for building ensembles of simple Bayesian classifiers in random sub…

business.industry; Bayesian probability; Feature selection; Pattern recognition; Machine learning; computer.software_genre; Linear subspace; Random subspace method; Naive Bayes classifier; Bayes' theorem; ComputingMethodologies_PATTERNRECOGNITION; Hardware and Architecture; Signal Processing; Artificial intelligence; business; computer; Classifier (UML); Software; Cascading classifiers; Information Systems; Mathematics; Information Fusion
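
The random subspace construction described in the abstract above can be sketched as follows. This covers only the baseline idea (each simple Bayesian classifier is trained on a randomly chosen feature subset and predictions are combined by majority vote); the paper's own algorithm is cut off in the abstract and is not reproduced. Categorical features, Laplace smoothing, and all function names are assumptions.

```python
import math
import random
from collections import Counter, defaultdict

def train_nb(X, y, features):
    """Fit a categorical naive Bayes model restricted to the given feature indices."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (feature, class) -> counts of values
    for row, label in zip(X, y):
        for j in features:
            value_counts[(j, label)][row[j]] += 1
    n_values = {j: len({row[j] for row in X}) for j in features}
    return class_counts, value_counts, n_values, features

def predict_nb(model, row):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, n_values, features = model
    total = sum(class_counts.values())

    def log_posterior(c):
        lp = math.log(class_counts[c] / total)
        for j in features:
            count = value_counts[(j, c)][row[j]]
            lp += math.log((count + 1) / (class_counts[c] + n_values[j]))
        return lp

    return max(class_counts, key=log_posterior)

def random_subspace_ensemble(X, y, n_models=5, subspace_size=2, seed=0):
    """Train one naive Bayes model per randomly sampled feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    return [train_nb(X, y, rng.sample(range(n_features), subspace_size))
            for _ in range(n_models)]

def predict_ensemble(models, row):
    """Combine the base predictions by simple majority vote."""
    votes = Counter(predict_nb(model, row) for model in models)
    return votes.most_common(1)[0][0]
```

Diversity among the base classifiers comes only from the differing feature subsets, since every model sees the same training instances.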

Correlation-Based and Contextual Merit-Based Ensemble Feature Selection

2001

Recent research has proved the benefits of using an ensemble of diverse and accurate base classifiers for classification problems. In this paper the focus is on producing diverse ensembles with the aid of three feature selection heuristics based on two approaches: correlation-based and contextual merit-based. We have developed an algorithm and experimented with it to evaluate and compare the three feature selection heuristics on ten data sets from the UCI Repository. On average, the simple correlation-based ensemble is superior in accuracy. The contextual merit-based heuristics seem to include too many features in the initial ensembles, and iterations were most successful with these heuristics.

business.industry; Computer science; Feature selection; Machine learning; computer.software_genre; Base (topology); Correlation; ComputingMethodologies_PATTERNRECOGNITION; Artificial intelligence; Heuristics; business; Focus (optics); Simple correlation; computer

Local dimensionality reduction within natural clusters for medical data analysis

2005

Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data, represented by a large number of features of different types. Dimensionality reduction is one commonly applied approach. The goal of this paper is to study the impact of natural clustering on dimensionality reduction for classification. We compare several data mining strategies that apply dimensionality reduction by means of feature extraction or feature selection for subsequent classification. We show experimentally on micr…

business.industry; Computer science; Feature vector; Dimensionality reduction; Feature extraction; Pattern recognition; Feature selection; computer.software_genre; Artificial intelligence; Data pre-processing; Data mining; Multidimensional systems; business; Cluster analysis; computer; Curse of dimensionality

Prediction Model Selection and Spare Parts Ordering Policy for Efficient Support of Maintenance and Repair of Equipment

2010

The prediction model selection problem via variable subset selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. Several papers have dealt with various aspects of the problem but it appears that the typical regression user has not benefited appreciably. One reason for the lack of resolution of the problem is the fact that it has not been well defined. Indeed, it is apparent that there is not a single probl…

business.industry; Computer science; Model selection; Feature selection; Resolution (logic); Machine learning; computer.software_genre; Variable (computer science); Residual sum of squares; Spare part; Artificial intelligence; business; computer; Selection (genetic algorithm); Parametric statistics