Search results for "dimensionality"
Showing 10 of 231 documents
Neural Network Approach for Characterizing Structural Transformations by X-Ray Absorption Fine Structure Spectroscopy
2018
AIF acknowledges support by the US Department of Energy, Office of Basic Energy Sciences under Grant No. DE-FG02 03ER15476. AIF acknowledges support by the Laboratory Directed Research and Development Program through LDRD 18-047 of Brookhaven National Laboratory under U.S. Department of Energy Contract No. DE-SC0012704 for initiating his research in machine learning methods. The help of the beamline staff at the ELETTRA synchrotron radiation facility (project 20160412) is acknowledged. RMC-EXAFS and MD-EXAFS simulations were performed on the LASC cluster-type computer at the Institute of Solid State Physics of the University of Latvia.
Detection of steering direction using EEG recordings based on sample entropy and time-frequency analysis.
2016
Monitoring a driver's intentions in advance is an ambitious aim that could have a large societal impact by preventing traffic accidents. Hence, in this preliminary study we recorded high-resolution electroencephalography (EEG) from 5 subjects while they drove a car under real conditions, along with accelerometer data marking the onset of steering. Two sensor-level analyses, sample entropy and time-frequency analysis, were applied to observe the dynamics before the onset of steering. To classify the steering direction, we then applied a machine learning pipeline consisting of dimensionality reduction using principal component analysis (PCA) and classification using sup…
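The pipeline this abstract describes (PCA for dimensionality reduction followed by a classifier) can be sketched roughly as follows; the data here are synthetic stand-ins, and the component count, kernel, and a linear SVM are assumptions, not the authors' actual settings:

```python
# Hypothetical PCA -> SVM pipeline for two-class (left/right) steering data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))      # 200 epochs x 64 EEG-derived features (synthetic)
y = rng.integers(0, 2, size=200)    # steering direction labels: 0 = left, 1 = right

# Reduce to 10 components, then fit a linear SVM on the scores
clf = make_pipeline(PCA(n_components=10), SVC(kernel="linear"))
clf.fit(X, y)
print(clf.predict(X[:5]).shape)     # predictions for the first 5 epochs: (5,)
```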
Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors.
2013
The Sparse Manifold Clustering and Embedding (SMCE) algorithm was recently proposed for simultaneous clustering and dimensionality reduction of data on nonlinear manifolds using sparse representation techniques. In this work, the SMCE algorithm is applied to the differential discrimination of glioblastoma and meningioma tumors by means of their gene expression profiles. Our purpose was to evaluate the robustness of this nonlinear manifold method for classifying gene expression profiles, which are characterized by high-dimensional representations and the low discriminative power of most genes. To this end, we used SMCE to reduce the dimensionality of a preprocessed dataset of 35 single…
Improving clustering of Web bot and human sessions by applying Principal Component Analysis
2019
The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors for use as input to classification models. Many different features for discriminating bots' and humans' navigational patterns have been considered in session models, but very few studies have been devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables that are efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity …
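The evaluation scheme mentioned here (PCA-reduced session vectors, clustering, purity as the quality measure) can be illustrated on synthetic data; the two well-separated groups, the k-means clusterer, and the component count are assumptions for the sketch:

```python
# Sketch: PCA-reduced session feature vectors -> k-means -> cluster purity.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 100 synthetic "human" and 100 "bot" sessions in a 20-dimensional feature space
X = np.vstack([rng.normal(0, 1, (100, 20)), rng.normal(3, 1, (100, 20))])
labels = np.array([0] * 100 + [1] * 100)

Z = PCA(n_components=2).fit_transform(X)                       # reduce to 2 PCs
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

# Purity: each cluster contributes the count of its majority true label
purity = sum(np.bincount(labels[pred == k]).max() for k in range(2)) / len(labels)
print(round(purity, 2))
```

With this degree of separation the purity is close to 1; real bot/human sessions overlap far more.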
An efficient functional magnetic resonance imaging data reduction strategy using neighborhood preserving embedding algorithm
2021
High-dimensional data have become common in neuroimaging, especially in group-level functional magnetic resonance imaging (fMRI) datasets. fMRI connectivity analysis is a widely used, powerful technique for studying functional brain networks to probe the underlying mechanisms of brain function and neuropsychological disorders. However, data-driven techniques such as independent component analysis (ICA) can yield unstable and inconsistent results, confounding the true effects of interest and hindering the understanding of brain functionality and connectivity. A key contributing factor to this instability is the information loss that occurs during fMRI data reduction. Data reduction of high …
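Neighborhood preserving embedding (NPE) itself is not implemented in scikit-learn; as a loosely related neighborhood-preserving sketch (NPE is essentially a linear approximation of locally linear embedding), random stand-in "scan" vectors can be reduced like this:

```python
# Related neighborhood-preserving reduction via LLE (NOT the paper's NPE
# implementation); the data are random stand-ins, not real fMRI features.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 100))     # 150 scans x 100 voxel-level features (synthetic)

emb = LocallyLinearEmbedding(n_neighbors=10, n_components=5)
Z = emb.fit_transform(X)            # neighborhood-preserving low-dim scores
print(Z.shape)                      # (150, 5)
```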
Variability of Classification Results in Data with High Dimensionality and Small Sample Size
2021
The study focuses on the analysis of biological data containing counts of genome sequences of intestinal microbiome bacteria before and after antibiotic use. The data have high dimensionality (bacterial taxa) and a small number of records, which is typical of bioinformatics data. Classification models induced on such data sets are usually unstable, and their accuracy metrics have high variance. The aim of the study is to create a preprocessing workflow and a classification model that can most accurately classify the microbiome into groups before and after the use of antibiotics and lessen the variability of the classifier's accuracy measures. To ev…
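The variability problem described here can be made concrete by measuring accuracy spread under repeated cross-validation on a high-dimensional, small-sample dataset; the synthetic data, logistic regression, and fold counts below are assumptions for illustration, not the paper's workflow:

```python
# Quantifying accuracy variance with repeated stratified CV on a
# high-dimensional, small-sample (synthetic) stand-in for taxa-count data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 500))        # 40 samples, 500 "taxa" features
y = np.array([0] * 20 + [1] * 20)     # before / after antibiotics
X[y == 1, :5] += 1.5                  # a few weakly informative features

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())    # a large std reflects the instability
```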
A local complexity based combination method for decision forests trained with high-dimensional data
2012
Accurate machine learning with high-dimensional data is affected by the phenomenon known as the "curse" of dimensionality. One of the main strategies explored in the last decade to deal with this problem is the use of multi-classifier systems. Several such approaches are inspired by the Random Subspace Method for the construction of decision forests. Furthermore, other studies rely on estimates of the individual classifiers' competence to enhance the combination in the multi-classifier and improve accuracy. We propose a competence estimate based on local complexity measurements, used to perform a weighted-average combination of the decision forest. Experimental results show how thi…
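The ingredients named here (a Random-Subspace decision forest plus a competence-weighted average) can be sketched as follows. The competence weight below is a simple stand-in (validation accuracy per tree); the paper's local-complexity measure is not reproduced:

```python
# Random-Subspace forest with a weighted-average combination (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, random_state=0)

trees, subspaces, weights = [], [], []
for _ in range(15):
    feats = rng.choice(X.shape[1], size=10, replace=False)   # random subspace
    t = DecisionTreeClassifier(random_state=0).fit(X_fit[:, feats], y_fit)
    trees.append(t)
    subspaces.append(feats)
    # Stand-in competence weight: validation accuracy (the paper instead uses
    # local data-complexity measurements around the query point).
    weights.append(t.score(X_val[:, feats], y_val))

# Weighted average of the trees' class-probability outputs
proba = sum(w * t.predict_proba(X_te[:, f])
            for w, t, f in zip(weights, trees, subspaces)) / sum(weights)
pred = proba.argmax(axis=1)
print((pred == y_te).mean())          # ensemble accuracy on held-out data
```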
Making nonlinear manifold learning models interpretable: The manifold grand tour
2015
Highlights: smooth nonlinear topographic maps of the data distribution to guide a Grand Tour visualisation; prioritisation of the linear views of the data that are most consistent with the data structure in the maps; useful visualisations that cannot be obtained by more classical approaches. Dimensionality reduction is required to produce visualisations of high-dimensional data. In this framework, one of the most straightforward approaches to visualising high-dimensional data is to reduce complexity and apply linear projections while tumbling the projection axes in a defined sequence, which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to…
Dimensionality reduction via regression on hyperspectral infrared sounding data
2014
This paper introduces a new method for dimensionality reduction via regression (DRR). The method generalizes Principal Component Analysis (PCA) in a way that reduces the variance of the PCA scores. To do so, DRR relies on a deflationary process in which a nonlinear regression reduces the redundancy between the PC scores. Unlike other nonlinear dimensionality reduction methods, DRR is easy to apply, has an out-of-sample extension, is invertible, and the learned transformation is volume-preserving. These properties make the method useful for a wide range of applications, especially for very high-dimensional data in general and for hyperspectral image processing in particular…
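The deflationary idea, regressing each PC score on the preceding ones and keeping only the residual, can be sketched loosely as below; the curved toy data and the kernel ridge regressor are assumptions, not the authors' exact formulation:

```python
# Loose DRR-style deflation sketch: PCA, then replace each PC score by its
# residual after a nonlinear regression on the preceding scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(4)
t = rng.uniform(-1, 1, 500)
X = np.column_stack([t, t ** 2 + 0.05 * rng.normal(size=500)])  # curved 2-D data

S = PCA(n_components=2).fit_transform(X)    # PC scores
v_before = S[:, 1].var()                    # variance of 2nd score before deflation
for i in range(1, S.shape[1]):
    reg = KernelRidge(kernel="rbf", alpha=1e-2)
    reg.fit(S[:, :i], S[:, i])              # predict score i from earlier scores
    S[:, i] -= reg.predict(S[:, :i])        # deflate: keep only the residual

print(v_before, S[:, 1].var())              # deflation shrinks the 2nd score's variance
```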
A Feature Set Decomposition Method for the Construction of Multi-classifier Systems Trained with High-Dimensional Data
2013
Data mining for the discovery of novel, useful patterns encounters obstacles when dealing with high-dimensional datasets, a problem documented as the "curse" of dimensionality. One strategy to deal with this issue is decomposition of the input feature set to build a multi-classifier system. Standalone decomposition methods are rare and generally based on random selection. We propose a decomposition method that uses information-theoretic tools to arrange the input features into uncorrelated and relevant subsets. Experimental results show that this approach significantly outperforms three baseline decomposition methods in terms of classification accuracy.
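A rough flavor of information-theoretic feature decomposition can be given with mutual information as the relevance score; the round-robin assignment into three subsets is a simplification for illustration, not the paper's criterion:

```python
# Sketch: score features by mutual information with the class, then spread
# them round-robin into subsets so each gets a share of informative features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)
relevance = mutual_info_classif(X, y, random_state=0)   # per-feature relevance

# Most relevant first, assigned round-robin to three subsets
order = np.argsort(relevance)[::-1]
subsets = [order[i::3] for i in range(3)]
print([len(s) for s in subsets])    # [7, 7, 6]
```

Each subset would then train one member of the multi-classifier system.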