Search results for "Curse"
Showing 10 of 115 documents
The impact of feature extraction on the performance of a classifier: kNN, Naïve Bayes and C4.5
2005
"The curse of dimensionality" is pertinent to many learning algorithms, and it denotes the drastic raise of computational complexity and the classification error in high dimensions. In this paper, different feature extraction techniques as means of (1) dimensionality reduction, and (2) constructive induction are analyzed with respect to the performance of a classifier. Three commonly used classifiers are taken for the analysis: kNN, Naïve Bayes and C4.5 decision tree. One of the main goals of this paper is to show the importance of the use of class information in feature extraction for classification and (in)appropriateness of random projection or conventional PCA to feature extraction for …
2020
Information technology (IT) engagement is defined as a need to spend more time using IT. Practice-based examples show that IT engagement can have adverse effects in organizations. Although users can potentially get more work done through IT engagement, observations show that users might jeopardize their well-being and hamper their work performance. We aimed to investigate this complexity in the research on IT engagement by examining its potential antecedents and outcomes in organizations. Considering the potentially mixed outcomes, we developed a model to examine the effects of IT engagement on personal productivity and strain. We also aimed to explain the antecedents of IT eng…
Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions
2016
The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the est…
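The projection-plus-representatives idea can be sketched as follows. This is a toy illustration under assumed choices (a fixed Gaussian random projection, online k-means representatives, and a kernel estimate over them), not the estimator proposed in the paper.

```python
# Toy sketch: project stream items into a low-dimensional space with a fixed
# random matrix, maintain k representatives by online k-means, and estimate
# density with a Gaussian kernel over the representatives. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

class ProjectedStreamDensity:
    def __init__(self, d_in, d_proj=3, k=16, bandwidth=2.0):
        self.R = rng.normal(size=(d_in, d_proj)) / np.sqrt(d_proj)
        self.k, self.h = k, bandwidth
        self.reps = None       # k representative points in projected space
        self.counts = None     # how many items each representative absorbed

    def update(self, x):
        z = x @ self.R
        if self.reps is None:  # seed representatives near the first item
            self.reps = np.tile(z, (self.k, 1)) + 0.01 * rng.normal(size=(self.k, z.size))
            self.counts = np.ones(self.k)
            return
        j = np.argmin(np.sum((self.reps - z) ** 2, axis=1))
        self.counts[j] += 1
        self.reps[j] += (z - self.reps[j]) / self.counts[j]   # online mean update

    def density(self, x):
        z = x @ self.R
        d2 = np.sum((self.reps - z) ** 2, axis=1)
        w = self.counts / self.counts.sum()
        return float(np.sum(w * np.exp(-d2 / (2 * self.h ** 2))))

est = ProjectedStreamDensity(d_in=50)
stream = rng.normal(size=(2000, 50))
for x in stream:
    est.update(x)
print(est.density(np.zeros(50)))   # higher near the bulk of the data
```

The memory cost is fixed by `k` and `d_proj` rather than by the input dimensionality, which is the property the abstract is after.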
Feature extraction for classification in knowledge discovery systems
2003
Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of "the curse of dimensionality". We consider three different eigenvector-based feature extraction approaches for classification. A summary of the obtained results concerning the accuracy of classification schemes is presented, and the search for the most appropriate feature extraction method for a given data set is considered. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the d…
More on the Dimensionality of the GHQ-12: Competitive Confirmatory Models
2019
The General Health Questionnaire (GHQ) was designed to measure minor psychiatric morbidity by assessing normal ‘healthy’ functioning and the appearance of new, distressing symptoms. Among its versions, the 12-item one is among the most widely used. The GHQ-12’s validity and reliability have been extensively tested in samples from different populations. For the Spanish version, studies have reached different conclusions, supporting one-, two-, and three-factor structures. This research aims to present additional evidence on the factorial validity of the Spanish version of the GHQ-12, using competitive confirmatory models. Three samples of workers (N = 525, 414 and 540) were used to test a set of substantive models pr…
Forward-backward equations for nonlinear propagation in axially invariant optical systems
2004
We present a novel general framework to deal with forward and backward components of the electromagnetic field in axially invariant nonlinear optical systems, which include those having any type of linear or nonlinear transverse inhomogeneities. With a minimum of approximations, we obtain a system of two first-order equations for forward and backward components explicitly showing the nonlinear couplings among them. The modal approach used allows for an effective reduction of the dimensionality of the original problem from 3+1 (three spatial dimensions plus one time dimension) to 1+1 (one spatial dimension plus one frequency dimension). The new equations can be written in a spinor Dir…
Transfer Learning with Convolutional Networks for Atmospheric Parameter Retrieval
2018
The Infrared Atmospheric Sounding Interferometer (IASI) on board the MetOp satellite series provides important measurements for Numerical Weather Prediction (NWP). Retrieving accurate atmospheric parameters from the raw data provided by IASI is a major challenge, but necessary in order to use the data in NWP models. The performance of statistical models is compromised by the extremely high spectral dimensionality and the large number of variables to be predicted simultaneously across the atmospheric column. All this poses a challenge for selecting and studying optimal models and processing schemes. Earlier work has shown that non-linear models such as kernel methods and neural networks perform w…
Unsupervised Anomaly and Change Detection With Multivariate Gaussianization
2022
Anomaly detection (AD) is a field of intense research in remote sensing (RS) image processing. Identifying low-probability events in RS images is a challenging problem given the high dimensionality of the data, especially when no (or little) information about the anomaly is available a priori. While plenty of methods are available, the vast majority of them do not scale well to large datasets and require the choice of some (often critical) hyperparameters. Therefore, unsupervised and computationally efficient detection methods become strictly necessary, especially now with the data deluge problem. In this article, we propose an unsupervised method for detecting anomalies and changes …
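A minimal sketch of the underlying idea is given below: Gaussianize the data, then score points by their unlikeliness under a standard normal. A single marginal Gaussianization step stands in for the full iterative multivariate scheme, and all names and settings here are illustrative assumptions, not the article's method.

```python
# Toy anomaly scoring via marginal Gaussianization: map each variable to a
# standard normal through its empirical CDF, then score queries by squared
# norm (-2 log-density of N(0, I), up to a constant). Illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def marginal_gaussianize(X_train, X_query):
    """One marginal Gaussianization step (the full scheme iterates
    rotations and marginal steps)."""
    Z = np.empty_like(X_query, dtype=float)
    n = X_train.shape[0]
    for j in range(X_train.shape[1]):
        # empirical CDF value of each query point, kept away from 0 and 1
        u = (np.searchsorted(np.sort(X_train[:, j]), X_query[:, j]) + 0.5) / (n + 1)
        Z[:, j] = stats.norm.ppf(u)
    return Z

# "background" data with skewed, non-Gaussian marginals, plus one clear anomaly
X = rng.normal(size=(5000, 4)) ** 2
queries = np.vstack([X[:5], 50 * np.ones((1, 4))])

Z = marginal_gaussianize(X, queries)
scores = np.sum(Z ** 2, axis=1)
print(scores)   # the last query scores far above the background points
```

Because the score is a closed-form function of the Gaussianized point, no density model of the raw (skewed) data is ever fitted, which is what makes this family of detectors cheap.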
Gaussianizing the Earth: Multidimensional Information Measures for Earth Data Analysis
2021
Information theory is an excellent framework for analyzing Earth system data because it allows us to characterize uncertainty and redundancy, and is universally interpretable. However, accurately estimating information content is challenging because spatio-temporal data is high-dimensional, heterogeneous and has non-linear characteristics. In this paper, we apply multivariate Gaussianization for probability density estimation which is robust to dimensionality, comes with statistical guarantees, and is easy to apply. In addition, this methodology allows us to estimate information-theoretic measures to characterize multivariate densities: information, entropy, total correlation, and mutual in…
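For intuition, here is a hedged sketch of one such measure: after a rank-based marginal Gaussianization, total correlation can be estimated in closed form from the correlation matrix (TC = -0.5 log det C for Gaussian data). The toy data and settings below are illustrative assumptions, not the paper's experiments.

```python
# Sketch: estimate total correlation (redundancy) of heavy-tailed data by
# Gaussianizing the marginals and using the Gaussian closed form.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def gaussianize_marginals(X):
    """Rank-based map of each variable to a standard normal; this preserves
    the dependence structure while fixing the marginals."""
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    return stats.norm.ppf(ranks / (n + 1))

def total_correlation_gaussian(Z):
    """TC = sum_i H(Z_i) - H(Z) = -0.5 * log det(correlation matrix), in nats,
    exact for Gaussian data."""
    C = np.corrcoef(Z, rowvar=False)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * logdet

# Correlated, heavy-tailed toy data: two nearly redundant variables plus one
# independent variable.
n = 20000
s = rng.standard_t(df=3, size=n)
X = np.column_stack([s + 0.1 * rng.normal(size=n),
                     s + 0.1 * rng.normal(size=n),
                     rng.standard_t(df=3, size=n)])

Z = gaussianize_marginals(X)
print(total_correlation_gaussian(Z))   # clearly positive: cols 1-2 are redundant
```

For independent inputs the estimate sits near zero, so the same one-liner doubles as a redundancy check across variables of an Earth-data cube.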
Principal Polynomial Analysis
2014
© 2014 World Scientific Publishing Company. This paper presents a new framework for manifold learning based on a sequence of principal polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) generalizes PCA by modeling the directions of maximal variance by means of curves instead of straight lines. Contrary to previous approaches, PPA reduces to performing simple univariate regressions, which makes it computationally feasible and robust. Moreover, PPA shows a number of interesting analytical properties. First, PPA is a volume-preserving map, which in turn guarantees the existence of the inverse. Second, such an inverse can be obtained…
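The core step, replacing PCA's straight line with a curve fitted by simple univariate polynomial regressions, can be sketched on a toy 2-D parabola; the degree, noise level, and data below are illustrative assumptions, not the paper's formulation.

```python
# Sketch of the PPA idea on a noisy parabola: PCA's straight line cannot
# follow the curve, but a low-order polynomial in the projection can.
import numpy as np

rng = np.random.default_rng(3)

t = rng.uniform(-2, 2, size=1000)
X = np.column_stack([t, 0.5 * t ** 2]) + 0.05 * rng.normal(size=(1000, 2))
X = X - X.mean(axis=0)

# leading principal direction (the largest-variance straight line, as in PCA)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v = Vt[0]
alpha = X @ v                    # 1-D projection onto that direction
resid = X - np.outer(alpha, v)   # orthogonal residual PCA leaves unexplained

# key step (sketched): regress the residual on the projection with a
# low-order polynomial -- a simple univariate regression per coordinate
deg = 2
coeffs = [np.polyfit(alpha, resid[:, j], deg) for j in range(X.shape[1])]
pred = np.column_stack([np.polyval(c, alpha) for c in coeffs])

err_pca = np.mean(np.sum(resid ** 2, axis=1))           # straight-line residual
err_ppa = np.mean(np.sum((resid - pred) ** 2, axis=1))  # polynomial residual
print(err_pca, err_ppa)   # the curve explains much more of the variance
```

Because each coordinate of the residual is fitted by its own univariate `polyfit`, the computation stays cheap, which is the "simple univariate regressions" point the abstract makes.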