Search results for "dimensionality"
Showing 10 of 231 documents
The impact of feature extraction on the performance of a classifier: kNN, Naïve Bayes and C4.5
2005
"The curse of dimensionality" is pertinent to many learning algorithms, and it denotes the drastic raise of computational complexity and the classification error in high dimensions. In this paper, different feature extraction techniques as means of (1) dimensionality reduction, and (2) constructive induction are analyzed with respect to the performance of a classifier. Three commonly used classifiers are taken for the analysis: kNN, Naïve Bayes and C4.5 decision tree. One of the main goals of this paper is to show the importance of the use of class information in feature extraction for classification and (in)appropriateness of random projection or conventional PCA to feature extraction for …
Parameter Rating by Diffusion Gradient
2014
Anomaly detection is a central task in high-dimensional data analysis. It can be performed by using dimensionality reduction methods to obtain a low-dimensional representation of the data, which reveals the geometry and the patterns that exist and govern it. Usually, anomaly detection methods classify high-dimensional vectors that represent data points as either normal or abnormal. Revealing the parameters (i.e., features) that cause detected abnormal behaviors is critical in many applications. However, this problem is not addressed by recent anomaly-detection methods and, specifically, by nonparametric methods, which are based on feature-free analysis of the data. In this chapter, we provi…
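A generic stand-in for this idea (scoring anomalies through a low-dimensional representation and attributing them to individual parameters) can be sketched with PCA reconstruction error; the chapter's own diffusion-gradient method is not reproduced here.

```python
# Sketch: PCA-based anomaly scoring with per-feature attribution, as a
# generic illustration only, not the chapter's method. Synthetic data
# stands in for real high-dimensional measurements.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))   # "normal" data with 10 parameters
x = np.zeros(10)
x[3] = 8.0                       # anomaly driven by parameter 3

# Low-dimensional representation of the normal behavior.
pca = PCA(n_components=3).fit(X)

# Reconstruct the test point from its low-dimensional projection;
# the per-feature squared residual attributes the abnormality.
recon = pca.inverse_transform(pca.transform(x.reshape(1, -1)))[0]
per_feature = (x - recon) ** 2

print("anomaly score:", per_feature.sum())
print("most deviant parameter:", int(per_feature.argmax()))
```

The per-feature residual is one simple way to answer "which parameters caused the abnormal score", the question the chapter highlights as missing from feature-free methods.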
Dimensionality Reduction Techniques: An Operational Comparison On Multispectral Satellite Images Using Unsupervised Clustering
2006
Multispectral satellite imagery provides us with useful but redundant datasets. Using Dimensionality Reduction (DR) algorithms, these datasets can be made easier to explore and to use. We present in this study an objective comparison of five DR methods, by evaluating their capacity to provide a usable input to the K-means clustering algorithm. We also suggest a method to automatically find a suitable number of classes K, using objective "cluster validity indexes" over a range of values for K. Ten Landsat images have been processed, yielding a classification rate in the 70-80% range. Our results also show that classical linear methods, though slightly outperformed by more recent nonlinear al…
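The K-selection procedure described can be sketched as follows, with the silhouette score as one example of a cluster validity index and synthetic blobs standing in for the Landsat imagery.

```python
# Sketch: choosing K automatically by scanning a cluster validity index
# over a range of K. The silhouette score is one such index; the study
# compares several. Synthetic 2-D blobs replace the satellite data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]  # 4 true clusters
X, _ = make_blobs(n_samples=300, centers=centers,
                  cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print("best K:", best_k)
```

In a real pipeline the clustering input would be the DR output rather than raw pixels, which is exactly the comparison the paper runs.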
Polar Classification of Nominal Data
2013
Many modern systems record various types of parameter values. Numerical values are relatively convenient for data analysis tools because there are many methods to measure distances and similarities between them. The application of dimensionality reduction techniques to data sets with such values is also a well-known practice. Nominal (i.e., categorical) values, on the other hand, pose problems for current methods. Above all, there is no meaningful distance between possible nominal values, which are either equal or unequal to each other. Since many dimensionality reduction methods rely on preserving some form of similarity or distance measure, their application to such data sets…
Online Density Estimation of Heterogeneous Data Streams in Higher Dimensions
2016
The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the est…
Local dimensionality reduction and supervised learning within natural clusters for biomedical data analysis
2006
Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems often requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data represented by a large number of features of different types. Dimensionality reduction (DR) is one commonly applied approach. The goal of this paper is to study the impact of natural clustering (clustering according to expert domain knowledge) on DR for supervised learning (SL) in the area of antibiotic resistance. We compare several data-mining strategies that apply DR by means of feature extraction or feature selection w…
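The local-model strategy can be sketched as follows. KMeans stands in for the expert-defined natural clusters, and `load_digits`, PCA, and logistic regression are illustrative substitutes for the paper's biomedical data and learners.

```python
# Sketch: "local" DR -- fit a separate PCA + classifier inside each
# cluster instead of one global model. All data and model choices here
# are stand-ins for the paper's antibiotic-resistance setting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Partition the training data (expert clusters would be used instead).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)

# One DR + supervised-learning pipeline per cluster.
models = {}
for c in range(3):
    mask = km.labels_ == c
    models[c] = make_pipeline(
        PCA(n_components=15), LogisticRegression(max_iter=1000)
    ).fit(X_tr[mask], y_tr[mask])

# Route each test point to its cluster's local model.
pred = np.array([models[c].predict(x.reshape(1, -1))[0]
                 for c, x in zip(km.predict(X_te), X_te)])
acc = (pred == y_te).mean()
print("accuracy:", acc)
```

The paper's comparison is between this kind of within-cluster DR and applying DR globally before learning.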
Feature extraction for classification in knowledge discovery systems
2003
Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of "the curse of dimensionality". We consider three different eigenvector-based feature extraction approaches for classification. The summary of obtained results concerning the accuracy of classification schemes is presented and the issue of search for the most appropriate feature extraction method for a given data set is considered. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the d…
More on the Dimensionality of the GHQ-12: Competitive Confirmatory Models
2019
The General Health Questionnaire (GHQ) was designed to measure minor psychiatric morbidity by assessing normal ‘healthy’ functioning and the appearance of new, distressing symptoms. Among its versions, the 12-item version is one of the most used. The GHQ-12’s validity and reliability have been extensively tested in samples from different populations. For the Spanish version, studies have come to different conclusions, supporting one-, two-, and three-factor structures. This research aims to present additional evidence on the factorial validity of the Spanish version of the GHQ-12, using competitive confirmatory models. Three samples of workers (N = 525, 414 and 540) were used to test a set of substantive models pr…
Nonlinear PCA for Spatio-Temporal Analysis of Earth Observation Data
2020
Remote sensing observations, products, and simulations are fundamental sources of information to monitor our planet and its climate variability. Uncovering the main modes of spatial and temporal variability in Earth data is essential to analyze and understand the underlying physical dynamics and processes driving the Earth System. Dimensionality reduction methods can work with spatio-temporal data sets and decompose the information efficiently. Principal component analysis (PCA), also known as empirical orthogonal functions (EOFs) in geophysics, has been traditionally used to analyze climatic data. However, when nonlinear feature relations are present, PCA/EOF fails. In this article, we pro…
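The failure mode mentioned (linear PCA on nonlinearly related features) can be illustrated with kernel PCA on concentric circles; the `separation` helper below is a hypothetical measure for the demo, and none of this reproduces the article's proposed method.

```python
# Sketch: linear PCA cannot untangle nonlinear feature relations, while
# a nonlinear (kernel) PCA can. Two concentric circles stand in for the
# nonlinear structure in climate fields; gamma=10 is an assumed setting.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

lin = PCA(n_components=1).fit_transform(X)[:, 0]
rbf = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)[:, 0]

def separation(component, labels):
    # Gap between class means in units of overall std: higher = better.
    a, b = component[labels == 0], component[labels == 1]
    return abs(a.mean() - b.mean()) / component.std()

lin_sep = separation(lin, y)
rbf_sep = separation(rbf, y)
print("linear PCA :", round(lin_sep, 2))
print("kernel PCA :", round(rbf_sep, 2))
```

The first linear component leaves the two rings mixed, while the kernel component separates them, the qualitative gap the article addresses for spatio-temporal Earth data.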
Special functions for the study of economic dynamics: The case of the Lucas-Uzawa model
2008
Special functions are intensively used in mathematical physics to solve differential systems. We argue that they should be most useful in economic dynamics, notably in the assessment of the transition dynamics of endogenous economic growth models. We illustrate our argument with the famous Lucas-Uzawa model, which we solve by means of Gaussian hypergeometric functions. We show how the use of Gaussian hypergeometric functions allows for an explicit representation of the equilibrium dynamics of all variables in levels. The parameters of the involved hypergeometric functions are identified using the Pontryagin conditions arising from the underlying optimization problems. In contrast to th…
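For reference, the Gaussian hypergeometric function invoked in such closed-form solutions is the series

```latex
{}_2F_1(a,b;c;z) = \sum_{n=0}^{\infty} \frac{(a)_n (b)_n}{(c)_n}\,\frac{z^n}{n!},
\qquad (q)_n = q(q+1)\cdots(q+n-1),
```

which converges for $|z| < 1$; the abstract's parameters $a, b, c$ are the ones pinned down by the Pontryagin conditions.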