Search results for "Dimensionality Reduction"
showing 10 items of 120 documents
Part-of-Speech Induction by Singular Value Decomposition and Hierarchical Clustering
2006
Part-of-speech induction involves the automatic discovery of word classes and the assignment of each word of a vocabulary to one or several of these classes. The approach proposed here is based on the analysis of word distributions in a large collection of German newspaper texts. Its main advantage over other attempts is that it combines the hierarchical clustering of context vectors with a previous step of dimensionality reduction that minimizes the effects of sampling errors.
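The two-stage idea in the abstract above (reduce the context vectors first, then cluster them hierarchically) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy count matrix, the choice of cosine distance, the average-linkage rule, and the function names `reduce_context_vectors` and `agglomerative_clusters` are all assumptions made for the example.

```python
import numpy as np

def reduce_context_vectors(counts, k):
    """Project the rows of a word-by-context count matrix onto the top-k singular directions."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def agglomerative_clusters(vectors, n_clusters):
    """Naive average-linkage agglomerative clustering on cosine distance (assumed choices)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    dist = 1.0 - unit @ unit.T
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical toy data: rows are words, columns are context features.
words = ["der", "die", "Haus", "Hund"]
counts = np.array([
    [8, 7, 0, 1],
    [7, 9, 1, 0],
    [0, 1, 9, 8],
    [1, 0, 8, 9],
], dtype=float)

reduced = reduce_context_vectors(counts, k=2)
clusters = agglomerative_clusters(reduced, n_clusters=2)
for members in clusters:
    print([words[i] for i in members])
```

Truncating to the leading singular directions discards the low-variance components, which is one plausible reading of how the reduction step dampens sampling noise before clustering.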
Anomaly Detection from Network Logs Using Diffusion Maps
2011
The goal of this study is to detect anomalous queries from network logs using a dimensionality reduction framework. The frequencies of 2-grams in queries are extracted into a feature matrix. Dimensionality reduction is done by applying diffusion maps. The method is adaptive and thus does not need training before analysis. We tested the method with data that includes normal and intrusive traffic to a web server. This approach finds all intrusions in the dataset.
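The pipeline described in the abstract above (2-gram counts, then a diffusion map) might look roughly like the sketch below. It is only an illustration under stated assumptions: character-level 2-grams, a Gaussian kernel with a median-distance bandwidth, and the example queries are all hypothetical choices, and `bigram_matrix` and `diffusion_map` are names invented here rather than anything taken from the paper.

```python
import numpy as np

def bigram_matrix(queries):
    """Count character 2-grams per query; rows are queries, columns are 2-grams."""
    vocab = sorted({q[i:i + 2] for q in queries for i in range(len(q) - 1)})
    column = {gram: j for j, gram in enumerate(vocab)}
    X = np.zeros((len(queries), len(vocab)))
    for row, q in enumerate(queries):
        for i in range(len(q) - 1):
            X[row, column[q[i:i + 2]]] += 1
    return X, vocab

def diffusion_map(X, n_components=2, epsilon=None):
    """Basic diffusion map: Gaussian kernel, Markov normalization, leading eigenvectors."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    if epsilon is None:
        # Assumed heuristic: median of the nonzero squared distances.
        epsilon = np.median(sq[sq > 0])
    K = np.exp(-sq / epsilon)
    d = K.sum(axis=1)
    # Symmetric form of the Markov matrix D^-1 K, so eigh can be used.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of D^-1 K; the first one is constant and is skipped.
    psi = vecs / np.sqrt(d)[:, None]
    coords = psi[:, 1:n_components + 1] * vals[1:n_components + 1]
    return coords, vals

# Hypothetical log queries; the last one imitates an injection attempt.
queries = [
    "/index.php?id=12",
    "/index.php?id=47",
    "/index.php?id=3",
    "/news.php?page=2",
    "/index.php?id=1' OR '1'='1",
]
X, vocab = bigram_matrix(queries)
coords, eigenvalues = diffusion_map(X)
print(coords)
```

In a setup like this, anomalies would be looked for as points lying far from the bulk in the low-dimensional coordinates; the thresholding rule is left out because the abstract does not specify one.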
A Constrained Band Selection Method Based on Information Measures for Spectral Image Color Visualization
2011
We present a new method for the visualization of spectral images, based on a selection of three relevant spectral channels to build a Red-Green-Blue composite. Band selection is achieved by means of information measures at the first, second and third orders. Irrelevant channels are preliminarily removed by means of a center-surround entropy comparison. A visualization-oriented spectrum segmentation based on the use of color matching functions allows for computational ease and adjustment of the natural rendering. Results from the proposed method are presented and objectively compared to four other dimensionality reduction techniques in terms of naturalness and informa…
Spatially variant dimensionality reduction for the visualization of multi/hyperspectral images
2011
In this paper, we introduce a new approach for color visualization of multi/hyperspectral images. Unlike traditional methods, we propose to operate a local analysis instead of considering that all the pixels are part of the same population. It takes a segmentation map as an input and then achieves a dimensionality reduction adaptively inside each class of pixels. Moreover, in order to avoid unappealing discontinuities between regions, we propose to make use of a set of distance transform maps to weigh the mapping applied to each pixel with regard to its relative location with classes' centroids. Results on two hyperspectral datasets illustrate the efficiency of…
Saliency in spectral images
2011
Even though the study of saliency for color images has been thoroughly investigated in the past, very little attention has been given to datasets that cannot be displayed on traditional computer screens such as spectral images. Nevertheless, more than a means to predict human gaze, the study of saliency primarily allows for measuring informative content. Thus, we propose a novel approach for the computation of saliency maps for spectral images. Based on the Itti model, it involves the extraction of both spatial and spectral features, suitable for high dimensionality images. As an application, we present a comparison framework to evaluate how dimensionality reduct…
Visible-NIR reflectance spectroscopy and manifold learning methods applied to the detection of fungal infections on citrus fruit
2015
The development of systems for automatically detecting decay in citrus fruit during quality control is still a challenge for the citrus industry. The feasibility of reflectance spectroscopy in the visible and near infrared (NIR) regions was evaluated for the automatic detection of the early symptoms of decay caused by Penicillium digitatum fungus in citrus fruit. Reflectance spectra of sound and decaying surface parts of mandarins cv. ‘Clemenvilla’ were acquired in two different spectral regions, from 650 nm to 1050 nm (visible–NIR) and from 1000 nm to 1700 nm (NIR), pointing to significant differences in spectra between sound and decaying skin for both spectral ranges. Three diffe…
Nonlinear data description with Principal Polynomial Analysis
2012
Principal Component Analysis (PCA) has been widely used for manifold description and dimensionality reduction. Performance of PCA is, however, hampered when data exhibits nonlinear feature relations. In this work, we propose a new framework for manifold learning based on the use of a sequence of Principal Polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) is shown to generalize PCA. Unlike recently proposed nonlinear methods (e.g. spectral/kernel methods and projection pursuit techniques, neural networks), PPA features are easily interpretable and the method leads to a fully invertible transform, which is a desirable property…
Modeling user preferences in content-based image retrieval: A novel attempt to bridge the semantic gap
2015
This paper is concerned with content-based image retrieval from a stochastic point of view. The semantic gap problem is addressed in two ways. First, a dimensional reduction is applied using the (pre-calculated) distances among images. The dimension of the reduced vector is the number of preferences that we allow the user to choose from, in this case, three levels. Second, the conditional probability distribution of the random user preference, given this reduced feature vector, is modeled using a proportional odds model. A new model is fitted at each iteration. The score used to rank the image database is based on the estimated probability function of the random preference. Additionally, so…
Local dimensionality reduction within natural clusters for medical data analysis
2005
Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data described by a large number of features of different types. Dimensionality reduction is one commonly applied approach. The goal of this paper is to study the impact of natural clustering on dimensionality reduction for classification. We compare several data mining strategies that apply dimensionality reduction by means of feature extraction or feature selection for subsequent classification. We show experimentally on micr…
A novel method for network intrusion detection based on nonlinear SNE and SVM
2017
In the case of network intrusion detection data, pre-processing techniques have been extensively used to enhance the accuracy of the model. An ideal intrusion detection system (IDS) is one that has appreciable detection capability across all groups of attacks. An open research problem in this area is the lower detection rate for less frequent attacks, which results from the curse of dimensionality and the imbalanced class distribution of the benchmark datasets. This work attempts to minimise the effects of imbalanced class distribution by applying random under-sampling of the majority classes and SMOTE-based oversampling of minority classes. In order to alleviate the issue arising from the curse…
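The rebalancing step named in the abstract above (random under-sampling of majority classes plus SMOTE-style oversampling of minority classes) can be illustrated with a small NumPy sketch. It is an assumption-laden toy, not the paper's pipeline: the synthetic data, the class sizes, the neighbour count `k`, and the helper names `random_undersample` and `smote_like` are hypothetical, and the interpolation follows the general SMOTE idea rather than any specific library implementation.

```python
import numpy as np

def random_undersample(X, n_keep, rng):
    """Keep n_keep rows chosen uniformly at random without replacement."""
    idx = rng.choice(len(X), size=n_keep, replace=False)
    return X[idx]

def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (requires k < len(X_min))."""
    sq = np.sum((X_min[:, None, :] - X_min[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(sq, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Hypothetical data: 200 majority-class rows and 10 minority-class rows.
rng = np.random.default_rng(0)
X_major = rng.normal(size=(200, 3))
X_minor = rng.normal(loc=3.0, size=(10, 3))

majority_small = random_undersample(X_major, n_keep=50, rng=rng)
synthetic = smote_like(X_minor, n_new=40, k=3, rng=rng)
minority_big = np.vstack([X_minor, synthetic])
print(majority_small.shape, minority_big.shape)
```

Because every synthetic point is a convex combination of two existing minority samples, the oversampled class stays inside the region already covered by real minority data.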