Search results for "dimensionality"
showing 10 items of 231 documents
A multiple-response chi-square framework for the analysis of Free-Comment and Check-All-That-Apply data
2021
Free-Comment (FC) and Check-All-That-Apply (CATA) provide a contingency table containing citation counts of descriptors by products. The analyses performed on this table are most often related to the chi-square statistic. However, such practices are not well suited because they consider experimental units as being the citations (one descriptor for one product by one subject) while the evaluations (vector of citations for one product by one subject) should be considered instead. This results in incorrect expected frequencies under the null hypothesis of independence between products and descriptors and thus in an incorrect chi-square statistic. Thus, analyses related …
Visible-NIR reflectance spectroscopy and manifold learning methods applied to the detection of fungal infections on citrus fruit
2015
The development of systems for automatically detecting decay in citrus fruit during quality control is still a challenge for the citrus industry. The feasibility of reflectance spectroscopy in the visible and near infrared (NIR) regions was evaluated for the automatic detection of the early symptoms of decay caused by Penicillium digitatum fungus in citrus fruit. Reflectance spectra of sound and decaying surface parts of mandarins cv. ‘Clemenvilla’ were acquired in two different spectral regions, from 650 nm to 1050 nm (visible–NIR) and from 1000 nm to 1700 nm (NIR), pointing to significant differences in spectra between sound and decaying skin for both spectral ranges. Three diffe…
Nonlinear data description with Principal Polynomial Analysis
2012
Principal Component Analysis (PCA) has been widely used for manifold description and dimensionality reduction. The performance of PCA is, however, hampered when the data exhibit nonlinear feature relations. In this work, we propose a new framework for manifold learning based on the use of a sequence of Principal Polynomials that capture the possibly nonlinear nature of the data. The proposed Principal Polynomial Analysis (PPA) is shown to generalize PCA. Unlike recently proposed nonlinear methods (e.g. spectral/kernel methods and projection pursuit techniques, neural networks), PPA features are easily interpretable and the method leads to a fully invertible transform, which is a desirable property…
Modeling user preferences in content-based image retrieval: A novel attempt to bridge the semantic gap
2015
This paper is concerned with content-based image retrieval from a stochastic point of view. The semantic gap problem is addressed in two ways. First, a dimensional reduction is applied using the (pre-calculated) distances among images. The dimension of the reduced vector is the number of preferences that we allow the user to choose from, in this case, three levels. Second, the conditional probability distribution of the random user preference, given this reduced feature vector, is modeled using a proportional odds model. A new model is fitted at each iteration. The score used to rank the image database is based on the estimated probability function of the random preference. Additionally, so…
Local dimensionality reduction within natural clusters for medical data analysis
2005
Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data, represented by a large number of features of different types. Dimensionality reduction is one commonly applied approach. The goal of this paper is to study the impact of natural clustering on dimensionality reduction for classification. We compare several data mining strategies that apply dimensionality reduction by means of feature extraction or feature selection for subsequent classification. We show experimentally on micr…
A novel method for network intrusion detection based on nonlinear SNE and SVM
2017
In the case of network intrusion detection data, pre-processing techniques have been extensively used to enhance the accuracy of the model. An ideal intrusion detection system (IDS) is one that has appreciable detection capability over all groups of attacks. An open research problem in this area is the lower detection rate for less frequent attacks, which results from the curse of dimensionality and imbalanced class distribution of the benchmark datasets. This work attempts to minimise the effects of imbalanced class distribution by applying random under-sampling of the majority classes and SMOTE-based oversampling of minority classes. In order to alleviate the issue arising from the curse…
Semisupervised kernel orthonormalized partial least squares
2012
This paper presents a semisupervised kernel orthonormalized partial least squares (SS-KOPLS) algorithm for non-linear feature extraction. The proposed method finds projections that minimize the least squares regression error in Hilbert spaces and incorporates the wealth of unlabeled information to deal with small size labeled datasets. The method relies on combining a standard RBF kernel using labeled information, and a generative kernel learned by clustering all available data. The positive definiteness of the kernels is proven, and the structure and information content of the derived kernels are studied. The effectiveness of the proposed method is successfully illustrated in standard UCI d…
Solvatochromy and electro-optical study of new fluorine-containing chromophores
1998
technology requires tailored functional materials which fulfill the demands for optimal operation parameters, reliability and processability. Organic chromophores and polymers which contain covalently bound chromophores are promising material classes which can satisfy a broad spectrum of demands on functional materials for photonics, and they are, therefore, favorites for the development of new photonic devices. However, it is almost impossible to satisfy all physico-chemical and technological requirements simultaneously with a polymer consisting of only one type of functional unit. The development of a series of different building blocks which allow to cover the whole range of physico-chemical …
1,4-Bis(arylthio)but-2-enes as Assembling Ligands for (Cu2X2)n (X = I, Br; n = 1, 2) Coordination Polymers: Aryl Substitution, Olefin Configuration, …
2016
CuI reacts with E-PhS(CH2CH═CHCH2)SPh, L1, to afford the coordination polymer (CP) [Cu2I2{μ-E-PhS(CH2CH═CHCH2)SPh}2]n (1a). The unprecedented square-grid network of 1 is built upon alternating two-dimensional (2D) layers with an ABAB sequence and contains rhomboid Cu2(μ2-I)2 clusters as secondary building units (SBUs). Notably, layer A, interconnected by bridging L1 ligands, contains exclusively dinuclear units with short Cu···Cu separations [2.6485(7) Å; 115 K]. In contrast, layer B exhibits Cu···Cu distances of 2.8133(8) Å. The same network is observed when CuBr reacts with L1. In the 2D network of [Cu2Br2{μ-E-PhS(CH2CH═CHCH2)SPh}2]n (1b), isotype to 1a, one square-grid-type layer contain…
The Three Steps of Clustering in the Post-Genomic Era: A Synopsis
2011
Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. Following Handl et al., it can be summarized as a three-step process: (a) choice of a distance function; (b) choice of a clustering algorithm; (c) choice of a validation method. Although such a purist approach to clustering is hardly seen in many areas of science, genomic data require that level of attention, if inferences made from cluster analysis have to be of some relevance to biomedical research. Unfortunately, the high dimensionality of the data and their noisy nature make cluster analysis of genomic data particul…