Search results for "Mining"
Showing 10 of 1730 documents
Support vector machines in engineering: an overview
2014
This paper provides an overview of the support vector machine (SVM) methodology and its applicability to real-world engineering problems. Specifically, the aim of this study is to review the current state of the SVM technique and to show some of its latest successful results on real-world problems across different engineering fields. The paper starts by reviewing the basic concepts of SVMs and kernel methods. Kernel theory, SVMs, support vector regression (SVR), SVMs in signal processing, and the hybridization of SVMs with meta-heuristics are fully described in the first part of this paper. SVMs are now firmly established in engineering practice. As we illustrate in this paper, SVMs can …
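The kernel expansion that a trained SVM classifier evaluates can be sketched as below. This is a minimal illustration, not code from the paper: it assumes an RBF kernel and takes hypothetical support vectors, dual coefficients, and bias as already given, because solving the training problem (a quadratic program) is omitted.

```python
import math
from typing import Sequence

def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, labels, alphas, bias, gamma=1.0) -> float:
    """Kernel SVM decision value f(x) = sum_i alpha_i * y_i * k(x_i, x) + b.

    The predicted class is the sign of f(x). The alphas and bias are assumed
    to come from an already-solved training problem.
    """
    return sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + bias

# Hypothetical toy model: one support vector per class.
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, +1]
alphas = [1.0, 1.0]
b = 0.0

print(svm_decision((0.1, 0.2), svs, ys, alphas, b) < 0)  # → True
```

Only the support vectors (points with nonzero alpha) enter the sum, which is why prediction cost depends on their number rather than on the full training set.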
Kernel manifold alignment for domain adaptation
2016
The wealth of sensory data coming from different modalities has opened numerous opportunities for data analysis. The data are of increasing volume, complexity and dimensionality, thus calling for new methodological innovations towards multimodal data processing. However, multimodal architectures must rely on models able to adapt to changes in the data distribution. Differences in the density functions can be due to changes in acquisition conditions (pose, illumination), sensor characteristics (number of channels, resolution) or different views (e.g. street-level vs. aerial views of the same building). We call these different acquisition modes domains, and refer to the adaptation proble…
Sorting of Single Biomolecules based on Fourier Polar Representation of Surface Enhanced Raman Spectra
2016
Surface enhanced Raman scattering (SERS) spectroscopy is increasingly used in biosensors for its capacity to detect and identify single molecules. In practice, a large number of SERS spectra are acquired, so reliable ranking methods are essential for analysing all these data. Supervised classification strategies, which are the most effective methods, are usually applied, but they require predetermined models or classes. In this work, we propose to sort SERS spectra into unknown groups with an alternative strategy called Fourier polar representation. This non-fitting method, based on simple Fourier sine and cosine transforms, produces a fast and graphical representation for sor…
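The Fourier polar idea can be sketched with a phasor-style transform. The exact form is an assumption, not the authors' code: each spectrum (intensities sampled on N channels) is mapped to its intensity-normalized cosine and sine coefficients at one harmonic, and that point is expressed as a modulus and an angle. Spectra of similar shape land near each other in the plane, so groups can be sorted without fitting a model. The function name `fourier_polar` and its `harmonic` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple

def fourier_polar(spectrum: Sequence[float], harmonic: int = 1) -> Tuple[float, float]:
    """Map a spectrum to polar coordinates (modulus, angle) of its
    normalized cosine/sine transform at the given harmonic.

    Dividing by the total intensity makes the point independent of the
    overall signal strength, so only the spectral shape matters.
    """
    n = len(spectrum)
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    # Cosine and sine transforms at one harmonic, normalized by total intensity.
    g = sum(i * math.cos(2 * math.pi * harmonic * k / n) for k, i in enumerate(spectrum)) / total
    s = sum(i * math.sin(2 * math.pi * harmonic * k / n) for k, i in enumerate(spectrum)) / total
    return math.hypot(g, s), math.atan2(s, g)

# A spectrum and a 10x brighter copy of it map to the same polar point.
spec = [0, 1, 5, 2, 0, 0, 0, 0]
print(fourier_polar(spec))
print(fourier_polar([10 * v for v in spec]))
```

Because the mapping is a fixed transform rather than a fit, each spectrum costs one pass over its channels, which suits the large spectrum counts the abstract mentions.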
Ranking-Oriented Collaborative Filtering: A Listwise Approach
2016
Collaborative filtering (CF) is one of the most effective techniques in recommender systems, and it can be either rating-oriented or ranking-oriented. Ranking-oriented CF algorithms have demonstrated significant gains in ranking accuracy, being able to estimate a precise preference ranking of items for each user rather than absolute ratings (as rating-oriented CF algorithms do). Conventional memory-based ranking-oriented CF algorithms can be described as pairwise: they represent each user as a set of preferences on each pair of items for similarity calculations and predictions. In this study, we propose ListCF, a novel listwise CF paradigm that seeks improvement in bot…
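To make the pairwise representation concrete, the sketch below compares two users by how many co-rated item pairs they order the same way, in the style of the Kendall tau rank correlation. It illustrates the conventional pairwise approach the abstract contrasts with, not ListCF itself; the choice of Kendall tau (with tied pairs skipped) is an assumption, and the rating dictionaries are hypothetical.

```python
from itertools import combinations
from typing import Dict

def pairwise_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Kendall-tau-style similarity between two users' ratings.

    Each user is treated as a set of preferences over item pairs: for every
    pair of co-rated items, check whether both users prefer the same item.
    Pairs tied for either user are skipped. Returns a value in [-1, 1], or
    0.0 if no comparable pairs exist.
    """
    common = sorted(set(u) & set(v))
    concordant = discordant = 0
    for a, b in combinations(common, 2):
        du = u[a] - u[b]
        dv = v[a] - v[b]
        if du == 0 or dv == 0:
            continue  # no strict preference for this pair
        if (du > 0) == (dv > 0):
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0

alice = {"i1": 5, "i2": 3, "i3": 1}
bob = {"i1": 4, "i2": 2, "i3": 1}    # different ratings, same ordering
carol = {"i1": 1, "i2": 3, "i3": 5}  # reversed ordering
print(pairwise_similarity(alice, bob), pairwise_similarity(alice, carol))
```

Alice and Bob never give the same absolute rating, yet they are maximally similar here, which is the point of ranking-oriented CF: only the preference order matters.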
The effect of automated taxa identification errors on biological indices
2017
In benthic macroinvertebrate biomonitoring systems, the goal is to determine the status of ecosystems based on several biological indices. To increase cost-efficiency, computer-based taxa identification for image data has recently been developed. Taxa identification errors can, however, have strong effects on the indices and thus on the determination of the ecological status. In order to shift the biomonitoring process towards automated expert systems, we need a clear understanding of the bias caused by automation. In this paper, we examine eleven classification methods on macroinvertebrate image data and show how their classification errors propagate into different biological…
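How an identification error can shift an index is easy to see on a toy sample. The sketch assumes the Shannon diversity index as the biological index (the visible abstract does not name the indices the paper uses) and compares it on expert versus classifier labels; the labels are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable

def shannon_index(labels: Iterable[str]) -> float:
    """Shannon diversity H' = -sum_i p_i * ln(p_i) over taxon proportions."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def index_bias(expert_labels, classifier_labels) -> float:
    """Index computed from classifier output minus index from expert labels;
    0 means the identification errors did not shift the index."""
    return shannon_index(classifier_labels) - shannon_index(expert_labels)

# Hypothetical sample: the classifier mistakes one specimen of taxon "C" for "A".
expert_labels = ["A", "A", "B", "B", "C", "C"]
classifier_labels = ["A", "A", "B", "B", "A", "C"]
print(round(index_bias(expert_labels, classifier_labels), 4))
```

A single error out of six lowers H' here because it makes the community look less even. Conversely, errors that merely swap labels between equally abundant taxa leave H' unchanged, so index bias and classification accuracy are not the same quantity.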
SCCF Parameter and Similarity Measure Optimization and Evaluation
2019
Neighborhood-based Collaborative Filtering (CF) is one of the most successful and widely used recommendation approaches; however, it suffers from major flaws, especially in sparse environments. Traditional similarity measures used by neighborhood-based CF to find similar users or items are not suitable for sparse datasets. Sparse Subspace Clustering and common liking rate in CF (SCCF), a recently published method, proposed a tunable similarity measure oriented towards sparse datasets; however, its performance has not yet been maximized and warrants further analysis and investigation. In this paper, we propose and evaluate the performance of a new tuning mechanism, using the Mean Absolute Error (MA…
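A tuning loop driven by the Mean Absolute Error can be sketched as a grid search: evaluate each candidate value of a similarity parameter on held-out ratings and keep the value with the lowest MAE. The predictor is a hypothetical stand-in for an SCCF-based rating predictor, and the names `tune_parameter` and `predict` are illustrative; the paper's actual tuning mechanism may differ.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE = (1/n) * sum |p_i - r_i| over n held-out ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def tune_parameter(
    candidates: Iterable[float],
    predict: Callable[[float, Tuple[str, str]], float],
    held_out: List[Tuple[Tuple[str, str], float]],
) -> Tuple[float, float]:
    """Return (best_value, best_mae) over the candidate parameter values.

    predict(param, (user, item)) is a hypothetical rating predictor whose
    similarity measure depends on param.
    """
    actual = [r for _, r in held_out]
    best = None
    for value in candidates:
        preds = [predict(value, key) for key, _ in held_out]
        mae = mean_absolute_error(preds, actual)
        if best is None or mae < best[1]:
            best = (value, mae)
    if best is None:
        raise ValueError("no candidate values given")
    return best

# Toy usage: a dummy predictor that simply returns the parameter value.
held_out = [(("u1", "i1"), 3.0), (("u1", "i2"), 4.0), (("u2", "i1"), 5.0)]
print(tune_parameter([2.0, 3.0, 4.0, 5.0], lambda p, key: p, held_out))
```

MAE is convenient as a tuning target because it is on the rating scale itself: an MAE of 0.5 means predictions miss by half a rating point on average.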
Reestimating a minimum acceptable geocoding hit rate for conducting a spatial analysis
2019
Geocoding consists of converting a textual description of a location into coordinates. Hence, a dataset of events must be geocoded before a spatial analysis of the data can be performed. ...
Large-scale random features for kernel regression
2015
Kernel methods constitute a family of powerful machine learning algorithms that have found wide use in remote sensing and geosciences. However, kernel methods are still not widely adopted because of their high computational cost on large-scale problems, such as the inversion of radiative transfer models. This paper introduces the method of random kitchen sinks (RKS) for fast statistical retrieval of bio-geo-physical parameters. The RKS method approximates a kernel matrix with a set of random bases sampled from the Fourier domain. We extend its use to other bases, such as wavelets, stumps, and Walsh expansions. We show that kernel regression is now possible for data…
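The random-features idea behind RKS can be sketched for the Gaussian (RBF) kernel: draw frequencies from the kernel's Fourier transform (itself a Gaussian) plus random phases, map each input through cosines, and inner products of the mapped vectors approximate the kernel. This is the standard random Fourier feature construction, shown only for the Fourier basis; the paper's extensions to wavelets, stumps, and Walsh expansions are not shown, and the function names and the parameters `num_features`, `sigma`, and `seed` are illustrative choices.

```python
import math
import random
from typing import List, Sequence

def make_rff_map(dim: int, num_features: int, sigma: float, seed: int = 0):
    """Return a feature map z(x) whose inner products approximate the RBF
    kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    Frequencies w ~ N(0, I / sigma^2) (the kernel's spectral density) and
    phases b ~ Uniform[0, 2*pi]; z(x) = sqrt(2/D) * cos(w . x + b).
    """
    rng = random.Random(seed)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return z

def rbf(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exact RBF kernel, for comparison with the random-feature estimate."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))

z = make_rff_map(dim=2, num_features=2000, sigma=1.0, seed=42)
x, y = (0.3, -0.5), (0.8, 0.1)
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = rbf(x, y, 1.0)
print(approx, exact)
```

The payoff is cost: an exact kernel matrix over n samples is n by n, whereas with D random features kernel regression becomes ordinary linear (ridge) regression on D columns, whose cost grows linearly in n. The approximation error shrinks roughly as 1/sqrt(D).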
Revisitation of Nonorthogonal Spin Adaptation in Coupled Cluster Theory
2015
The benefits of what is alternatively called a nonorthogonally spin-adapted, spin-free, or orbital representation of the coupled cluster equations are discussed relative to orthogonally spin-adapted, spin-orbital, and spin-integrated theories. In particular, specific linear combinations of the orbital cluster amplitudes, denoted spin-summed amplitudes, are shown to reduce the number of contractions that must be explicitly performed and to simplify the expressions and their derivation. The computational efficiency of the spin-summed approach is discussed and compared to orthogonally spin-adapted and spin-integrated approaches. The spin-summed approach is shown to have significant computationa…
Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples
2013
Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing…