Search results for "Training set"
showing 10 items of 68 documents
On the Convergence of Tsetlin Machines for the IDENTITY- and NOT Operators
2020
The Tsetlin Machine (TM) is a recent machine learning algorithm with several distinct properties, such as interpretability, simplicity, and hardware-friendliness. Although numerous empirical evaluations report on its performance, the mathematical analysis of its convergence is still open. In this article, we analyze the convergence of the TM with only one clause involved for classification. More specifically, we examine two basic logical operators, namely, the "IDENTITY"- and "NOT" operators. Our analysis reveals that the TM, with just one clause, can converge correctly to the intended logical operator, learning from training data over an infinite time horizon. Besides, it can capture arbit…
A survey of active learning algorithms for supervised remote sensing image classification
2011
Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active …
Development of Methods for the Classification of Vegetable Oils According to Their Botanical Origin
2012
The aim of this work was to construct an LDA model able to classify vegetable oils according to their botanical origin using FTIR spectroscopy data. Also, FTIR data treatment by MLR was used to detect and quantify EVOO adulteration with other low cost edible oils. For these purposes, the vegetable oils shown in Table 5.1 were used. The FTIR spectra of these 30 oil samples were then measured. In all cases, at least two spectra were recorded for each sample. As indicated in this table, four samples of each botanical origin were used to construct a training set in the classification studies, while the remaining samples of each category were employed to evaluate the prediction capability of the…
Semisupervised nonlinear feature extraction for image classification
2012
Feature extraction is of paramount importance for an accurate classification of remote sensing images. Techniques based on data transformations are widely used in this context. However, linear feature extraction algorithms, such as the principal component analysis and partial least squares, can address this problem in a suboptimal way because the data relations are often nonlinear. Kernel methods may alleviate this problem only when the structure of the data manifold is properly captured. However, this is difficult to achieve when small-size training sets are available. In these cases, exploiting the information contained in unlabeled samples together with the available training data can si…
Model selection based product kernel learning for regression on graphs
2013
The choice of a suitable graph kernel is intrinsically hard and often cannot be made in an informed manner for a given dataset. Methods for multiple kernel learning offer a possible remedy, as they combine and weight kernels on the basis of a labeled training set of molecules to define a new kernel. Whereas most methods for multiple kernel learning focus on learning convex linear combinations of kernels, we propose to combine kernels in products, which theoretically enables higher expressiveness. In experiments on ten publicly available chemical QSAR datasets we show that product kernel learning is on no dataset significantly worse than any of the competing kernel methods and on average the…
Incremental Gaussian Discriminant Analysis based on Graybill and Deal weighted combination of estimators for brain tumour diagnosis
2011
In the last decade, machine learning (ML) techniques have been used for developing classifiers for automatic brain tumour diagnosis. However, the development of these ML models relies on a unique training set, and learning stops once this set has been processed. Training these classifiers requires a representative amount of data, but the gathering, preprocessing, and validation of samples are expensive and time-consuming. Therefore, for a classical, non-incremental approach to ML, it is necessary to wait long enough to collect all the required data. In contrast, an incremental learning approach may allow us to build an initial classifier with a smaller number of samples and update it incrementally…
Domain separation for efficient adaptive active learning
2011
This paper proposes a procedure aimed at efficiently adapting a classifier trained on a source image to a similar target image. The adaptation is carried out through active queries in the target domain following a strategy particularly designed for the case where class distributions have shifted between the two images. We first suggest a pre-selection of candidate pixels issued from the target image by keeping only those samples appearing to be lying in a region of the input space not yet covered by the existing ground truth (source domain pixels). Then, exploiting a classifier integrating instance weights, active queries are performed on the target image. As the inclusion to the training s…
The role of perceptual contrast non-linearities in image transform quantization
2000
The conventional quantizer design based on average error minimization over a training set does not guarantee a good subjective behavior on individual images even if perceptual metrics are used. In this work a novel criterion for transform coder design is analyzed in depth. Its aim is to bound the perceptual distortion in each individual quantization according to a non-linear model of early human vision. A common comparison framework is presented to describe the qualitative behavior of the optimal quantizers under the proposed criterion and the conventional rate-distortion based criterion. Several underlying metrics, with and without perceptual non-linearities, are used with both cr…
Reducing the Human Effort in Text Line Segmentation for Historical Documents
2021
Labeling the layout in historical documents for preparing training data for machine learning techniques is an arduous task that requires great human effort. A draft of the layout can be obtained by using a document layout analysis (DLA) system that can later be corrected by the user with less effort than doing it from scratch. In this paper, we study an iterative process in which the user only supervises and corrects the given draft for the pages automatically selected by the DLA system with the aim of reducing the required human effort. The results obtained show that similar DLA quality can be achieved by reducing the number of pages that the user has to annotate and that the accumulated h…
Semi-Supervised Classification Method for Hyperspectral Remote Sensing Images
2004
A new approach to the classification of hyperspectral images is proposed. The main problem with supervised methods is that the learning process heavily depends on the quality of the training data set. In remote sensing, the training set is useful only for simultaneous images or for images with the same classes taken under the same conditions; and, even worse, the training set is frequently not available. On the other hand, unsupervised methods are not sensitive to the number of labelled samples since they work on the whole image. Nevertheless, the relationship between clusters and classes is not ensured. In this context, we propose a combined strategy of supervised and unsupervised learning met…