Search results for "METHODOLOGIE"
Showing 10 of 2141 documents
Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models
2017
Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms are fast to converge and more efficient for inference on new documents. Gibbs sampling enables sparse updates since each token is only associated with one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods it is possible to reduce the bias of variational methods while …
A Survey of Multi-Label Topic Models
2019
Every day, an enormous amount of text data is produced. Sources of text data include news, social media, emails, text messages, medical reports, scientific publications and fiction. To keep track of this data, categories, keywords, tags or labels are assigned to each text. Automatically predicting such labels is the task of multi-label text classification. Often, however, we are interested in more than pure classification: we would like to understand which parts of a text belong to a label, which words are important for the label, or which labels occur together. For this reason, topic models may be used for multi-label classification as an interpretable mode…
Computer-Aided Diagnosis System with Backpropagation Artificial Neural Network—Improving Human Readers Performance
2016
This article presents the results of a study into the ability of artificial neural networks (ANNs) to classify cancerous changes in mammographic images. Today's Computer-Aided Detection (CAD) systems cannot detect 100% of pathological changes. One property of an ANN is its ability to generalize: it can identify not only the data it was trained on but also data similar to the training set. Combining CAD with an ANN could give better results and help radiologists make the right decision.
Feature Selection for Ensembles of Simple Bayesian Classifiers
2002
A popular method for creating an accurate classifier from a set of training data is to train several classifiers and then combine their predictions. Ensembles of simple Bayesian classifiers have traditionally not been a focus of research. However, the simple Bayesian classifier has much broader applicability than previously thought. Besides its high classification accuracy, it also has advantages in terms of simplicity, learning speed, classification speed, storage space and incrementality. One way to generate an ensemble of simple Bayesian classifiers is to use different feature subsets, as in the random subspace method. In this paper we present a technique for building ensembles o…
Ensemble Feature Selection Based on the Contextual Merit
2001
Recent research has proved the benefits of using ensembles of classifiers for classification problems. Ensembles constructed by machine learning methods that manipulate the training set are used to create diverse sets of accurate classifiers. Feature selection techniques that apply different heuristics to generate the base classifiers can be adjusted to specific domain characteristics. In this paper we consider and experiment with the contextual feature merit measure as a feature selection heuristic. We use the diversity of an ensemble as the evaluation function in our new algorithm with a refinement cycle. We have evaluated our algorithm on seven data sets from the UCI repository. The experiment…
Ensemble Feature Selection Based on Contextual Merit and Correlation Heuristics
2001
Recent research has proven the benefits of using ensembles of classifiers for classification problems. Ensembles of diverse and accurate base classifiers are constructed by machine learning methods that manipulate the training sets. One way to manipulate the training set is to use feature selection heuristics to generate the base classifiers. In this paper we examine two of them: correlation-based and contextual-merit-based heuristics. Both rely on quite similar assumptions concerning heterogeneous classification problems. Experiments are conducted on several data sets from the UCI Repository. We construct a fixed number of base classifiers over selected feature subsets and refine the ensemble iter…
Texture analysis with statistical methods for wheat ear extraction
2007
In the agronomic domain, simplifying crop counting, which is necessary for yield prediction and agronomic studies, is an important project for technical institutes such as Arvalis. Although the main objective of our overall project is to design a mobile robot that acquires natural images directly in a field, Arvalis first asked us to detect wheat ears in images by image processing before counting them, which will make it possible to obtain the first component of the yield. In this paper we compare different texture image segmentation techniques based on feature extraction by first- and higher-order statistical methods, which have been applied to our images. The extracted features are…
A relational model for unstructured documents
1987
The logical structure of a document is usually a tree in which the order of the nodes is important, at least at some level of the tree. We call a document unstructured if its structure is a single-level ordered tree. The purpose of this paper is to present a many-sorted algebra for handling unstructured documents. The documents in the model are represented by relations. An algebra for handling documents of one type can be extended to an algebra for handling documents of several types. Further, an algebra for handling documents can be extended with the relational algebra, so that documents and relations are handled in a common algebra. The model of this paper can be regarded as a part of a general docu…
Visualization of Large Terrain Using Non-restricted Quadtree Triangulations
2004
This paper presents a set of new techniques oriented towards the real-time visualization of large terrains. These techniques focus mainly on semi-regular triangulations of non-restricted quadtree terrain representations. The paper shows that triangulations based on non-restricted quadtrees are as simple and efficient as those based on restricted quadtrees; in addition, the new triangulations avoid discontinuity problems along the boundaries of different patches without requiring tree balancing or the addition of extra triangles. Another important feature of the proposed triangulation is that it incorporates an efficient method for building triangle strips and triangle fans for t…
Driver Situation Awareness and Perceived Sleepiness during Truck Platoon Driving–Insights from Eye-tracking Data
2021
Truck platoon driving technology uses vehicle-to-vehicle communication to allow one truck to follow another in an automated fashion. The first vehicle is operated manually; the second vehicle is dr...