AUTHOR

Daniel Morariu

On Hagelbarger’s and Shannon’s matching pennies playing machines

Abstract In the 1950s, Hagelbarger’s Sequence Extrapolating Robot (SEER) and Shannon’s Mind-Reading Machine (MRM) were the state-of-the-art research results in playing the well-known “matching pennies” game. In our research we implement both machines in software in order to test the common statement that MRM, although simpler, beats SEER. We also propose a simple contextual predictor (SCP) and use it to compete with SEER and MRM. As expected, the experimental results confirm the claimed superiority of MRM over SEER, and also show the SCP’s superiority over both SEER and MRM. At the end, we draw some conclusions and propose further research ideas, like the use of mixing models methods and the…

research product

DBSCAN Algorithm for Document Clustering

Abstract Document clustering is the problem of automatically grouping similar documents into categories based on some similarity metric. Almost all available data, usually on the web, is unclassified, so we need powerful clustering algorithms that work with this type of data. All common search engines return a list of pages relevant to the user query. This list needs to be generated quickly and as accurately as possible. For this type of problem, because the web pages are unclassified, we need powerful clustering algorithms. In this paper we present a clustering algorithm called DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and its limitations on documents (or web pages)…

research product

Part of Speech Tagging Using Hidden Markov Models

Abstract In this paper, we present a wide range of models based on less adaptive and adaptive approaches for a PoS tagging system. The parameters for the adaptive approach are based on the n-grams of the Hidden Markov Model, evaluated for bigrams and trigrams, and on three different types of decoding method: forward, backward, and bidirectional. We used the Brown Corpus for the training and testing phases. The bidirectional trigram model almost reaches state-of-the-art accuracy but is disadvantaged by its decoding time, while the backward trigram model reaches almost the same results with a much better decoding time. By these results, we can conclude that the decodi…

research product

Aspects Concerning SVM Method’s Scalability

In recent years the quantity of text documents has been increasing continually, and automatic document classification has become an important challenge. In text document classification, the training step is essential for obtaining a good classifier. The quality of learning depends on the size of the training data. When working with huge training data sets, problems arise because the training time increases exponentially. In this paper we present a method that allows working with huge data sets in the training step without exponentially increasing the training time and without significantly decreasing the classification accuracy.

research product

Weights Space Exploration Using Genetic Algorithms for Meta-classifier in Text Document Classification

research product

Part of speech tagging with Naïve Bayes methods

research product

An Extension of the VSM Documents Representation using Word Embedding

Abstract In this paper, we present experiments that try to integrate the power of the Word Embedding representation into real document classification problems. Word Embedding is a recent trend in the natural language processing domain that tries to represent each word from the document in a vector format. This representation embeds the semantic context in which the word occurs most frequently. We include this new representation in a classical VSM document representation and evaluate it using a learning algorithm based on the Support Vector Machine. This newly added information makes classification more difficult because it increases the learning time and the memory neede…

research product

Khmer character recognition using artificial neural network

Character recognition has become an interesting and challenging research topic in the field of pattern recognition in the recent decade. It has numerous applications, including bank cheques, address sorting, and conversion of handwritten or printed characters into machine-readable form. Artificial neural networks with learning ability, including the self-organization map and the multilayer perceptron network, could offer a solution to the character recognition problem. This paper presents a Khmer Character Recognition (KCR) system implemented in the Matlab environment using artificial neural networks. The KCR system describes the utilization of an integrated self-organization map (SOM) network and multilayer per…

research product

Part-of-speech labeling for Reuters database

Although the Vector Space Model used for document representation in information retrieval systems integrates only a small quantity of knowledge, it continues to be used due to its low computational cost, execution speed, and simplicity. We try to improve this document representation by adding some syntactic information, such as the parts of speech. In this paper, we have evaluated three different tagging algorithms in order to select the most suitable tagger for tagging the Reuters dataset. In this work, we have evaluated the taggers using only five different parts of speech: noun, verb, adverb, adjective, and others. We considered these particular tags to be the most representative for describin…

research product