0000000000749359
AUTHOR
Aleksander Stensby
THE USE OF WEAK ESTIMATORS TO ACHIEVE LANGUAGE DETECTION AND TRACKING IN MULTILINGUAL DOCUMENTS
This paper deals with the problems of language detection and tracking in multilingual online short word-of-mouth (WoM) discussions. This problem is particularly unusual and difficult from a pattern recognition perspective because, in these discussions, the participants and content involve the opinions of users from all over the world. The nature of these discussions, consisting of multiple topics in different languages, presents us with a problem of finding training and classification strategies when the class-conditional distributions are nonstationary. The difficulties in solving the problem are many-fold. First of all, the analyst has no knowledge of when one language stops and when the…
Language Detection and Tracking in Multilingual Documents Using Weak Estimators
Published version of an article from the book: Structural, Syntactic, and Statistical Pattern Recognition . The original publication is available at Spingerlink. http://dx.doi.org/DOI: 10.1007/978-3-642-14980-1_59 This paper deals with the extremely complicated problem of language detection and tracking in real-life electronic (for example, in Word-of-Mouth (WoM)) applications, where various segments of the text are written in different languages. The difficulties in solving the problem are many-fold. First of all, the analyst has no knowledge of when one language stops and when the next starts. Further, the features which one uses for any one language (for example, the n-grams) will not be…