RESEARCH PRODUCT

A Comparison of Language Identification Approaches on Short, Query-Style Texts

Thomas Gottron, Nedim Lipka

subject

Information retrieval, Language identification, Computer science, Artificial intelligence, Natural language processing, Style (sociolinguistics)

description

In a multi-language Information Retrieval setting, knowing the language of a user query is important for further processing. Hence, we compare the performance of some typical approaches for language detection on very short, query-style texts. The results show that an accuracy of more than 80% can be achieved even for single words; for slightly longer texts, we observed accuracy values close to 100%.

https://doi.org/10.1007/978-3-642-12275-0_59