RESEARCH PRODUCT

Automatic Dictionary Creation by Sub-symbolic Encoding of Words

Salvatore Gaglio, Giovanni Pilato, Filippo Vella, Ignazio Motisi

subject

Text corpus, Correctness, Probabilistic latent semantic analysis, Computer science, Latent semantic analysis, Context (language use), Translation (geometry), Feature (linguistics), Artificial intelligence, Representation (mathematics), Natural language processing

description

This paper describes a technique for the automatic creation of dictionaries using a sub-symbolic representation of words in a cross-language context. Semantic relationships between the words of two languages are extracted from aligned bilingual text corpora. These relationships are obtained by applying Latent Semantic Analysis to the matrices that represent term co-occurrences in aligned text fragments. The technique finds the "best translation" of a word according to a properly defined geometric distance in an automatically created semantic space. In experiments, the technique reaches a correctness of 95% in the best case.
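The pipeline outlined in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the co-occurrence matrix is built, how many dimensions are retained, or which distance is used. The sketch assumes a plain term-by-fragment count matrix, a truncated SVD as the Latent Semantic Analysis step, and cosine similarity as the geometric measure. The toy corpus and the names `ALIGNED`, `build_matrix`, `term_vectors` and `best_translation` are hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: each entry is one aligned (English, Italian) fragment pair.
ALIGNED = [
    ("the dog sleeps", "il cane dorme"),
    ("the cat sleeps", "il gatto dorme"),
    ("the dog eats", "il cane mangia"),
    ("the cat eats", "il gatto mangia"),
    ("the dog runs", "il cane corre"),
]


def build_matrix(pairs):
    """Rows are language-tagged terms, columns are aligned fragment pairs."""
    terms = sorted(
        {("en", w) for en, _ in pairs for w in en.split()}
        | {("it", w) for _, it in pairs for w in it.split()}
    )
    row = {t: r for r, t in enumerate(terms)}
    matrix = np.zeros((len(terms), len(pairs)))
    for col, (en, it) in enumerate(pairs):
        for w in en.split():
            matrix[row[("en", w)], col] += 1
        for w in it.split():
            matrix[row[("it", w)], col] += 1
    return terms, matrix


def term_vectors(matrix, k):
    """Truncated SVD: keep the k largest singular values (assumed LSA step)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def best_translation(word, terms, vectors, src="en", tgt="it"):
    """Return the target-language term closest to `word` by cosine similarity."""
    query = vectors[terms.index((src, word))]
    best, best_sim = None, -np.inf
    for (lang, candidate), vec in zip(terms, vectors):
        if lang != tgt:
            continue
        denom = np.linalg.norm(query) * np.linalg.norm(vec)
        sim = float(query @ vec) / denom if denom > 0 else -1.0
        if sim > best_sim:
            best, best_sim = candidate, sim
    return best


terms, matrix = build_matrix(ALIGNED)
vectors = term_vectors(matrix, k=3)  # k=3 is an arbitrary choice for this toy data
print(best_translation("dog", terms, vectors))  # cane
```

In this toy corpus every word occurs in exactly the same fragments as its translation, so the two share one point in the reduced space and the match is exact. A real aligned corpus is noisier, and the retained dimension `k` and any term weighting would need to be tuned.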

https://doi.org/10.1007/11731177_17