RESEARCH PRODUCT
Automatic identification of word translations from unrelated English and German corpora
Reinhard Rapp
subject
Computer science; business.industry; computer.software_genre; language.human_language; Linguistics; Task (project management); German; Bilingual lexicon; Identification (information); ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; language; Artificial intelligence; business; computer; Natural language processing; Word (computer architecture)
description
Algorithms for the alignment of words in translated texts are well established. Only recently, however, have new approaches been proposed to identify word translations from non-parallel or even unrelated texts. This task is more difficult because most statistical clues useful in processing parallel texts cannot be applied to non-parallel texts. Whereas some studies on parallel texts have shown up to 99% of the word alignments to be correct, the accuracy for non-parallel texts has so far been around 30%. The current study, which is based on the assumption that there is a correlation between the patterns of word co-occurrences in corpora of different languages, improves this significantly, to about 72% of word translations identified correctly.
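
The co-occurrence assumption can be illustrated with a minimal sketch. This is not the procedure used in the study: the toy corpora, the two-entry seed lexicon, the raw co-occurrence counts, the window size, and the cosine similarity are all assumptions chosen for illustration, since the abstract does not specify the association or similarity measures. The idea shown is only that a German word and its English translation tend to co-occur with words that are themselves translations of each other.

```python
from collections import Counter
import math

def cooccurrence_vector(tokens, target, context_words, window=4):
    """Count how often each context word occurs within `window` tokens of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in context_words:
                counts[tokens[j]] += 1
    return [counts[w] for w in context_words]

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy corpora; they are unrelated texts only in spirit.
english = ("the teacher works in the school the child goes to school "
           "the cook cooks soup the cook sells soup").split()
german = ("der lehrer arbeitet in der schule das kind geht zur schule "
          "der koch kocht suppe der koch verkauft suppe").split()

# Assumed seed lexicon: known translation pairs that align the vector dimensions.
seed = [("school", "schule"), ("soup", "suppe")]
en_context = [en for en, _ in seed]
de_context = [de for _, de in seed]

def best_translation(german_word, candidates, window=4):
    """Return the English candidate whose co-occurrence vector is most similar."""
    de_vec = cooccurrence_vector(german, german_word, de_context, window)
    scored = [
        (cosine(de_vec, cooccurrence_vector(english, cand, en_context, window)), cand)
        for cand in candidates
    ]
    return max(scored)[1]

print(best_translation("lehrer", ["teacher", "cook"]))
print(best_translation("koch", ["teacher", "cook"]))
```

Because the vector dimensions are tied together through the seed lexicon, vectors from the two languages become directly comparable even though the corpora are not translations of each other. On realistic corpora, raw counts would normally be replaced by an association measure, and the candidate set would be the full target vocabulary.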
year | journal | country | edition | language |
---|---|---|---|---|
1999-01-01 | Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics | | | |