New Areas of Application of Comparable Corpora
Bogdan Babych, Michael Zock, Sadao Kurohashi, Richard S. Forsyth, Vivian Xu, Serge Sharoff, Chenhui Chu, Toshiaki Nakazawa, Reinhard Rapp
subject
business.industry; Computer science; Group method of data handling; Section (typography); 020207 software engineering; 02 engineering and technology; [SCCO.LING] Cognitive science/Linguistics; Lexicon; computer.software_genre; Focus (linguistics); Task (project management); [SCCO] Cognitive science; Business intelligence; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC]; Artificial intelligence; business; computer; ComputingMilieux_MISCELLANEOUS; Natural language processing; Word (computer architecture)
description
This chapter describes several approaches to using comparable corpora beyond MT for under-resourced languages, which is the primary focus of the ACCURAT project. Section 7.1, which is based on Rapp and Zock (Automatic dictionary expansion using non-parallel corpora. In: A. Fink, B. Lausen, W. Seidel, & A. Ultsch (Eds.) Advances in Data Analysis, Data Handling and Business Intelligence. Proceedings of the 32nd Annual Meeting of the GfKl, 2008. Springer, Heidelberg, 2010), addresses the task of creating resources for bilingual dictionaries using a seed lexicon. Section 7.2 (based on Rapp et al., Identifying word translations from comparable documents without a seed lexicon. Proceedings of LREC 2012, Istanbul, 2012) develops and evaluates a novel methodology for creating bilingual dictionaries without an initial lexicon. Section 7.3 proposes a novel system that extracts Chinese–Japanese parallel sentences from quasi-comparable and comparable corpora.
year | journal | country | edition | language
---|---|---|---|---
2019-01-01 | | | |