AUTHOR

Toshiaki Nakazawa

Designing the Business Conversation Corpus

Machine translation of written text has advanced considerably in the past several years thanks to the increasing availability of parallel corpora and corpus-based training technologies, but automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to improve the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. We provide a detailed analysis of the corpus along with examples that are challenging for automatic translation. We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benef…

research product

New Areas of Application of Comparable Corpora

This chapter describes several approaches to using comparable corpora beyond the area of MT for under-resourced languages, which is the primary focus of the ACCURAT project. Section 7.1, which is based on Rapp and Zock (Automatic dictionary expansion using non-parallel corpora. In: A. Fink, B. Lausen, W. Seidel, & A. Ultsch (Eds.) Advances in Data Analysis, Data Handling and Business Intelligence. Proceedings of the 32nd Annual Meeting of the GfKl, 2008. Springer, Heidelberg, 2010), addresses the task of creating resources for bilingual dictionaries using a seed lexicon; Sect. 7.2 (based on Rapp et al., Identifying word translations from comparable documents without a seed lexicon. Proceedi…

research product