RESEARCH PRODUCT

Designing the Business Conversation Corpus

Tong Li, Toshiaki Nakazawa, Matīss Rikters, Ryokan Ri

subject

FOS: Computer and information sciences; 050101 languages & linguistics; Computer Science - Computation and Language; Machine translation; Computer science; business.industry; media_common.quotation_subject; 05 social sciences; Automatic translation; 02 engineering and technology; computer.software_genre; 0202 electrical engineering electronic engineering information engineering; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Conversation; Quality (business); Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; media_common

description

Although machine translation of written text has advanced considerably in the past several years, thanks to the increasing availability of parallel corpora and corpus-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to improve the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. We provide a detailed analysis of the corpus, along with examples that are challenging for automatic translation. We also experiment with adding the corpus to a machine translation training scenario and show how the resulting system benefits from its use.

https://dx.doi.org/10.48550/arxiv.2008.01940