AUTHOR

Toshiaki Nakazawa

Designing the Business Conversation Corpus

2020

While the progress of machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpora-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided along with challenging examples for automatic translation. We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benef…

FOS: Computer and information sciences; 050101 languages & linguistics; Computer Science - Computation and Language; Machine translation; Computer science; business.industry; media_common.quotation_subject; 05 social sciences; Automatic translation; 02 engineering and technology; computer.software_genre; 0202 electrical engineering electronic engineering information engineering; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Conversation; Quality (business); Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; media_common
researchProduct

New Areas of Application of Comparable Corpora

2019

This chapter describes several approaches to using comparable corpora beyond the area of MT for under-resourced languages, which is the primary focus of the ACCURAT project. Section 7.1, which is based on Rapp and Zock (Automatic dictionary expansion using non-parallel corpora. In: A. Fink, B. Lausen, W. Seidel, & A. Ultsch (Eds.) Advances in Data Analysis, Data Handling and Business Intelligence. Proceedings of the 32nd Annual Meeting of the GfKl, 2008. Springer, Heidelberg, 2010), addresses the task of creating resources for bilingual dictionaries using a seed lexicon; Sect. 7.2 (based on Rapp et al., Identifying word translations from comparable documents without a seed lexicon. Proceedi…

business.industry; Computer science; Group method of data handling; Section (typography); 020207 software engineering; 02 engineering and technology; [SCCO.LING]Cognitive science/Linguistics; Lexicon; computer.software_genre; Focus (linguistics); Task (project management); [SCCO]Cognitive science; Business intelligence; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; Artificial intelligence; business; computer; ComputingMilieux_MISCELLANEOUS; Natural language processing; Word (computer architecture)
researchProduct