AUTHOR

Tommi Jauhiainen

0000-0002-6474-3570

The International Comparable Corpus: Challenges in building multilingual spoken and written comparable corpora

2021

This paper reports on the efforts of twelve national teams in building the International Comparable Corpus (ICC; https://korpus.cz/icc) that will contain highly comparable datasets of spoken, written and electronic registers. The languages currently covered are Czech, Finnish, French, German, Irish, Italian, Norwegian, Polish, Slovak, Swedish and, more recently, Chinese, as well as English, which is considered to be the pivot language. The goal of the project is to provide much-needed data for contrastive corpus-based linguistics. The ICC corpus is committed to the idea of re-using existing multilingual resources as much as possible and the design is modelled, with various adjustments, on t…

Czech; 050101 languages & linguistics; History; contrastive linguistics; German; Irish; 6121 Languages; 0501 psychology and cognitive sciences; General Materials Science; data sustainability; Contrastive linguistics; kielitiede; vertaileva kielitiede; ICC corpus; 05 social sciences; copyright; 050301 education; ICE corpus; kontrastiivinen tutkimus; 113 Computer and information sciences; language.human_language; Linguistics; tekijänoikeus; Pivot language; International Corpus of English; language; korpukset; Written language; 0503 education; comparable corpus; Spoken language