Toms Bergmanis

Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation

Neural machine translation systems are typically trained on curated corpora and break when faced with non-standard orthography or punctuation. Resilience to spelling mistakes and typos, however, is crucial, as machine translation systems are used to translate texts of informal origin, such as chat conversations, social media posts, and web pages. We propose a simple generative noise model to generate adversarial examples of ten different types. We use these to augment machine translation systems’ training data and show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data, while baseline systems’ performance drops by…

research product

Facilitating terminology translation with target lemma annotations

Most of the recent work on terminology integration in machine translation has assumed that terminology translations are given already inflected in forms that are suitable for the target language sentence. In the day-to-day work of professional translators, however, this is seldom the case, as translators work with bilingual glossaries where terms are given in their dictionary forms; finding the right target language form is part of the translation process. We argue that the requirement for a priori specified target language forms is unrealistic and impedes the practical applicability of previous work. In this work, we propose to train machine translation systems using a source-side data augmentatio…

research product

Development of a web-based video repository and streaming system

The aim of this qualification thesis is to develop a web-based video repository and streaming system. The system should make it possible to store and view video materials, create discussions attached to video materials, and sort and rate video materials. The thesis will serve as the basis for the further development of a repository of video learning materials for the Institute of Mathematics and Computer Science of the University of Latvia (LU Matemātikas un informātikas institūts). The task of the thesis is therefore to create a modular, scalable, and easily modifiable software product that complies with Latvian national standards for software engineering and follows industry best practice.

research product