0000000000405511
AUTHOR
Frankie Robertson
Filling the ___-s in Finnish MWE lexicons
This paper describes the automatic construction of FinnMWE: a lexicon of Finnish Multi-Word Expressions (MWEs). The focus is on syntactic frames: verbal constructions whose arguments take a particular morphological form. The verbal frames are automatically extracted from FinnWordNet and English Wiktionary. The resulting lexicon interoperates with dependency tree search software so that instances can be found quickly within dependency treebanks. The extraction and enrichment process is explained in detail. The resulting resource is evaluated in terms of its coverage of different types of MWEs. It is also compared with and evaluated against Finnish PropBank. Peer reviewed.
Morphological parsing with lexical transducers : a case study of OMorFi
This thesis explores the task of morphological parsing, that is, mapping a written word to a representation of the units of meaning that make it up. The research objective is to investigate morphological parsing of Finnish with lexical transducers through a case study of OMorFi (Open Morphology for Finnish). The thesis also presents some linguistic and mathematical background, as well as some techniques for constructing Finite-State Transducers (FSTs). The main results are an exposition and some analysis of OMorFi’s paradigms, stubs & stems language model, some comparison with related work, and ideas for potential future work.
A Contrastive Evaluation of Word Sense Disambiguation Systems for Finnish
Previous work on word sense disambiguation, like many other natural language processing tasks, has mostly focused on English. Although some work has also been done on other languages, including Uralic languages, no comparative evaluation of word sense disambiguation for Finnish has been published so far, even though the necessary lexical resources, in particular FinnWordNet, have long been available. This work aims to remedy the situation. It provides results from programs representing the most significant approaches to word sense disambiguation, including some of the best Engl…
Show, Don't Tell : Visualising Finnish Word Formation in a Browser-Based Reading Assistant
This paper presents the NiinMikaOli?! reading assistant for Finnish. The focus is on the simplified presentation and visualisation of a wide range of word-level linguistic phenomena of the Finnish language in a unified form, so as to benefit language learners. The system is available as a browser extension, intended to be used in context with authentic texts, in order to encourage free reading among language learners. Peer reviewed.
TallVocabL2Fi : A Tall Dataset of 15 Finnish L2 Learners’ Vocabulary
Previous work concerning measurement of second language learners has tended to focus on the knowledge of small numbers of words, often geared towards measuring vocabulary size. This paper presents a “tall” dataset containing information about a few learners’ knowledge of many words, suitable for evaluating Vocabulary Inventory Prediction (VIP) techniques, including those based on Computerised Adaptive Testing (CAT). In comparison to previous comparable datasets, the learners are from varied backgrounds, so as to reduce the risk of overfitting when used for machine learning based VIP. The dataset contains both a self-rating test and a translation test, used to derive a measure of reliability…
A COVID-19 news coverage mood map of Europe
We present a COVID-19 news dashboard which visualizes sentiment in pandemic news coverage in different languages across Europe. The dashboard shows analyses of positive/neutral/negative sentiment and moral sentiment for news articles across countries and languages. First, we extract news articles from news-crawl. Then we use a pre-trained multilingual BERT model for sentiment analysis of news article headlines, and a dictionary- and word-vector-based method for moral sentiment analysis of news articles. The resulting dashboard gives a unified overview of COVID-19 news events, the overall sentiment of the news, and the region and language of publication for the period starting from the beginning of …