Extraction of Medical Terms for Word Sense Disambiguation within Multilingual Framework
Jolanta Mizera-Pietraszko, Tomasz Machalewski
subject
Medical terminology; business.industry; Computer science; similarity metrics; Context (language use); 02 engineering and technology; computer.software_genre; SemEval; Terminology; computational linguistics; multilingual information retrieval; word sense disambiguation; 020204 information systems; Similarity (psychology); 0202 electrical engineering electronic engineering information engineering; medical informatics; 020201 artificial intelligence & image processing; Cognate; Artificial intelligence; information extraction; Language family; business; computer; Natural language processing; Word (computer architecture)
description
Languages belonging to the same language family share a number of common characteristics, called language pair phenomena, which are useful in multilingual processing tasks such as translation across cognate languages, building dictionaries, thesauri, and transcript collections, or multilingual text retrieval of digital documents. In addition, it is estimated that more than 30% of English vocabulary has been inherited from Latin, which dominates medical terminology in particular. We exploit this fact to explore word sense disambiguation (WSD) in a multilingual environment. In the medical domain specifically, language pair phenomena can be limited to synonymy of cognate technical terms. We investigate our approach by comparing Boolean and Free Text Search modes. To measure the efficiency of our methodology we use the classical Salton model of tf-idf term weighting, as extended by Karen Spärck Jones. Our results are very promising: they indicate that the similarity between synonymous English medical terms and their target-language equivalents significantly limits the number of target word senses, even for languages outside the language family, as in the English-Polish language pair. Limiting the number of target word senses yields better, more context-driven disambiguation and, consequently, higher precision in multilingual medical information retrieval.
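As a rough illustration of the tf-idf term weighting mentioned in the description, the sketch below computes weights with raw term frequency and idf = log(N / df). This is only one common variant and is not necessarily the exact scheme used by the authors; the function name `tf_idf` and the toy corpus are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, not taken from the paper.
docs = [
    ["cardiac", "arrest", "cardiac"],
    ["cardiac", "muscle"],
    ["arrest", "warrant"],
]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in fewer documents (such as "muscle" above) receives a higher weight per occurrence than one spread across the collection (such as "cardiac"), which is the property that makes the weighting useful for ranking retrieved documents.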
year | journal | country | edition | language |
---|---|---|---|---|
2016-08-01 | | | | |