Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank

Baiba Saulīte, Lauma Pretkalniņa, Laura Rituma

subject

060201 languages & linguistics; Dependency (UML); Grammar; Computer science; business.industry; media_common.quotation_subject; Treebank; Latvian; 06 humanities and the arts; 02 engineering and technology; computer.software_genre; Syntax; language.human_language; Dependency grammar; 0602 languages and literature; 0202 electrical engineering electronic engineering information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; business; Representation (mathematics); computer; Natural language processing; media_common; De facto standard

description

The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language helps further the UD initiative by providing an important dataset for research and natural language processing in that language. In this paper, we describe how we created a UD treebank for Latvian, and how we obtained both the basic and enhanced UD representations from the data in the Latvian Treebank, which is annotated according to a hybrid dependency-constituency grammar model. The hybrid model was inspired by Lucien Tesnière’s dependency grammar theory and its notion of a syntactic nucleus. While the basic UD representation is already a de facto standard in NLP, the enhanced UD representation is just emerging, and the treebank described here is among the first to provide both representations.
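To make the distinction between the basic and enhanced representations concrete, the following minimal sketch reads one token line in the standard CoNLL-U format used for UD treebanks. In that format the basic dependency is stored in the HEAD and DEPREL columns, while the enhanced graph is stored in the DEPS column as `head:relation` pairs separated by `|`. The function name `parse_conllu_token` and the English example line are illustrative assumptions; they are not taken from the paper or from the Latvian Treebank.

```python
def parse_conllu_token(line):
    """Split one CoNLL-U token line into its basic and enhanced dependencies.

    Hypothetical helper for illustration only; not the authors' conversion code.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")

    # Column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    token_id, form = cols[0], cols[1]
    head, deprel, deps = cols[6], cols[7], cols[8]

    # Basic representation: exactly one head and one relation per token.
    basic = (head, deprel)

    # Enhanced representation: zero or more head:relation pairs.
    # Split on the first colon only, because relations may contain
    # colons themselves (for example subtyped relations).
    enhanced = []
    if deps != "_":
        for pair in deps.split("|"):
            enh_head, enh_rel = pair.split(":", 1)
            enhanced.append((enh_head, enh_rel))

    return {"id": token_id, "form": form, "basic": basic, "enhanced": enhanced}


# Invented English example ("She reads and writes"): in the enhanced graph
# the subject is attached to both coordinated verbs (tokens 2 and 4).
EXAMPLE_LINE = "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t2:nsubj|4:nsubj\t_"
token = parse_conllu_token(EXAMPLE_LINE)
print(token["basic"])
print(token["enhanced"])
```

In the example, the basic tree gives the token a single head, whereas the enhanced graph adds a second subject edge to the coordinated verb. This kind of extra edge is one of the things an enhanced representation can express and a basic tree cannot.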

https://doi.org/10.1007/978-3-030-00794-2_10