Search results for "Language Processing"
Showing 10 of 421 documents
Target frames in British hotel websites
2015
This article centres on four-word phrase frames in British hospitality websites. Our aim is to identify the frames that are specific to this website genre, which we call target frames. Each phrase frame represents an identical sequence of words except for one variable word, i.e. A*BC or AB*D. The words that fill the slot, marked with an asterisk, are called fillers. We took a corpus-driven approach, using the KfNgram software to identify the phrase frames in our corpus (COMETVAL). We regard phrase frames as genre-specific when they are significantly more frequent than those found in the written section of the BNC, which represents General British English. We further filtered our selection o…
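The frame extraction described in this abstract can be approximated without the KfNgram software. The following Python sketch is an illustration, not the authors' implementation (the COMETVAL corpus, frequency thresholds, and BNC significance testing are omitted): it collects 4-grams, masks one interior position, and records the observed fillers.

```python
from collections import defaultdict

def phrase_frames(tokens, n=4):
    """Collect 4-word phrase frames (A*BC / AB*D) and their fillers.

    A frame is a 4-gram with one interior slot (marked '*') left open;
    the words observed in that slot are the frame's fillers.
    """
    frames = defaultdict(set)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in (1, 2):  # vary one interior position only (A*BC, AB*D)
            frame = tuple(w if j != slot else "*" for j, w in enumerate(gram))
            frames[frame].add(gram[slot])
    return frames

tokens = "we look forward to seeing you , we look forward to welcoming you".split()
frames = phrase_frames(tokens)
# Frames with more than one filler are the variable patterns of interest.
variable = {f: sorted(v) for f, v in frames.items() if len(v) > 1}
```

On this toy input, the frame `("forward", "to", "*", "you")` collects the fillers `seeing` and `welcoming`, which is the AB*D pattern the abstract describes.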
WordNet and semidiscrete decomposition for sub-symbolic representation of words
2009
A methodology for sub-symbolic semantic encoding of words is presented. The methodology uses the standard, semantically highly structured WordNet lexical database together with SemiDiscrete matrix Decomposition to obtain a vector representation with low memory requirements in a semantic n-space. Applying the proposed algorithm to all WordNet words would yield a useful tool for the sub-symbolic processing of texts.
2020
To successfully learn using open Internet resources, students must be able to critically search, evaluate and select online information, and verify sources. Defined as critical online reasoning (COR), this construct is operationalized on two levels in our study: (1) the student level using the newly developed Critical Online Reasoning Assessment (CORA), and (2) the online information processing level using event log data, including gaze durations and fixations. The written responses of 32 students for one CORA task were scored by three independent raters. The resulting score was operationalized as “task performance,” whereas the gaze fixations and durations were defined as indicators of “pr…
A Graph-Grammar Approach to Represent Causal, Temporal and Other Contexts in an Oncological Patient Record
1996
The data of a patient undergoing complex diagnostic and therapeutic procedures do not merely form a simple chronology of events; they are closely related in many ways. Such data contexts include causal or temporal relationships; they express inconsistencies and revision processes, or describe patient-specific heuristics. Knowledge of data contexts supports the retrospective understanding of the medical decision-making process and is a valuable basis for further treatment. Conventional data models usually neglect the problem of context knowledge, or simply use free text that is not processed by the program. In connection with the development of the knowledge-based system THEMPO (The…
Detecting Bridge Anaphora
2017
The paper presents one of the most important issues in natural language processing (NLP): the automated recognition of semantic relations, in this case bridge anaphora. We propose to recognise this type of relation automatically, as accurately as possible, in a literary corpus (the novel Quo Vadis), given that the diversity and complexity of relations between entities is impressive. Furthermore, we defined and classified bridge-anaphora relations based on annotation conventions. To achieve the main goal, we developed a computational instrument, BAT (Bridge Anaphora Tool), currently still in a test (and implicitly improvable) version. This study is…
Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
2018
The treebanks provided by the Universal Dependencies (UD) initiative are a state-of-the-art resource for cross-lingual and monolingual syntax-based linguistic studies, as well as for multilingual dependency parsing. Creating a UD treebank for a language helps further the UD initiative by providing an important dataset for research and natural language processing in that language. In this paper, we describe how we created a UD treebank for Latvian, and how we obtained both the basic and enhanced UD representations from the data in the Latvian Treebank, which is annotated according to a hybrid dependency-constituency grammar model. The hybrid model was inspired by Lucien Tesnière's dependency gram…
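The difference between basic and enhanced UD representations mentioned in this abstract can be illustrated with a small CoNLL-U fragment (an invented English example, not taken from the Latvian Treebank): the basic tree lives in the HEAD and DEPREL columns, while the enhanced graph in the DEPS column propagates the shared subject to the second conjunct and augments `conj` with the conjunction lemma.

```
# text = Sue arrived and left .
1	Sue	Sue	PROPN	_	_	2	nsubj	2:nsubj|4:nsubj	_
2	arrived	arrive	VERB	_	_	0	root	0:root	_
3	and	and	CCONJ	_	_	4	cc	4:cc	_
4	left	leave	VERB	_	_	2	conj	2:conj:and	_
5	.	.	PUNCT	_	_	2	punct	2:punct	_
```

Here the basic tree gives "left" only the `conj` relation, whereas the enhanced DEPS column records that "Sue" is also its subject (`4:nsubj` on token 1).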
Natural Language Processing Agents and Document Clustering in Knowledge Management
2008
While HTML provides the Web with a standard format for information presentation, XML has become the standard for information structuring on the Web. The mission of the Semantic Web is now to provide meaning to the Web. Apart from building on existing Web technologies, we need tools from other areas of science to do that. This chapter shows how natural language processing methods and technologies, together with ontologies and a neural algorithm, can help in the task of adding meaning to the Web, thus making the Web a better platform for knowledge management in general.
Riga: from FrameNet to Semantic Frames with C6.0 Rules
2015
For the purposes of SemEval-2015 Task 18 on semantic dependency parsing, we combined the best-performing closed-track approach from the SemEval-2014 competition with state-of-the-art techniques for FrameNet semantic parsing. In the closed track, our system ranked third for semantic graph accuracy and first for exact labeled match of complete semantic graphs. These results can be attributed to the high accuracy of the C6.0 rule-based sense labeler adapted from the FrameNet parser. To handle the large SemEval training data, the C6.0 algorithm was extended to provide multi-class classification and to use fast greedy search without significant accuracy loss compared to exhaustive search. A met…
Language Detection and Tracking in Multilingual Documents Using Weak Estimators
2010
Published version of an article from the book Structural, Syntactic, and Statistical Pattern Recognition. The original publication is available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-14980-1_59. This paper deals with the extremely complicated problem of language detection and tracking in real-life electronic applications (for example, Word-of-Mouth (WoM) applications), where various segments of the text are written in different languages. The difficulties in solving the problem are manifold. First of all, the analyst has no knowledge of when one language stops and the next starts. Further, the features one uses for any one language (for example, the n-grams) will not be…
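The n-gram features this abstract mentions can be illustrated with a toy character-trigram detector. This is a generic sketch, not the weak-estimator scheme the paper proposes, and the two sample profiles are invented for illustration (real systems train profiles on large corpora).

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count character n-grams of a padded, lowercased string."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def score(segment, profile):
    """Weighted overlap between a segment's n-grams and a language profile."""
    grams = char_ngrams(segment)
    total = sum(grams.values()) or 1
    return sum(c * profile.get(g, 0) for g, c in grams.items()) / total

# Toy profiles built from tiny pangram-like samples (assumed, for illustration).
profiles = {
    "en": char_ngrams("the quick brown fox jumps over the lazy dog"),
    "fr": char_ngrams("le renard brun rapide saute par-dessus le chien paresseux"),
}

def detect(segment):
    """Assign the segment to the language whose profile it overlaps most."""
    return max(profiles, key=lambda lang: score(segment, profiles[lang]))
```

A per-segment classifier like this hints at the tracking problem the paper tackles: each segment can be scored independently, but the boundaries between languages are unknown and must themselves be inferred.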
A Controllable Text Simplification System for the Italian Language
2021
Text simplification is a non-trivial task that aims at reducing the linguistic complexity of written texts. Researchers have studied the problem by proposing new methodologies for the English language, but other languages, such as Italian, remain almost unexplored. In this paper, we contribute to the enhancement of Automated Text Simplification research by presenting a deep-learning-based system, inspired by a state-of-the-art system for English, capable of simplifying Italian texts. The system has been trained and tested on the Italian version of Newsela; it has shown promising results, achieving a SARI value of 30.17.