Search results for "information extraction"
showing 10 items of 25 documents
Diversity in random subspacing ensembles
2004
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It has been shown experimentally and theoretically that, for an ensemble to be effective, it should consist of classifiers having diversity in their predictions. A number of ways are known to quantify diversity in ensembles, but little research has been done on their appropriateness. In this paper, we compare eight measures of ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…
The HisClima database: historical weather logs for automatic transcription and information extraction
2021
Knowing the weather and atmospheric conditions of the past can help weather researchers build models like those used to predict how weather conditions are likely to change as global temperatures continue to rise. Many historical weather records, registered on a systematic basis, are available. Historical weather logs were kept on ships on the high seas, recording daily weather conditions such as wind speed, temperature, and coordinates. These historical documents are an important source of knowledge, holding valuable climatic information from several centuries ago. This paper presents a database for researching about the ca…
Some Experiments in Supervised Pattern Recognition with Incomplete Training Samples
2002
This paper presents some ideas about automatic procedures for implementing a system capable of detecting patterns that arise from classes not represented in the training sample. The procedure aims to automatically incorporate into the training sample the information about the new class needed to correctly recognize patterns from this class in future classification tasks. The Nearest Neighbor rule is employed as the central classifier, and several techniques are added to cope with the peril of incorporating noisy data into the training sample. Experimental results with real data confirm the benefits of the proposed procedure.
Extracting Semantic Knowledge from Unstructured Text Using Embedded Controlled Language
2016
Nowadays, most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture for an information extraction system based on the concept of Embedded Controlled Language, which allows formal semantic knowledge to be extracted from an unstructured text corpus. Moreover, the presented approach has the potential to support multilingual input and output.
FrameNet CNL: A Knowledge Representation and Information Extraction Language
2014
The paper presents a FrameNet-based information extraction and knowledge representation framework, called FrameNet-CNL. The framework is used on natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered to belong to FrameNet-CNL if an information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that Fram…
Embedded controlled language to facilitate information extraction from eGov policies
2015
The goal of this paper is to propose a system that can extract a formal semantic knowledge representation from natural language eGov policies. We present an architecture that allows Controlled Natural Language (CNL) statements to be extracted from heterogeneous natural language texts and that can support multilinguality. The approach is based on the concept of embedded CNLs.
Cohesive explicitness and explicitation in an English-German translation corpus
2007
Explicitness or implicitness as assumed properties of translated texts and other texts in multilingual communication have for some time been the object of speculation and, at a later stage, of more systematic research in linguistics and translation studies. This paper undertakes an investigation of explicitness/implicitness and related phenomena of translated texts on the level of cohesion. A corpus-based research architecture, embedded in an empirical research methodology, will be outlined, and first results and possible explanations will be discussed. The paper starts with a terminological clarification of the concepts of ‘explicitness’ and ‘explicitation’ in terms of dependent variables …
Translingual text mining for identification of language pair phenomena
2016
Translingual Text Mining (TTM) is an innovative natural language processing technology for building multilingual parallel corpora, processing machine translation, contextual knowledge acquisition, information extraction, query profiling, language modeling, contextual word sensing, creating feature test sets, and a variety of other purposes. The Keynote Lecture will discuss opportunities and challenges of this computational technology. In particular, the focus will be on the identification of language pair phenomena and their application to building a holistic language model, which is a novel tool for processing machine translation, supporting professional translations, evaluation of tran…
Extracting business information from graphs: An eye tracking experiment
2016
Information graphics are visualizations that convey information about data trends and distributions. Data visualization and the application of graphs are increasingly important in business decision making, for instance, in big data analysis. However, relatively little is known about how people extract information from graphs and how the framing of the graphic design may ‘nudge’ and bias decision making. As a contribution to filling this gap, this study applies the methodology of experimental economics to the analysis of graph reading and processing to extract underlying information. Specifically, the study presents the results of an experiment whose baseline treatment includes…
Extraction of Medical Terms for Word Sense Disambiguation within Multilingual Framework
2016
Languages belonging to the same language family share a number of common characteristics, called language pair phenomena, which can be quite useful when processing them for multilingual purposes such as translation across cognate languages, building dictionaries, thesauri, and transcript collections, or multilingual text retrieval of digital documents. In addition, it is estimated that more than 30% of English vocabulary has been inherited from Latin, which has dominated medical terminology in particular. We use this fact by exploring word sense disambiguation (WSD) in a multilingual environment. Specifically in the medical domain, language pair phenomena can be limite…