Search results for "information extraction"

showing 10 items of 25 documents

Diversity in random subspacing ensembles

2004

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It has been shown experimentally and theoretically that, for an ensemble to be effective, it should consist of classifiers whose predictions are diverse. A number of ways are known to quantify diversity in ensembles, but little research has been done on their appropriateness. In this paper, we compare eight measures of ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…

Computer science; Ambiguity; Ensemble diversity; Ensemble learning; Data warehouse; Correlation; Information extraction; Knowledge extraction; Statistics; Entropy (information theory); Data mining
researchProduct

The HisClima database: historical weather logs for automatic transcription and information extraction

2021

Knowing past weather and atmospheric conditions can help weather researchers generate models like the ones used to predict how weather conditions are likely to change as global temperatures continue to rise. Many historical weather records, registered on a systematic basis, are available. Historical weather logs were kept aboard ships on the high seas, recording daily weather conditions such as wind speed, temperature, and coordinates. These historical documents are an important source of knowledge from which climatic information from several centuries ago can be extracted. This paper presents a database for researching about the ca…

Database; Computer science; 05 social sciences; 050301 education; 02 engineering and technology; Text recognition; Atmospheric model; Wind speed; Information extraction; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Transcription (software); Baseline (configuration management); 0503 education; Relevant information

2020 25th International Conference on Pattern Recognition (ICPR)
researchProduct

Some Experiments in Supervised Pattern Recognition with Incomplete Training Samples

2002

This paper presents some ideas about automatic procedures for implementing a system capable of detecting patterns that arise from classes not represented in the training sample. The procedure aims to automatically incorporate into the training sample the information about the new class that is needed to correctly recognize patterns from this class in future classification tasks. The Nearest Neighbor rule is employed as the central classifier, and several techniques are added to cope with the risk of incorporating noisy data into the training sample. Experimental results with real data confirm the benefits of the proposed procedure.

Information extraction; Computer science; Anomaly detection; Pattern recognition; Artificial intelligence; Machine learning; Classifier (UML); k-nearest neighbors algorithm
researchProduct

Extracting Semantic Knowledge from Unstructured Text Using Embedded Controlled Language

2016

Most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture for an information extraction system based on the concept of Embedded Controlled Language, which allows formal semantic knowledge to be extracted from an unstructured text corpus. Moreover, the presented approach has the potential to support multilingual input and output.

Information retrieval; Concept search; Noisy text analytics; Computer science; Text simplification; 010401 analytical chemistry; Text graph; 02 engineering and technology; 01 natural sciences; 0104 chemical sciences; Information extraction; Controlled natural language; Knowledge extraction; Explicit semantic analysis; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Natural language processing

2016 IEEE Tenth International Conference on Semantic Computing (ICSC)
researchProduct

FrameNet CNL: A Knowledge Representation and Information Extraction Language

2014

The paper presents a FrameNet-based information extraction and knowledge representation framework called FrameNet-CNL. The framework is applied to natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology, from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered to belong to FrameNet-CNL if an information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that Fram…

Information retrieval; Parsing; Knowledge representation and reasoning; Computer science; Agency (philosophy); Paraphrase; Information extraction; Artificial intelligence; Source text; FrameNet; Natural language processing; Natural language
researchProduct

Embedded controlled language to facilitate information extraction from eGov policies

2015

The goal of this paper is to propose a system that can extract a formal semantic knowledge representation from natural language eGov policies. We present an architecture that allows Controlled Natural Language (CNL) statements to be extracted from heterogeneous natural language texts, with support for multilinguality. The approach is based on the concept of embedded CNLs.

Language identification; Natural language user interface; Computer science; Natural language programming; Information extraction; Universal Networking Language; Controlled natural language; Question answering; Artificial intelligence; Natural language processing; Natural language

Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services
researchProduct

Cohesive explicitness and explicitation in an English-German translation corpus

2007

Explicitness or implicitness as assumed properties of translated texts and other texts in multilingual communication have for some time been the object of speculation and, at a later stage, of more systematic research in linguistics and translation studies. This paper undertakes an investigation of explicitness/implicitness and related phenomena of translated texts on the level of cohesion. A corpus-based research architecture, embedded in an empirical research methodology, will be outlined, and first results and possible explanations will be discussed. The paper starts with a terminological clarification of the concepts of ‘explicitness’ and ‘explicitation’ in terms of dependent variables …

Linguistics and Language; Empirical data; Variables; Linguistics; German; Cohesion (linguistics); Information extraction; Empirical research; Translation studies; Psychology

Languages in Contrast
researchProduct

Translingual text mining for identification of language pair phenomena

2016

Translingual Text Mining (TTM) is an innovative natural language processing technology for building multilingual parallel corpora, processing machine translation, contextual knowledge acquisition, information extraction, query profiling, language modeling, contextual word sensing, creating feature test sets, and a variety of other purposes. The Keynote Lecture will discuss the opportunities and challenges of this computational technology. In particular, the focus will be on the identification of language pair phenomena and their application to building a holistic language model, which is a novel tool for processing machine translation, supporting professional translations, evaluation of tran…

Machine translation; Language identification; Computer science; 05 social sciences; similarity metrics; 02 engineering and technology; 050105 experimental psychology; computational linguistics; multilingual information retrieval; Universal Networking Language; Cache language model; Language technology; 0202 electrical engineering electronic engineering information engineering; Computer-assisted translation; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; information extraction; Language model; Artificial intelligence; Language industry; Natural language processing

2016 Sixth International Conference on Innovative Computing Technology (INTECH)
researchProduct

Extracting business information from graphs: An eye tracking experiment

2016

Information graphics are visualizations that convey information about data trends and distributions. Data visualization and the application of graphs are increasingly important in business decision making, for instance, in big data analysis. However, relatively little is known about how people extract information from graphs and how the framing of the graphic design may ‘nudge’ and bias decision making. To help fill this gap, this study applies the methodology of experimental economics to the analysis of graph reading and processing to extract underlying information. Specifically, the study presents the results of an experiment whose baseline treatment includes…

Marketing; Power graph analysis; Business information; Information retrieval; Computer science; 05 social sciences; Big data; 020207 software engineering; 02 engineering and technology; Visualization; Information extraction; Information visualization; Data visualization; 0502 economics and business; Statistics; 0202 electrical engineering electronic engineering information engineering; Graphics; 050203 business & management

Journal of Business Research
researchProduct

Extraction of Medical Terms for Word Sense Disambiguation within Multilingual Framework

2016

All languages belonging to the same language family share a certain number of common characteristics, called language pair phenomena, which can be quite useful when processing these languages for multilingual purposes such as translation across cognate languages, building dictionaries, thesauri, and transcript collections, or multilingual text retrieval of digital documents. In addition, it is estimated that more than 30% of English vocabulary has been inherited from Latin, which has dominated medical terminology in particular. We use this fact by exploring word sense disambiguation (WSD) in a multilingual environment. Specifically in the medical domain, language pair phenomena can be limite…

Medical terminology; Computer science; similarity metrics; Context (language use); 02 engineering and technology; SemEval; Terminology; computational linguistics; multilingual information retrieval; word sense disambiguation; 020204 information systems; Similarity (psychology); 0202 electrical engineering electronic engineering information engineering; medical informatics; 020201 artificial intelligence & image processing; Cognate; Artificial intelligence; information extraction; Language family; Natural language processing; Word (computer architecture)
researchProduct