Search results for "computer.software_genre"
Showing 10 of 3858 documents
Wordnet and semidiscrete decomposition for sub-symbolic representation of words
2009
A methodology for sub-symbolic semantic encoding of words is presented. The methodology uses the standard, semantically highly structured WordNet lexical database and the SemiDiscrete matrix Decomposition to obtain a vector representation with low memory requirements in a semantic n-space. Applying the proposed algorithm to all WordNet words would yield a useful tool for the sub-symbolic processing of texts.
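The semidiscrete decomposition (SDD) mentioned in the abstract approximates a matrix as a weighted sum of rank-1 terms whose factors take values only in {-1, 0, 1}, which is what gives the low memory footprint. A minimal sketch of the greedy SDD algorithm in the style of Kolda and O'Leary, on an invented toy term-document matrix (this is not the authors' implementation):

```python
def best_ternary(s):
    """Pick x in {-1,0,1}^n maximising (x.s)^2 / (x.x).
    The optimum keeps the k largest |s_i| with signs sign(s_i)."""
    order = sorted(range(len(s)), key=lambda i: -abs(s[i]))
    best_val, best_k, dot = -1.0, 0, 0.0
    for j, i in enumerate(order, start=1):
        dot += abs(s[i])
        val = dot * dot / j
        if val > best_val:
            best_val, best_k = val, j
    x = [0] * len(s)
    for i in order[:best_k]:
        x[i] = 1 if s[i] >= 0 else -1
    return x

def sdd(A, rank, sweeps=10):
    """Greedy deflation: A ~ sum_k d_k * x_k y_k^T with ternary x_k, y_k."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]              # residual matrix
    terms = []
    for _ in range(rank):
        y = [1] * n                        # deterministic start vector
        x = [1] * m
        for _ in range(sweeps):            # alternate between x and y
            x = best_ternary([sum(R[i][j] * y[j] for j in range(n)) for i in range(m)])
            y = best_ternary([sum(R[i][j] * x[i] for i in range(m)) for j in range(n)])
        xx = sum(v * v for v in x)
        yy = sum(v * v for v in y)
        d = sum(x[i] * R[i][j] * y[j] for i in range(m) for j in range(n)) / (xx * yy)
        terms.append((d, x, y))
        for i in range(m):                 # subtract the rank-1 term
            for j in range(n):
                R[i][j] -= d * x[i] * y[j]
    return terms, R

# toy term-document matrix (rows: terms, columns: documents) -- invented data
A = [[2, 2, -2],
     [2, 2, -2],
     [-2, -2, 2]]

terms, residual = sdd(A, rank=1)
```

Because each factor entry needs only two bits plus one scalar per term, a rank-k SDD of an m-by-n matrix stores far less than the dense matrix, which matches the "low memory requirements" claim.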
Combining content extraction heuristics
2008
The main text content of an HTML document on the WWW is typically surrounded by additional content, such as navigation menus, advertisements, link lists or design elements. Content Extraction (CE) is the task of identifying and extracting the main content. Ongoing research has spawned several CE heuristics of varying quality. However, so far only the Crunch framework combines several heuristics to improve its overall CE performance, and many new algorithms have been formulated since Crunch was introduced. The CombinE system is designed to test, evaluate and optimise combinations of CE heuristics. Its aim is to develop CE systems which yield better and more reliable extracts of the main content of a web …
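One simple way to combine CE heuristics, as the CombinE idea suggests, is majority voting over per-block decisions. A minimal sketch with three invented block-level heuristics (length, link density, stopword density); the block representation and thresholds are assumptions, not the CombinE system's actual components:

```python
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in", "that"}

# Each heuristic maps a text block to True (main content) or False (boilerplate).
def h_length(b):
    return len(b["text"]) > 80

def h_link_density(b):
    # boilerplate such as navigation menus is dominated by link text
    return b["link_chars"] / max(len(b["text"]), 1) < 0.3

def h_stopword_density(b):
    # natural-language prose contains many function words; menus do not
    words = b["text"].lower().split()
    return sum(w in STOPWORDS for w in words) / max(len(words), 1) > 0.2

HEURISTICS = [h_length, h_link_density, h_stopword_density]

def is_main_content(block, heuristics=HEURISTICS):
    votes = sum(h(block) for h in heuristics)
    return votes * 2 > len(heuristics)   # strict majority

# invented example blocks
menu = {"text": "Home Products Contact", "link_chars": 21}
para = {"text": "The CombinE system is designed to test, evaluate and "
                "optimise combinations of content extraction heuristics in "
                "a single framework.", "link_chars": 0}
```

Evaluating and optimising such combinations (weights, thresholds, vote schemes) against a gold standard is then exactly the kind of experiment the abstract describes.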
A Novel Approach to Improve the Accuracy of Web Retrieval
2010
General-purpose search engines take a very simple view of text documents: they treat them as bags of words. As a result, the semantics of documents is lost after indexing. In this paper, we introduce a novel approach to improve the accuracy of Web retrieval. We use WordNet and the WordNet SenseRelate All Words software as the main tools to preserve the semantics of the sentences of documents and user queries. Nouns and verbs in WordNet are organized into tree hierarchies. Word meanings are represented by numbers that reference nodes on the semantic tree. The meaning of each word in a sentence is calculated when the sentence is analyzed. The goal is to put each nou…
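The idea of meanings as numbered nodes in a hypernym tree can be illustrated on a toy hierarchy. A minimal sketch with invented node numbers (real WordNet uses synset offsets and much larger trees), using Wu-Palmer similarity as one standard tree-based relatedness measure, not necessarily the one the paper uses:

```python
# toy hypernym tree: node id -> (label, parent id); node 0 is the root
TREE = {
    0: ("entity", None),
    1: ("animal", 0),
    2: ("dog", 1),
    3: ("cat", 1),
    4: ("artifact", 0),
    5: ("computer", 4),
}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node][1]
    return path

def depth(node):
    return len(path_to_root(node)) - 1

def wup(a, b):
    """Wu-Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b))."""
    on_b_path = set(path_to_root(b))
    # path_to_root lists nodes by decreasing depth, so the first shared
    # node is the lowest common subsumer (LCS)
    lcs = next(n for n in path_to_root(a) if n in on_b_path)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

Here "dog" (node 2) and "cat" (node 3) share the subsumer "animal", so they score higher than "dog" and "computer", which meet only at the root.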
Extracting Semantic Knowledge from Unstructured Text Using Embedded Controlled Language
2016
Nowadays, most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture of an information extraction system based on the concept of Embedded Controlled Language that allows formal semantic knowledge to be extracted from an unstructured text corpus. Moreover, the presented approach has the potential to support multilingual input and output.
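A controlled language sidesteps natural-language ambiguity by admitting only a fixed set of sentence forms, each of which maps deterministically to a formal statement. A minimal sketch of that mapping with two invented sentence patterns (the paper's actual controlled language is not specified here):

```python
import re

# each controlled-language pattern maps onto one relation type
PATTERNS = [
    (re.compile(r"^(\w+) is an? (\w+)\.$"), "isA"),
    (re.compile(r"^(\w+) has (\w+)\.$"), "has"),
]

def extract(sentence):
    """Return a (subject, relation, object) triple, or None if the
    sentence falls outside the controlled language."""
    for pattern, relation in PATTERNS:
        m = pattern.match(sentence)
        if m:
            return (m.group(1), relation, m.group(2))
    return None
```

Sentences inside the controlled fragment yield unambiguous triples; everything else is rejected rather than guessed at, which is the essential trade-off of the approach.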
Semantic retrieval: an approach to representing, searching and summarising text documents
2011
Nowadays, the internet is the major source of information for millions of people. Many search tools are available on the net, but finding appropriate text information is still difficult. The retrieval efficiency of presently used systems cannot be significantly improved: the ‘bag of words’ interpretation loses the semantics of texts. We applied a functional approach to represent English text documents. It allows semantic relations between words to be taken into account when indexing documents, and allows ordinary English sentences to be used as queries to a search engine. The proposed retrieval mechanisms return only highly relevant documents. They make it possible to generate content-aware summarie…
Heuristic Method to Improve Systematic Collection of Terminology
2016
In this paper, we propose an experimental tool for the analysis and graphical representation of glossaries. The original heuristic algorithms and analysis methods incorporated into the tool proved useful for improving the quality of glossaries. The tool was used to analyse the ISTQB Standard Glossary of Terms Used in Software Testing. Instances of problems found in the ISTQB glossary, related to its consistency, completeness, and correctness, are described in the paper.
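One simple glossary-consistency heuristic is to build the graph of which terms each definition mentions and flag circular definition pairs. A minimal sketch on a toy glossary (the definitions below paraphrase ISTQB-style entries for illustration; this is not the paper's actual algorithm):

```python
# toy glossary: term -> definition (invented, ISTQB-style wording)
glossary = {
    "test case": "a set of input values and expected results developed "
                 "for a particular test condition",
    "test condition": "an item or event that could be verified by one "
                      "or more test case",
    "defect": "an imperfection or deficiency in a work product",
}

# which other glossary terms does each definition mention?
refs = {t: {u for u in glossary if u != t and u in d.lower()}
        for t, d in glossary.items()}

# pairs of terms that define each other (a length-two cycle)
circular = sorted({tuple(sorted((t, u)))
                   for t in refs for u in refs[t] if t in refs[u]})
```

The same reference graph also supports the graphical representation the abstract mentions, since it can be drawn directly as a term-dependency diagram.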
Combining OWL ontologies using E-Connections
2006
The standardization of the Web Ontology Language (OWL) leaves (at least) two crucial issues for Web-based ontologies unsatisfactorily resolved, namely how to represent and reason with multiple distinct, but linked, ontologies, and how to enable effective knowledge reuse and sharing on the Semantic Web. In this paper, we present a solution for these fundamental problems based on E-Connections. We aim to use E-Connections to provide modelers with suitable means for developing Web ontologies in a modular way and to provide an alternative to the owl:imports construct. To this end, we present a syntactic and semantic extension of the Web Ontology Language that covers E-Conn…
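For context, the owl:imports construct that the paper proposes an alternative to looks like this in Turtle (the ontology IRIs are invented for illustration); it pulls the entire target ontology into the importing one, which is the monolithic behaviour that finer-grained linking mechanisms such as E-Connections aim to avoid:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# hypothetical ontology IRIs; owl:imports brings in ALL axioms of the
# imported ontology, with no way to link only selected parts
<http://example.org/travel> a owl:Ontology ;
    owl:imports <http://example.org/geography> .
```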
Contextual Metadata for Document Databases
2005
Metadata has always been an important means of supporting the accessibility of information in document collections. Metadata can be, for example, bibliographic data created manually for each document at the time of document storage. The indexes created by Web search engines serve as metadata about the content of Web documents. In semantic Web solutions, ontologies are used to store semantic metadata (Berners-Lee et al., 2001). Attaching a common ontology to a set of heterogeneous document databases may be used to support data integration. Creating the common ontology requires a profound understanding of the concepts used in the databases. It is a demanding task, especially in cases where the …
From Databases to Ontologies
2009
This chapter introduces the UML profile for OWL as an essential instrument for bridging the gap between legacy relational databases and OWL ontologies. We address one of the long-standing relational database design problems, whereby the initial conceptual model (a semantically clear domain conceptualization ontology) gets “lost” during conversion into the normalized database schema. The problem is that this “loss” makes the database inaccessible for direct querying by domain experts familiar only with the conceptual model. This problem can be avoided by exporting the database into RDF according to the original conceptual model (OWL ontology) and formulating semantically clear queries in SPARQL over t…
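Once the data is exported to RDF using the conceptual model's vocabulary, a domain expert can query it in those terms rather than against normalized tables. A minimal SPARQL sketch over a hypothetical university ontology (all names invented, not taken from the chapter):

```sparql
PREFIX ex: <http://example.org/university#>

# "Which students are enrolled in which courses?" -- phrased directly
# in conceptual-model terms, with no knowledge of join tables required
SELECT ?studentName ?courseTitle
WHERE {
  ?student a ex:Student ;
           ex:name ?studentName ;
           ex:enrolledIn ?course .
  ?course  ex:title ?courseTitle .
}
```

The equivalent SQL query against a normalized schema would typically need an explicit join through an enrollment bridge table, which is exactly the detail the conceptual-model view hides.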
Self-service Ad-hoc Querying Using Controlled Natural Language
2016
The ad-hoc querying process is slow and error-prone because business experts cannot access data directly without involving IT experts. The problem lies in the complexity of the means used to query data. We propose a new natural language- and semistar ontology-based ad-hoc querying approach which lowers the steep learning curve required to query data. The proposed approach would significantly shorten the time needed to master ad-hoc querying and give business experts direct access to data, thus facilitating decision making in enterprises, government institutions and other organizations.
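The core mechanism behind controlled-natural-language querying is translating a restricted sentence form into a structured query. A minimal sketch with one invented sentence template mapped to parameterised SQL (the paper's actual grammar and ontology-driven translation are far richer than this):

```python
import re

# hypothetical template: "show <attribute> of <entity> where <attribute> is <value>"
PATTERN = re.compile(
    r"^show (\w+) of (\w+) where (\w+) is (\w+)$", re.IGNORECASE)

def cnl_to_sql(question):
    """Translate a controlled-language question into (sql, parameters)."""
    m = PATTERN.match(question.strip())
    if not m:
        raise ValueError("sentence is outside the controlled language")
    attr, entity, cond_attr, value = m.groups()
    # the comparison value is passed as a parameter, never interpolated
    return (f"SELECT {attr} FROM {entity} WHERE {cond_attr} = ?", (value,))

sql, params = cnl_to_sql("show salary of employees where department is sales")
```

In a real system the recognised entity and attribute names would be validated against the ontology rather than accepted verbatim, which is what makes the approach safe for self-service use.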