Search results for "document."
Graphical information models as interfaces for Web document repositories
2000
In interorganisational processes, documents are used to record information created during the processes. Legislative processes involving several legislative organisations, and manufacturing processes involving complicated networks of companies and officials, are examples of such processes. In contemporary computerised environments, much of the recorded information is scattered across different kinds of Web repositories with different kinds of interfaces. The repositories should serve as valuable knowledge assets, but they may be difficult to use, and even knowledge about which kinds of repositories are available may be insufficient. The paper presents a method for improving information manag…
ExtMiner: Combining multiple ranking and clustering algorithms for structured document retrieval
2006
This paper introduces ExtMiner, a platform and potential tool for information management in SMEs (small and medium-sized enterprises) or in organizational workgroups. ExtMiner supports interactive and iterative clustering of documents. It presents users with visual cluster and list views simultaneously, supporting an iterative search process. ExtMiner may also be applied as a platform for research on retrieval fusion, since it combines search, clustering and visualization algorithms. ExtMiner was evaluated with three document collections. Although the findings were encouraging, the user interface and the performance with large document repositories need further development.
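The abstract does not say how ExtMiner fuses the output of its ranking algorithms. As a hedged illustration only, the sketch below uses reciprocal rank fusion (RRF), a widely used fusion method, to merge ranked lists of document IDs; the function name, the constant `k=60` and the sample rankings are assumptions, not part of ExtMiner.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list
    in which it appears, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of two different ranking algorithms for one query.
ranking_a = ["d3", "d1", "d2"]
ranking_b = ["d1", "d4", "d3"]
print(reciprocal_rank_fusion([ranking_a, ranking_b]))  # ['d1', 'd3', 'd4', 'd2']
```

Documents ranked highly by several algorithms rise to the top. RRF only needs ranks, not comparable scores, which is one reason it is a common default when the fused algorithms score documents on different scales.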
A text based indexing system for mammographic image retrieval and classification
2014
In modern medical systems, huge amounts of text, images and videos are produced and stored in ad hoc databases. The medical community needs to extract precise information from this large amount of data. Current ICT approaches do not provide a methodology for content-based retrieval and classification of medical images. On the other hand, from the Internet of Things (IoT) perspective, medical data can be produced by several devices, and the data produced exhibits all the features and constraints of Big Data. IoT guidelines place at the center of the system new smart software that manages Big Data and transforms it into a new, understandable form. This paper describes a text-based indexing sy…
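The abstract is truncated before the indexing method is described. The following is a minimal sketch of one generic way to make images retrievable through text: an inverted index over per-image report text, ranked with a simple TF-IDF sum. The class `TextIndex`, its methods and the sample reports are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TextIndex:
    """Inverted index from terms to the images whose report text contains them.

    Assumes each image ID is added only once.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {image_id: term frequency}
        self.n_images = 0

    def add(self, image_id, report_text):
        self.n_images += 1
        for term, tf in Counter(tokenize(report_text)).items():
            self.postings[term][image_id] = tf

    def search(self, query):
        """Rank images by summing tf * idf over the distinct query terms."""
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            postings = self.postings.get(term)
            if not postings:
                continue
            idf = math.log(self.n_images / len(postings)) + 1.0
            for image_id, tf in postings.items():
                scores[image_id] += tf * idf
        return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


# Invented report snippets attached to hypothetical image IDs.
index = TextIndex()
index.add("img-001", "Clustered microcalcifications in the upper outer quadrant")
index.add("img-002", "Spiculated mass, left breast")
index.add("img-003", "Benign calcifications, no mass")
print(index.search("spiculated mass"))  # ['img-002', 'img-003']
```

The image pixels are never inspected here; retrieval and any later classification rely entirely on the text attached to each image.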
Schema-Based Visual Queries over Linked Data Endpoints
2019
We present the option of using the schema-based visual query tool ViziQuer over realistic Linked Data endpoints. We describe the tool's meta-schema structure and the means of retrieving the endpoint schema both from an OWL ontology and from a SPARQL endpoint. We report on a store for endpoint-specific schemas and on the options for presenting the schema to the end user, both as a class tree within the environment and as an external visual diagram.
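The abstract mentions retrieving an endpoint schema from a SPARQL endpoint and presenting it as a class tree. A minimal sketch, assuming the endpoint answers a plain `rdfs:subClassOf` query and returns the standard SPARQL 1.1 JSON results format, could turn the results into an indented tree. The query, the function names and the sample data are illustrative assumptions; ViziQuer's actual schema extraction is not described in the abstract, and the network request is deliberately left out.

```python
import json
from collections import defaultdict

# Illustrative query for direct subclass links between named classes.
CLASS_HIERARCHY_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cls ?super
WHERE { ?cls rdfs:subClassOf ?super . FILTER(isIRI(?cls) && isIRI(?super)) }
"""


def build_class_tree(results_json):
    """Turn SPARQL 1.1 JSON results with ?cls/?super bindings into a tree.

    Returns (roots, children): classes with no known superclass, and a map
    from each superclass to its sorted list of direct subclasses.
    """
    bindings = json.loads(results_json)["results"]["bindings"]
    children = defaultdict(set)
    classes, has_parent = set(), set()
    for row in bindings:
        cls, sup = row["cls"]["value"], row["super"]["value"]
        if cls == sup:
            continue  # skip reflexive links, which RDFS entailment can produce
        children[sup].add(cls)
        classes.update((cls, sup))
        has_parent.add(cls)
    roots = sorted(classes - has_parent)
    return roots, {sup: sorted(subs) for sup, subs in children.items()}


def render(nodes, children, depth=0):
    """Return indented lines for a class-tree view (assumes no cycles)."""
    lines = []
    for cls in nodes:
        lines.append("  " * depth + cls)
        lines.extend(render(children.get(cls, []), children, depth + 1))
    return lines


# Canned response in the SPARQL 1.1 JSON results format (no network call);
# prefixed names are used for brevity, real bindings carry full IRIs.
sample = json.dumps({
    "head": {"vars": ["cls", "super"]},
    "results": {"bindings": [
        {"cls": {"type": "uri", "value": "ex:Dog"},
         "super": {"type": "uri", "value": "ex:Animal"}},
        {"cls": {"type": "uri", "value": "ex:Cat"},
         "super": {"type": "uri", "value": "ex:Animal"}},
    ]},
})
roots, children = build_class_tree(sample)
print("\n".join(render(roots, children)))
```

A class with several superclasses appears under each of them, which matches how a class tree view typically shows multiple inheritance.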
Text Extraction from Scrolling News Tickers
2020
While much work exists on text or keyword extraction from videos, little can be found on the specific problem of extracting continuous text from scrolling tickers. In this work, a novel Tesseract OCR-based pipeline is proposed for locating scrolling tickers in videos and extracting their text continuously. The solution ran faster than real time and achieved a character accuracy of 97.3% on 45 minutes of manually transcribed 360p video of popular Latvian news shows.
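Two steps implied by the abstract can be sketched without the OCR engine itself: merging the overlapping text read from successive frames of a scrolling ticker, and scoring the result against a manual transcript. The sketch assumes exact-match overlap (real OCR output would need fuzzy matching) and uses the common definition of character accuracy as one minus the Levenshtein distance divided by the reference length; the paper may define or compute it differently. The frame strings stand in for Tesseract output and are invented, as are the function names.

```python
def stitch(accumulated, frame_text, min_overlap=3):
    """Append only the part of frame_text that accumulated does not already end with.

    Finds the longest suffix of accumulated equal to a prefix of frame_text.
    Without an overlap of at least min_overlap characters, the frame is
    appended after a space (e.g. after a cut to a new ticker).
    """
    for size in range(min(len(accumulated), len(frame_text)), min_overlap - 1, -1):
        if accumulated.endswith(frame_text[:size]):
            return accumulated + frame_text[size:]
    return (accumulated + " " + frame_text).strip()


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def character_accuracy(reference, hypothesis):
    """1 - edit distance / reference length, floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))


# Invented per-frame OCR readings of a ticker scrolling from right to left.
frames = ["Riga: new br", "new bridge op", "bridge opens today"]
text = ""
for frame in frames:
    text = stitch(text, frame)
print(text)  # Riga: new bridge opens today
print(character_accuracy("Riga: new bridge opens today", text))  # 1.0
```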
A framework for context-sensitive metadata description
2006
Expectations regarding the new generation of the Web depend on the success of Semantic Web technology. The Resource Description Framework (RDF) is a basis for explicit and machine-readable representation of semantics. However, RDF is not suitable for describing dynamic and context-sensitive resources (e.g., processes). We present the Context Description Framework (CDF), an extension of RDF that adds a 'TrueInContext' component to the basic RDF triple ('subject-predicate-object') and treats a contextual value as a container of RDF statements. We also add a probabilistic component, which allows multilevel descriptions of contextual dependence and opens the possibility of Bayesian reasoning wi…
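A minimal data-structure sketch of the idea, assuming each statement is stored as a triple plus a context identifier and a probability. The class names, field names and example resources are hypothetical, and the multilevel dependence and Bayesian reasoning described in the abstract are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualStatement:
    """An RDF-style triple extended with a TrueInContext component and a probability."""
    subject: str
    predicate: str
    obj: str
    true_in_context: str      # identifier of the context in which the triple holds
    probability: float = 1.0  # hypothetical encoding of the probabilistic component

    def __post_init__(self):
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")


class ContextStore:
    """Groups statements by context, so each context acts as a container of triples."""

    def __init__(self):
        self._by_context = defaultdict(list)

    def add(self, statement):
        self._by_context[statement.true_in_context].append(statement)

    def triples_in(self, context, min_probability=0.0):
        """Return (subject, predicate, object) triples held in context at or above min_probability."""
        return [
            (s.subject, s.predicate, s.obj)
            for s in self._by_context.get(context, [])
            if s.probability >= min_probability
        ]


# Invented example: the status of a process instance differs by context.
store = ContextStore()
store.add(ContextualStatement("ex:order42", "ex:status", "ex:pending", "ex:beforeReview"))
store.add(ContextualStatement("ex:order42", "ex:status", "ex:approved", "ex:afterReview", 0.9))
store.add(ContextualStatement("ex:order42", "ex:handledBy", "ex:alice", "ex:afterReview", 0.4))
print(store.triples_in("ex:afterReview", min_probability=0.5))
# [('ex:order42', 'ex:status', 'ex:approved')]
```

The same subject and predicate carry different objects in different contexts, which a plain triple store could only express by reifying each statement.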
Towards semantic-based RSS merging
2009
Merging information can be of key importance in several XML-based applications. For instance, merging RSS news from different sources and providers can benefit end users (journalists, economists, etc.) in various scenarios. In this work, we address this issue and mainly explore the relatedness relationships between RSS entities/elements. To validate our approach, we also provide a set of experimental tests showing satisfactory results. © 2009 Springer-Verlag Berlin Heidelberg
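The paper's semantic relatedness measure is not given in the abstract. The sketch below substitutes a plain lexical measure (Jaccard overlap of the words in title and description) purely to show the shape of a merge step over RSS 2.0 feeds parsed with Python's standard `xml.etree.ElementTree`; the threshold, the function names and the sample feeds are assumptions.

```python
import re
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Return (title, description, link) tuples for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", ""), item.findtext("description", ""),
         item.findtext("link", ""))
        for item in root.iter("item")
    ]


def relatedness(a, b):
    """Jaccard overlap of title and description words (lexical stand-in)."""
    words_a = set(re.findall(r"\w+", (a[0] + " " + a[1]).lower()))
    words_b = set(re.findall(r"\w+", (b[0] + " " + b[1]).lower()))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def merge_feeds(feeds, threshold=0.5):
    """Merge items from several feeds, keeping the first item of each related group."""
    merged = []
    for feed in feeds:
        for item in parse_items(feed):
            if all(relatedness(item, kept) < threshold for kept in merged):
                merged.append(item)
    return merged


feed_a = """<rss version="2.0"><channel><title>Source A</title>
<item><title>Central bank raises interest rates</title>
<description>Rates rise by 0.25 points</description>
<link>https://a.example/1</link></item>
</channel></rss>"""
feed_b = """<rss version="2.0"><channel><title>Source B</title>
<item><title>Central bank raises interest rates</title>
<description>Rates rise by 0.25 points today</description>
<link>https://b.example/7</link></item>
<item><title>New stadium opens</title>
<description>Football fans celebrate</description>
<link>https://b.example/8</link></item>
</channel></rss>"""
for title, _, link in merge_feeds([feed_a, feed_b]):
    print(title, link)
```

The near-duplicate rate story from Source B is dropped because it shares 10 of 11 distinct words with Source A's item; a semantic measure would also be able to group items that describe the same event in different words, which this lexical stand-in cannot.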
Combining content extraction heuristics
2008
The main text content of an HTML document on the WWW is typically surrounded by additional content, such as navigation menus, advertisements, link lists or design elements. Content Extraction (CE) is the task of identifying and extracting the main content. Ongoing research has produced several CE heuristics of varying quality. However, so far only the Crunch framework combines several heuristics to improve its overall CE performance, and many new algorithms have been formulated since Crunch. The CombinE system is designed to test, evaluate and optimise combinations of CE heuristics. Its aim is to develop CE systems that yield better and more reliable extracts of the main content of a web …
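As a hedged sketch of what combining CE heuristics can mean, the code below scores pre-segmented page blocks with three toy heuristics and keeps the blocks that win a weighted vote. These heuristics, the `Block` type, the weights and the voting scheme are illustrative assumptions; they are not the heuristics used by Crunch or CombinE, and HTML segmentation is omitted.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One segment of a web page; segmentation itself is out of scope here."""
    text: str
    link_chars: int  # characters of text that sit inside <a> elements


def long_text(block, min_words=20):
    """Main content tends to contain long runs of text."""
    return len(block.text.split()) >= min_words


def low_link_density(block, max_density=0.3):
    """Menus and link lists consist mostly of link text."""
    return block.link_chars / max(len(block.text), 1) <= max_density


def has_sentences(block, min_periods=1):
    """Main content usually contains full sentences."""
    return block.text.count(".") >= min_periods


def combine(blocks, heuristics, weights, threshold=0.5):
    """Keep blocks whose weighted share of positive heuristic votes reaches threshold."""
    total = sum(weights)
    return [
        block for block in blocks
        if sum(w for h, w in zip(heuristics, weights) if h(block)) / total >= threshold
    ]


article = Block("The city council approved the new budget on Monday. " * 3, link_chars=0)
menu = Block("Home News Sport Weather Contact", link_chars=31)
advert = Block("Buy now. Limited offer.", link_chars=8)
kept = combine([article, menu, advert],
               [long_text, low_link_density, has_sentences], weights=[2, 2, 1])
print(kept == [article])  # True
```

The advert passes the weakest heuristic (it contains sentences) but fails the two heavier ones, so the vote discards it; tuning such weights per heuristic is one way a combination can outperform any single heuristic.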
A Novel Approach to Improve the Accuracy of Web Retrieval
2010
General-purpose search engines take a very simple view of text documents: they treat them as bags of words. As a result, the semantics of documents is lost after indexing. In this paper, we introduce a novel approach to improve the accuracy of Web retrieval. We use WordNet and the WordNet SenseRelate All Words software as the main tools to preserve the semantics of the sentences in documents and user queries. Nouns and verbs in WordNet are organized in tree hierarchies. Word meanings are represented by numbers that refer to nodes in the semantic tree. The meaning of each word in a sentence is calculated when the sentence is analyzed. The goal is to put each nou…
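The idea of representing word meanings as numbers that point to nodes in a tree can be sketched with a toy hierarchy in which each sense is the sequence of child positions on the path from the root. The hierarchy, the sense labels and the prefix-based similarity are hypothetical stand-ins; real WordNet synset identifiers differ, and the disambiguation step performed by SenseRelate is assumed to have already happened.

```python
# Toy noun hierarchy (hypothetical): each sense is the sequence of child
# positions on the path from the root, e.g. (1, 1, 2).
SENSES = {
    "bank (financial institution)": (1, 1, 1),
    "credit union": (1, 1, 2),
    "bank (sloping land by a river)": (2, 1),
    "shore": (2, 2),
}


def path_similarity(a, b):
    """Length of the shared root path divided by the longer path (1.0 = same sense)."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b), 1)


def score(doc_senses, query_senses):
    """Sum over query senses of the best-matching sense in the document."""
    return sum(max((path_similarity(q, d) for d in doc_senses), default=0.0)
               for q in query_senses)


# Both documents contain the word "bank", but in different, already
# disambiguated senses (the disambiguation step is assumed, not shown).
docs = {
    "finance_news": [SENSES["bank (financial institution)"]],
    "river_guide": [SENSES["bank (sloping land by a river)"], SENSES["shore"]],
}
query = [SENSES["bank (financial institution)"]]
ranked = sorted(docs, key=lambda name: -score(docs[name], query))
print(ranked)  # ['finance_news', 'river_guide']
```

Under bag-of-words matching both documents would match the query word "bank" equally; with sense paths only the financial sense matches, and a related sense such as "credit union" still earns partial credit through the shared path prefix.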
Contextual Metadata for Document Databases
2005
Metadata has always been an important means of supporting the accessibility of information in document collections. Metadata can be, for example, bibliographic data created manually for each document when it is stored. The indexes created by Web search engines serve as metadata about the content of Web documents. In Semantic Web solutions, ontologies are used to store semantic metadata (Berners-Lee et al., 2001). Attaching a common ontology to a set of heterogeneous document databases can support data integration. Creating the common ontology requires a profound understanding of the concepts used in the databases. It is a demanding task, especially in cases where the …