Search results for "Extraction"
showing 10 items of 2072 documents
The Anatomy of an Optical Biopsy Semantic Retrieval System
2012
A case-based computer-aided diagnosis system assists physicians and other medical personnel in the interpretation of optical biopsies obtained through confocal laser endomicroscopy. Extraction in CLE images shows promising results on inferring semantic metadata from low-level features. In order to effectively ensure the interoperability with potential third-party applications, the system provides an interface compliant with the recent standards ISO/IEC 15938-12:2008 (MPEG Query Format) and ISO/IEC 24800 (JPEG Search).
Content Code Blurring: A New Approach to Content Extraction
2008
Most HTML documents on the world wide web contain far more than the article or text which forms their main content. Navigation menus, functional and design elements or commercial banners are typical examples of additional contents. Content extraction is the process of identifying the main content and/or removing the additional contents. We introduce content code blurring, a novel content extraction algorithm. As the main text content is typically a long, homogeneously formatted region in a web document, the aim is to identify exactly these regions in an iterative process. Comparing its performance with existing content extraction solutions, we show that for most documents content code blurrin…
Semantic web service discovery system for road traffic information services
2015
Create a multi-agent platform for a traveller information system (FIPA standards). Extend Paulucci algorithm with the use of seven similarity measures. Weight the similarity measure according to semantic relation and parameter nature. Improved running-time with a filtering pre-process for non-functional parameters. Improved the recall by measuring the sibling relationship concepts. We describe a multi-agent platform for a traveller information system, allowing travellers to find the road traffic information web service (WSs) that best fits their requirements. After studying existing proposals for discovery of semantic WS, we implemented a hybrid matching algorithm, which is described in detail …
Combining content extraction heuristics
2008
The main text content of an HTML document on the WWW is typically surrounded by additional contents, such as navigation menus, advertisements, link lists or design elements. Content Extraction (CE) is the task to identify and extract the main content. Ongoing research has spawned several CE heuristics of different quality. However, so far only the Crunch framework combines several heuristics to improve its overall CE performance. Since Crunch, though, many new algorithms have been formulated. The CombinE system is designed to test, evaluate and optimise combinations of CE heuristics. Its aim is to develop CE systems which yield better and more reliable extracts of the main content of a web …
Extracting Semantic Knowledge from Unstructured Text Using Embedded Controlled Language
2016
Nowadays, most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture of an information extraction system based on the concept of Embedded Controlled Language that allows for extracting formal semantic knowledge from an unstructured text corpus. Moreover, the presented approach has a potential to support multilingual input and output.
Machine Learning and Knowledge Discovery in Databases. Research Track
2021
FrameNet CNL: A Knowledge Representation and Information Extraction Language
2014
The paper presents a FrameNet-based information extraction and knowledge representation framework, called FrameNet-CNL. The framework is used on natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered belonging to FrameNet-CNL, if information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that Fram…
An interactive evolutionary approach for content based image retrieval
2009
Content Based Image Retrieval (CBIR) systems aim to provide a means to find pictures in large repositories using no information other than their contents, usually in the form of low-level descriptors. Since these descriptors do not exactly match the high level semantics of the image, assessing perceptual similarity between two pictures using only their feature vectors is not a trivial task. In fact, the ability of a system to induce high level semantic concepts from the feature vector of an image is one of the aspects which most influences its performance. This paper describes a CBIR algorithm which combines relevance feedback, evolutionary computation concepts and ad-hoc strategies in an attem…
Estimating web site readability using content extraction
2009
Nowadays, information is primarily searched on the WWW. From a user perspective, readability is an important criterion for measuring the accessibility and thereby the quality of information. We show that modern content extraction algorithms help to estimate the readability of a web document quite accurately.
On the Ion-Pair Recognition and Indication Features of a Fluorescent Heteroditopic Host Based on a BODIPY Core
2014
A fluorescent heteroditopic host for ion pairs and zwitterionic species has been synthesized. Its affinity towards a series of anions, cations and ion pairs in acetonitrile has been assessed, and the spectroscopic response has been evaluated. Solid–liquid extraction experiments of inorganic salts, α-amino acids and γ-aminobutyric acid (GABA) into acetonitrile solutions were performed, and the resulting complexes were analyzed by UV/Vis absorption, fluorescence and 1H NMR spectroscopy. The discrimination patterns observed have been rationalized in terms of the molecular topologies of the host and guests.