Toward Approximate GML Retrieval Based on Structural and Semantic Characteristics
International audience; GML is emerging as the new standard for representing geographic information in GISs on the Web, allowing the encoding of structurally and semantically rich geographic data in self describing XML-based geographic entities. In this study, we address the problem of approximate querying and ranked results for GML data and provide a method for GML query evaluation. Our method consists of two main contributions. First, we propose a tree model for representing GML queries and data collections. Then, we introduce a GML retrieval method based on the concept of tree edit distance as an efficient means for comparing semi-structured data. Our approach allows the evaluation of bo…
Towards semantic-based RSS merging
Merging information can be of key importance in several XML-based applications. For instance, merging the RSS news from different sources and providers can be beneficial for end-users (journalists, economists, etc.) in various scenarios. In this work, we address this issue and mainly explore the relatedness relationships between RSS entities/ elements. To validate our approach, we also provide a set of experimental tests showing satisfactory results. © 2009 Springer-Verlag Berlin Heidelberg
Semantic-based Merging of RSS Items
Merging XML documents can be of key importance in several applications. For instance, merging the RSS news from same or different sources and providers can be beneficial for end-users in various scenarios. In this paper, we address this issue and explore the relatedness measure between RSS elements. We show here how to define and compute exclusive relations between any two elements and provide several predefined merging operators that can be extended and adapted to human needs. We also provide a set of experiments conducted to validate our approach. © Springer Science+Business Media, LLC 2009.
Building Semantic Trees from XML Documents
International audience; The distributed nature of the Web, as a decentralized system exchanging information between heterogeneous sources, has underlined the need to manage interoperability, i.e., the ability to automatically interpret information in Web documents exchanged between different sources, necessary for efficient information management and search applications. In this context, XML was introduced as a data representation standard that simplifies the tasks of interoperation and integration among heterogeneous data sources, allowing to represent data in (semi-) structured documents consisting of hierarchically nested elements and atomic attributes. However, while XML was shown most …
Relating RSS News/Items
Merging related RSS news (coming from one or different sources) is beneficial for end-users with different backgrounds (journalists, economists, etc.), particularly those accessing similar information. In this paper, we provide a practical approach to both: measure the relatedness, and identify relationships between RSS elements. Our approach is based on the concepts of semantic neighborhood and vector space model, and considers the content and structure of RSS news items. © 2009 Springer Berlin Heidelberg.
Extensible User-Based XML Grammar Matching
International audience; XML grammar matching has found considerable interest recently due to the growing number of heterogeneous XML documents on the web and the increasing need to integrate, and consequently search and retrieve XML data originated from different data sources. In this paper, we provide an approach for automatic XML grammar matching and comparison aiming to minimize the amount of user effort required to perform the match task. We propose an open framework based on the concept of tree edit distance, integrating different matching criterions so as to capture XML grammar element semantic and syntactic similarities, cardinality and alternativeness constraints, as well as data-ty…
An overview on XML similarity: Background, current trends and future directions
In recent years, XML has been established as a major means for information management, and has been broadly utilized for complex data representation (e.g. multimedia objects). Owing to an unparalleled increasing use of the XML standard, developing efficient techniques for comparing XML-based documents becomes essential in the database and information retrieval communities. In this paper, we provide an overview of XML similarity/comparison by presenting existing research related to XML similarity. We also detail the possible applications of XML comparison processes in various fields, ranging over data warehousing, data integration, classification/clustering and XML querying, and discuss some…
Track and Workshop Program Chair Messages
A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics
International audience; XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficient…
XML document-grammar comparison: related problems and applications
10.2478/s13537-011-0005-1; International audience; XML document comparison is becoming an ever more popular research issue due to the increasingly abundant use of XML. Likewise, a growing interest fosters the development of XML grammar matching and comparison, due to the proliferation of heterogeneous XML data sources, particularly on the Web. Nonetheless, the process of comparing XML documents with XML grammars, i.e., XML document and grammar similarity evaluation, has not yet received the attention it deserves. In this paper, we provide an overview on existing research related to XML document/grammar comparison, presenting the background and discussing the various techniques related to th…
Semantic to intelligent web era
International audience; The Web has known a very fast evolution: going from the Web 1.0, known as Web of Documents where users are merely consumers of static information, to the more dynamic Web 2.0, known as social or collaborative Web where users produce and consume information simultaneously, and entering the more sophisticated Web 3.0, known as the Semantic Web by giving information a well-defined meaning so that it becomes more easily accessible by human users and automated processes. Fostering service intelligence and atomicity (the ability of autonomous services to interact automatically), remains one of the most upcoming challenges of the Semantic Web. This promotes the dawn of a ne…