RESEARCH PRODUCT

Transforming XML documents to OWL ontologies: A survey

Mokhtaria Hacherouf, Christophe Cruz, Safia Nait Bahloul

subject

Document Structure Description; [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing; Computer science; Efficient XML Interchange; [INFO.INFO-WB] Computer Science [cs]/Web; Library and Information Sciences; Ontology (information science); [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; XML Schema Editor; Streaming XML; RELAX NG; computer.programming_language; OWL; Information retrieval; Ontology; ACM; [INFO.INFO-LO] Computer Science [cs]/Logic in Computer Science [cs.LO]; Web Ontology Language; XML validation; computer.file_format; XML; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; computer; Information Systems

description

XML data are converted to ontologies in order to index, integrate and enrich existing ontologies with knowledge acquired from these sources. The contribution of this paper is a classification of the approaches used to convert XML documents into OWL ontologies. This classification underlines the usage profile of each conversion method and gives a clear description of its advantages and drawbacks. The paper focuses on two main processes: ontology enrichment and ontology population using XML data. Ontology enrichment concerns the schema of the ontology (TBox), whereas ontology population concerns its individuals (ABox). The ontologies described in these methods are based on formal languages of the Semantic Web such as OWL (Web Ontology Language) or RDF (Resource Description Framework). These languages are formal because their semantics are formally defined and draw on Description Logics. In contrast, XML data sources have no formal semantics. The XML language is used to store, export and share data between processes that are able to handle the specific data structure. However, even if the semantics is not explicitly expressed, the data structure captures the universe of discourse through a qualified vocabulary that reflects a consensual agreement. To formalize this semantics, the OWL language provides rich logical constraints. These logical constraints are therefore involved in the transformation of XML documents into OWL documents, allowing the enrichment and the population of the target ontology. To design such a transformation, current research establishes correspondences between OWL constructs (classes, predicates, simple or complex data types, etc.) and XML constructs (elements, attributes, element lists, etc.). Two different approaches to the transformation process are presented. 
The instance approaches are based on XML documents that have no associated schema. The validation approaches are based on an XML schema together with the documents validated against that schema. The latter approaches benefit from the schema definition to provide automated transformations with logical constraints. Both approaches are discussed in the text.
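
As an illustration of the kind of correspondence between XML constructs and OWL constructs described above, here is a minimal Python sketch of an instance approach, that is, one working from an XML document with no schema. The mapping rules, the sample document, the `has<Child>` naming convention and the function name `xml_to_triples` are all assumptions made for this example; they are not taken from the surveyed methods. Triples are kept as plain tuples rather than serialized to OWL.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with no schema attached (instance approach).
SAMPLE_XML = """
<library>
  <book id="b1">
    <title>Semantic Web Primer</title>
    <year>2004</year>
  </book>
  <book id="b2">
    <title>XML in a Nutshell</title>
  </book>
</library>
"""


def xml_to_triples(xml_text):
    """Derive TBox and ABox triples from a schema-less XML document.

    Illustrative rules (assumptions, not the survey's own mapping):
      - element with children or attributes -> owl:Class plus one individual
      - text-only leaf element              -> owl:DatatypeProperty
      - attribute                           -> owl:DatatypeProperty
      - nesting of complex elements         -> owl:ObjectProperty "has<Child>"
    """
    tbox = set()    # schema-level statements (ontology enrichment)
    abox = []       # individual-level statements (ontology population)
    counters = {}   # per-class counters used to name individuals

    def is_complex(elem):
        return len(elem) > 0 or bool(elem.attrib)

    def visit(elem):
        cls = elem.tag.capitalize()
        counters[cls] = counters.get(cls, 0) + 1
        individual = f"{elem.tag}_{counters[cls]}"
        tbox.add((cls, "rdf:type", "owl:Class"))
        abox.append((individual, "rdf:type", cls))
        for name, value in elem.attrib.items():
            tbox.add((name, "rdf:type", "owl:DatatypeProperty"))
            abox.append((individual, name, value))
        for child in elem:
            if is_complex(child):
                prop = "has" + child.tag.capitalize()
                tbox.add((prop, "rdf:type", "owl:ObjectProperty"))
                abox.append((individual, prop, visit(child)))
            else:
                tbox.add((child.tag, "rdf:type", "owl:DatatypeProperty"))
                abox.append((individual, child.tag, (child.text or "").strip()))
        return individual

    visit(ET.fromstring(xml_text.strip()))
    return tbox, abox


tbox, abox = xml_to_triples(SAMPLE_XML)
for triple in sorted(tbox):
    print("TBox:", triple)
for triple in abox:
    print("ABox:", triple)
```

In this sketch the TBox is inferred only from what happens to occur in the document, so nothing can be said about, for example, whether `year` is optional. A validation approach could plausibly read such constraints from the schema's declarations instead of guessing them from instances.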

https://hal.archives-ouvertes.fr/hal-01253351