RESEARCH PRODUCT
Ontology-based Integration of Web Navigation for Dynamic User Profiling
Anett Hoppe, Ana Roxin, Christophe Nicolle
subject
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO] Computer Science [cs]; lcsh:Computer engineering. Computer hardware; lcsh:TK7885-7895; lcsh:Z; lcsh:Bibliography. Library science. Information resources; Knowledge representation and reasoning; Computer science; Semantic Web; Ontologies; Ontology (information science); SWRL; Big data; Big Data reasoning; World Wide Web; Knowledge extraction; Web navigation; User profile; Semantic technology; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems; 020201 artificial intelligence & image processing; business.industry; business
description
The development of technology for handling information on a Big Data scale is an active topic of current research. Indeed, improved techniques for knowledge discovery are crucial for the scientific and economic exploitation of large-scale raw data. In a research collaboration with an industrial partner, we explore the applicability of ontology-based knowledge extraction and representation to today's biggest source of large-scale data, the Web. The goal is to develop a profiling application based on the implicit information that every user leaves while navigating online, in order to identify and model preferences and interests in a detailed user profile. This includes the identification of current tendencies as well as the prediction of possible future interests, as far as they are deducible from the collected browsing information and integrated expert domain knowledge. The article at hand gives an overview of the current state of the research, the developments made and the insights gained.

Keywords: Semantic Web, Ontologies, SWRL, Big Data reasoning

1 Introduction

"Big Data" is one of the big buzzwords of our time, culminating in the creation of various congresses and conferences focusing solely on that topic in recent years (e.g. IEEE Congress on Big Data, starting from 2011). The handling of immense amounts of data puts scientists and analysts in a dilemma: on the one hand, sophisticated analysis techniques might yield the best results, but they usually come with a processing complexity and run time that are not tolerable for most applications. On the other hand, methods known for their efficiency may fail to exploit the data sources in all their depth. Several research works have proposed distinct criteria to define the nature of "Big Data" (e.g.
[1]). The definitions largely converge on the following five:

* volume: massive amounts of data have to be treated,
* velocity: those data arrive at high speed,
* variety: data types and formats are heterogeneous,
* veracity: data are not always sound and have to be verified,
* value: they have an inherent value that has to be discovered by the application.

Applications acting in a Big Data context have to handle all of them efficiently, balancing analysis depth and processing time.

For that very reason, the application of semantic technology is often dismissed in a Big Data context. Semantic analysis seems too complex and too costly to be affordable in an environment in which even very efficient techniques often fall short of the performance requirements. We want to make a case for ontology-based knowledge representation, even when handling vast amounts of data. By employing an ontology that has been customised in detail for the application domain, the information is limited to those bits and bytes that are actually relevant. Furthermore, we make an effort to avoid performance issues by decoupling costly analysis steps from the actual, real-time user profiling process (please refer to Section 0 for details).

We demonstrate this approach based on an application in digital advertising. Publishers nowadays have detailed information about their users' navigation behaviour: servers capture not only the web pages that were requested by a certain ID, but also the respective time stamps, device information etc. These elements allow insight into usage patterns, but also a deduction of the various contexts a user might be active in (a distinction between the working environment and private surfing, for example).
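To make the kind of navigation data described above more tangible, the following minimal Python sketch models a single logged page request and groups requests by a deduced usage context. It is an illustration under stated assumptions, not the system's actual data model: the record fields (`user_id`, `url`, `timestamp`, `device`), the function names and the simple rule separating "work" from "private" navigation are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PageRequest:
    """One server-side log entry (field names are hypothetical)."""
    user_id: str
    url: str
    timestamp: datetime
    device: str


def infer_context(request: PageRequest) -> str:
    """Toy rule, assumed for illustration only:
    weekday office hours on a desktop device count as 'work'."""
    is_weekday = request.timestamp.weekday() < 5
    is_office_hours = 9 <= request.timestamp.hour < 18
    if is_weekday and is_office_hours and request.device == "desktop":
        return "work"
    return "private"


def group_by_context(requests):
    """Map (user_id, context) to the chronologically ordered list of URLs."""
    grouped = defaultdict(list)
    for request in sorted(requests, key=lambda r: r.timestamp):
        grouped[(request.user_id, infer_context(request))].append(request.url)
    return dict(grouped)


# Invented sample data; the URLs and the user ID are placeholders.
SAMPLE_LOG = [
    PageRequest("u1", "https://example.org/tech/ontologies",
                datetime(2015, 3, 30, 10, 15), "desktop"),
    PageRequest("u1", "https://example.org/sports/football",
                datetime(2015, 3, 30, 21, 40), "mobile"),
    PageRequest("u1", "https://example.org/tech/big-data",
                datetime(2015, 3, 31, 11, 5), "desktop"),
]

if __name__ == "__main__":
    for key, urls in group_by_context(SAMPLE_LOG).items():
        print(key, urls)
```

A fixed rule of this kind only hints at the idea of context deduction; how contexts are actually derived from time stamps and device information is not specified in this excerpt.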
In the development of our system, we explore the integration of semantic technology into this process, keeping a close eye on satisfactory system performance.

2 Related Work

Traditionally, profiling approaches (following the methodologies applied in document indexing) use a keyword-based representation to summarise source documents and user interests in an economical way. …
year | journal | country | edition | language |
---|---|---|---|---|
2015-03-30 |