0000000000795357

AUTHOR

Michael Cochez

Biased GraphWalks for RDF Graph Embeddings

Knowledge Graphs have been recognized as a valuable source for background information in many data mining, information retrieval, natural language processing, and knowledge extraction tasks. However, obtaining a suitable feature vector representation from RDF graphs is a challenging task. In this paper, we extend the RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities. We generate sequences by exploiting local information from graph substructures, harvested by graph walks, and learn latent numerical representations of entities in RDF graphs. We extend the way we compute feature vector representations by comparing twel…

research product

Large Scale Knowledge Matching with Balanced Efficiency-Effectiveness Using LSH Forest

Evolving Knowledge Ecosystems were proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investigate the u…

research product

Global RDF Vector Space Embeddings

Vector space embeddings have been shown to perform well when using RDF data in data mining and machine learning tasks. Existing approaches, such as RDF2Vec, use local information, i.e., they rely on local sequences generated for nodes in the RDF graph. For word embeddings, global techniques, such as GloVe, have been proposed as an alternative. In this paper, we show how the idea of global embeddings can be transferred to RDF embeddings, and show that the results are competitive with traditional local techniques like RDF2Vec.

research product

Knowledge Representation on the Web revisited: Tools for Prototype Based Ontologies

In recent years RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we present a practical implementation of a different kind of knowledge representation based on Prototypes. In detail, we present a concrete syntax easily and effectively parsable by applications. We also present extensible implementations of a prototype knowledge base, specifically designed for storage of Prototypes. These implementations are written in Java and can be extended by using the implementation as a library. Alternatively, the software can be deployed as such. Further, results of benchmarks for both local and web d…

research product

Evolutionary cloud for cooperative UAV coordination

research product

Agile Deep Learning UAVs Operating in Smart Spaces: Collective Intelligence Versus “Mission-Impossible”

The environments in which we all live are known to be complex and unpredictable. The complete discovery of these environments, with the aim of taking full control over them, is a “mission-impossible” that nevertheless remains on our common agenda. People intend to make their living spaces smarter by utilizing innovations from the Internet of Things and Artificial Intelligence. Unmanned aerial vehicles (UAVs), as very dynamic, autonomous, and intelligent things capable of discovering and controlling large areas, are becoming important “inhabitants” of existing and future smart cities. Our concern in this paper is to challenge the potential of UAVs in situations that are evolving fast in a way unseen before, e.g., em…

research product

TB-Structure : Collective Intelligence for Exploratory Keyword Search

In this paper we address an exploratory search challenge by presenting a new (structure-driven) collaborative filtering technique. The aim is to increase search effectiveness by predicting implicit seeker’s intents at an early stage of the search process. This is achieved by uncovering behavioral patterns within large datasets of preserved collective search experience. We apply a specific tree-based data structure called a TB (There-and-Back) structure for compact storage of search history in the form of merged query trails – sequences of queries approaching iteratively a seeker’s goal. The organization of TB-structures allows inferring new implicit trails for the prediction of a seeker’s i…

research product

A First Experiment on Including Text Literals in KGloVe

Graph embedding models produce embedding vectors for entities and relations in Knowledge Graphs, often without taking literal properties into account. We show an initial idea based on the combination of global graph structure with additional information provided by textual information in properties. Our initial experiment shows that this approach might be useful, but does not clearly outperform earlier approaches when evaluated on machine learning tasks.

research product

How Do Computer Science Students Use Distributed Version Control Systems?

The inclusion of version control systems into computing curricula enables educators to promote competences needed in real-life situations. The use of a version control system also has several potential benefits for the teacher. The teacher might, for instance, use the tool to monitor students’ progress and to give feedback efficiently. This study analyzes how students used the distributed version control system Git in several computing courses. We analyzed students’ commit log data in two advanced programming courses, a second-year introductory software engineering course, and two courses where students developed software products. This enables us to compare Git usage between introductory l…

research product

Anomaly Detection Algorithms for the Sleeping Cell Detection in LTE Networks

The Sleeping Cell problem is a particular type of cell degradation in Long-Term Evolution (LTE) networks. In practice, such a cell outage leads to a lack of network service, and sometimes the operator discovers it only after multiple user complaints. In this study, a cell becomes sleeping because of a Random Access Channel (RACH) failure, which may happen due to software or hardware problems. For the detection of malfunctioning cells, we introduce a data-mining-based framework. At its core is the analysis of event sequences reported by a User Equipment (UE) to a serving Base Station (BS). The crucial element of the developed framework is an anomaly detection algorithm. We compare perfor…

research product

Balanced Large Scale Knowledge Matching Using LSH Forest

Evolving Knowledge Ecosystems were proposed recently to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investig…

research product

Challenges and Confusions in Learning Version Control with Git

Scholars agree on the importance of incorporating the use of version control systems (VCSs) into computing curricula, so as to prepare students for today’s distributed and collaborative workplaces. One of the present-day distributed version control systems (DVCSs) is Git, the system we have used in several courses. In this paper, we report on the challenges of learning and using the system, based on survey data collected from a project-based course and our own teaching experiences during several different kinds of computing courses. The results of this analysis are discussed and recommendations are made. peerReviewed

research product

Locality-Sensitive Hashing for Massive String-Based Ontology Matching

This paper reports initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into an LSH problem are proposed and experimental results are reported. The performed experiments show that using LSH for ontology matching could lead to a very fast matching process. The quality of the alignment achieved in these experiments is comparable to that of state-of-the-art matchers, while the matching itself is much faster. Further research is needed to find out whether the use of different metrics or specific hardware would improve the results. peerReviewed

research product

Knowledge Representation on the Web Revisited: The Case for Prototypes

Recently, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we examine an alternative way to represent knowledge based on Prototypes. This Prototype-based representation has different properties, which we argue to be more suitable for data sharing and reuse on the Web. Prototypes avoid the distinction between classes and instances and provide a means for object-based data sharing and reuse.

research product

Leveraging Knowledge Graph Embedding Techniques for Industry 4.0 Use Cases

Industry is evolving towards Industry 4.0, which holds the promise of increased flexibility in manufacturing, better quality and improved productivity. A core actor of this growth is using sensors, which must capture data that can be used in unforeseen ways to achieve a performance not achievable without them. However, the complexity of this improved setting is much greater than what is currently used in practice. Hence, it is imperative that the management is not performed by human labor alone, but that part of it is done by automated algorithms instead. A natural way to represent the data generated by this large number of sensors, which are not acting measuring independent variable…

research product

Global RDF Vector Space Embeddings

Vector space embeddings have been shown to perform well when using RDF data in data mining and machine learning tasks. Existing approaches, such as RDF2Vec, use local information, i.e., they rely on local sequences generated for nodes in the RDF graph. For word embeddings, global techniques, such as GloVe, have been proposed as an alternative. In this paper, we show how the idea of global embeddings can be transferred to RDF embeddings, and show that the results are competitive with traditional local techniques like RDF2Vec. peerReviewed

research product

Knowledge Representation on the Web Revisited : The Case for Prototypes

Recently, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we examine an alternative way to represent knowledge based on Prototypes. This Prototype-based representation has different properties, which we argue to be more suitable for data sharing and reuse on the Web. Prototypes avoid the distinction between classes and instances and provide a means for object-based data sharing and reuse. In this paper we discuss the requirements and design principles for Knowledge Representation based on Prototypes on the Web, after which we propose a formal syntax and semantics. We further show how to e…

research product

Use of a Semantic Language to Reduce the Indeterminacy in Agents Communication

In the field of agent communications, uncertainty and vagueness in the message content and in the achievable results play a primordial role when two agents (human or artificial) communicate. Even though the importance of vagueness and uncertainty was recognized long ago, only recently have mechanisms related to the communications’ semantics that allow a practical approach been designed; more specifically, the development of tools such as agent programming languages and frameworks, which is a field of intensive research. On the other hand, recent theoretical ideas, drawn from situation semantics theory and the works of Sutton on semantic information, support this work. This paper applies…

research product

Linked Data in Enterprise Integration

research product

Twister Tries

Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large amount of items. Besides the unfavorable time complexity of O(n²), these algorithms have a space complexity of O(n²), which can be reduced to O(n) if the time complexity is allowed to rise to O(n² log₂ n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…

research product

Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination

Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity. This is a generally accepted fact and cannot be improved without using specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., the outcome of a hierarchical clustering) which resembles the one produced by AHC, while only needing linear space and time. However, twister tries are sensitive to rare, but still possible, hash evaluations. These might have a disastrous effect on the final outcome. We propose the use of a metaheuristic algor…

research product

Taming big knowledge evolution

Information and its derived knowledge are not static. Instead, information is changing over time and our understanding of it evolves with our ability and willingness to consume the information. When compared to humans, current computer systems seem very limited in their ability to really understand the meaning of things. On the other hand, they are very powerful when it comes down to performing exact computations. One aspect which sets humans apart from machines when trying to understand the world is that we will often make mistakes, forget information, or choose what to focus on. To put this in another perspective, it seems like humans can behave somehow more randomly and still outperform …

research product

Leveraging Knowledge Graph Embedding Techniques for Industry 4.0 Use Cases

Industry is evolving towards Industry 4.0, which holds the promise of increased flexibility in manufacturing, better quality and improved productivity. A core actor of this growth is using sensors, which must capture data that can be used in unforeseen ways to achieve a performance not achievable without them. However, the complexity of this improved setting is much greater than what is currently used in practice. Hence, it is imperative that the management is not performed by human labor alone, but that part of it is done by automated algorithms instead. A natural way to represent the data generated by this large number of sensors, which are not acting measuring independent variables,…

research product

Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach

Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. MFPs, as the smallest set of patterns, help to reveal customers’ purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory-based Apriori or FP-growth algorithms. Therefore, these approaches are not only unscalable but also lack parallelism. Consequently, the requirements of ever-increasing big data sources cannot be met. In addition, mining performance in some existing approaches degrades drastically due to the presence of null transactions. We, therefo…

research product

Issues with a course that emphasizes self-direction

In this paper, we examine a master's level course that emphasizes self-direction on the part of students. The course is run by weekly group assignments and requires independent work such that only one mandatory classroom session is arranged each week. Our specific research interests are how students responded to the setting of this kind and whether they demonstrated self-direction during the course. We surveyed the students' view of the course, their group work experience, and their study habits, and analyzed the resultant survey data for themes. The results suggest that while the pass rate was considerably high and the course was regarded as well-organized by the students, there were sever…

research product

Indeterminacy Reduction in Agent Communication Using a Semantic Language

In recent years, the importance of vagueness and uncertainty in the messages exchanged between agents has been highlighted, mainly due to the ubiquitous nature of the (artificial or human) agents’ communication. The imprecision in the communication becomes more significant when the autonomy of the agents increases or the number of exchanged messages for a communicative goal is limited. In this paper we combine ideas drawn from situation semantics theory, human communication, and the multi-agent systems (MAS) field to reduce the impact of vagueness and uncertainty present in the communication. The main advances are achieved with the help of context information, collaboration and reinforcem…

research product