AUTHOR
Michael Cochez
Biased Graph Walks for RDF Graph Embeddings
Knowledge Graphs have been recognized as a valuable source for background information in many data mining, information retrieval, natural language processing, and knowledge extraction tasks. However, obtaining a suitable feature vector representation from RDF graphs is a challenging task. In this paper, we extend the RDF2Vec approach, which leverages language modeling techniques for unsupervised feature extraction from sequences of entities. We generate sequences by exploiting local information from graph substructures, harvested by graph walks, and learn latent numerical representations of entities in RDF graphs. We extend the way we compute feature vector representations by comparing twel…
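The walk-based sequence generation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the function names (`build_adjacency`, `walk`, `generate_walks`) and the optional `weight` callback used to bias edge selection are all assumptions made for illustration. The resulting sequences are the kind of input a language model such as word2vec would then be trained on.

```python
import random

# Toy RDF graph as (subject, predicate, object) triples; all names are made up.
TRIPLES = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:worksAt", "ex:Acme"),
    ("ex:Bob", "ex:worksAt", "ex:Acme"),
    ("ex:Acme", "ex:locatedIn", "ex:Helsinki"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
    return adjacency


def walk(adjacency, start, depth, rng, weight=None):
    """Return one walk [entity, predicate, entity, ...] of at most `depth` hops.

    `weight` is an optional (hypothetical) function (predicate, object) -> positive
    number; when given, edges are sampled proportionally to it (a biased walk).
    """
    sequence = [start]
    node = start
    for _ in range(depth):
        edges = adjacency.get(node)
        if not edges:
            break  # dead end: the current node has no outgoing edges
        if weight is None:
            predicate, node = rng.choice(edges)  # uniform walk
        else:
            weights = [weight(p, o) for p, o in edges]
            predicate, node = rng.choices(edges, weights=weights, k=1)[0]
        sequence.extend([predicate, node])
    return sequence


def generate_walks(triples, walks_per_entity=2, depth=2, seed=0, weight=None):
    """Generate `walks_per_entity` walks starting from every subject in the graph."""
    rng = random.Random(seed)
    adjacency = build_adjacency(triples)
    return [
        walk(adjacency, entity, depth, rng, weight)
        for entity in sorted(adjacency)
        for _ in range(walks_per_entity)
    ]


if __name__ == "__main__":
    for sequence in generate_walks(TRIPLES):
        print(" -> ".join(sequence))
```

Passing, for example, `weight=lambda p, o: 5.0 if p == "ex:worksAt" else 1.0` makes walks follow `ex:worksAt` edges more often; which weighting schemes are actually compared is described in the paper itself, not in this sketch.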
Large Scale Knowledge Matching with Balanced Efficiency-Effectiveness Using LSH Forest
Evolving Knowledge Ecosystems were proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investigate the u…
Global RDF Vector Space Embeddings
Vector space embeddings have been shown to perform well when using RDF data in data mining and machine learning tasks. Existing approaches, such as RDF2Vec, use local information, i.e., they rely on local sequences generated for nodes in the RDF graph. For word embeddings, global techniques, such as GloVe, have been proposed as an alternative. In this paper, we show how the idea of global embeddings can be transferred to RDF embeddings, and show that the results are competitive with traditional local techniques like RDF2Vec.
Knowledge Representation on the Web revisited: Tools for Prototype Based Ontologies
In recent years RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we present a practical implementation of a different kind of knowledge representation based on Prototypes. In detail, we present a concrete syntax easily and effectively parsable by applications. We also present extensible implementations of a prototype knowledge base, specifically designed for storage of Prototypes. These implementations are written in Java and can be extended by using the implementation as a library. Alternatively, the software can be deployed as such. Further, results of benchmarks for both local and web d…
Evolutionary cloud for cooperative UAV coordination
Agile Deep Learning UAVs Operating in Smart Spaces: Collective Intelligence Versus “Mission-Impossible”
The environments in which we all live are known to be complex and unpredictable. The complete discovery of these environments, aiming to take full control over them, is a “mission-impossible”, yet it is still on our common agenda. People intend to make their living spaces smarter by utilizing innovations from the Internet of Things and Artificial Intelligence. Unmanned aerial vehicles (UAVs), as very dynamic, autonomous and intelligent things capable of discovering and controlling large areas, are becoming important “inhabitants” within existing and future smart cities. Our concern in this paper is to challenge the potential of UAVs in situations that are evolving fast in a way unseen before, e.g., em…
TB-Structure: Collective Intelligence for Exploratory Keyword Search
In this paper we address an exploratory search challenge by presenting a new (structure-driven) collaborative filtering technique. The aim is to increase search effectiveness by predicting implicit seeker’s intents at an early stage of the search process. This is achieved by uncovering behavioral patterns within large datasets of preserved collective search experience. We apply a specific tree-based data structure called a TB (There-and-Back) structure for compact storage of search history in the form of merged query trails – sequences of queries approaching iteratively a seeker’s goal. The organization of TB-structures allows inferring new implicit trails for the prediction of a seeker’s i…
A First Experiment on Including Text Literals in KGloVe
Graph embedding models produce embedding vectors for entities and relations in Knowledge Graphs, often without taking literal properties into account. We show an initial idea based on the combination of global graph structure with additional information provided by textual information in properties. Our initial experiment shows that this approach might be useful, but does not clearly outperform earlier approaches when evaluated on machine learning tasks.
How Do Computer Science Students Use Distributed Version Control Systems?
The inclusion of version control systems into computing curricula enables educators to promote competences needed in real-life situations. The use of a version control system also has several potential benefits for the teacher. The teacher might, for instance, use the tool to monitor students’ progress and to give feedback efficiently. This study analyzes how students used the distributed version control system Git in several computing courses. We analyzed students’ commit log data in two advanced programming courses, a second-year introductory software engineering course, and two courses where students developed software products. This enables us to compare Git usage between introductory l…
Anomaly Detection Algorithms for the Sleeping Cell Detection in LTE Networks
The Sleeping Cell problem is a particular type of cell degradation in Long-Term Evolution (LTE) networks. In practice, such a cell outage leads to a lack of network service, and sometimes an operator can detect it only after multiple user complaints. In this study a cell becomes sleeping because of a Random Access Channel (RACH) failure, which may happen due to software or hardware problems. For the detection of malfunctioning cells, we introduce a data mining based framework. At its core is the analysis of event sequences reported by a User Equipment (UE) to a serving Base Station (BS). The crucial element of the developed framework is an anomaly detection algorithm. We compare perfor…
Balanced Large Scale Knowledge Matching Using LSH Forest
Evolving Knowledge Ecosystems were proposed recently to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investig…
Challenges and Confusions in Learning Version Control with Git
Scholars agree on the importance of incorporating the use of version control systems (VCSs) into computing curricula, so as to prepare students for today’s distributed and collaborative workplaces. One of the present-day distributed version control systems (DVCSs) is Git, the system we have used in several courses. In this paper, we report on the challenges of learning and using the system, based on survey data collected from a project-based course and our own teaching experiences during several different kinds of computing courses. The results of this analysis are discussed and recommendations are made.
Locality-Sensitive Hashing for Massive String-Based Ontology Matching
This paper reports initial research results related to the use of locality-sensitive hashing (LSH) for string-based matching of big ontologies. Two ways of transforming the matching problem into an LSH problem are proposed and experimental results are reported. The performed experiments show that using LSH for ontology matching could lead to a very fast matching process. The quality of the alignment achieved in these experiments is comparable to that of state-of-the-art matchers, while the matching itself is much faster. Further research is needed to find out whether the use of different metrics or specific hardware would improve the results.
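To make the idea of casting string matching as an LSH problem concrete, here is a minimal sketch. The abstract does not say which two transformations the paper proposes, so everything below is an assumption chosen for illustration: labels are split into character 3-grams, summarized with MinHash signatures, and grouped by banding so that only labels sharing a bucket become candidate matches. The function names and parameters (`shingles`, `minhash_signature`, `candidate_pairs`, `num_hashes`, `bands`) are hypothetical.

```python
import hashlib
from collections import defaultdict


def shingles(label, n=3):
    """Split a label into lowercase character n-grams (whole string if shorter)."""
    text = label.lower()
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted with `seed`."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value per salted hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def candidate_pairs(labels_a, labels_b, num_hashes=32, bands=16):
    """Return (a, b) pairs whose signatures agree on at least one band."""
    rows = num_hashes // bands
    # Each bucket keeps the labels of both ontologies that landed in it.
    buckets = defaultdict(lambda: (set(), set()))
    for side, labels in enumerate((labels_a, labels_b)):
        for label in labels:
            signature = minhash_signature(shingles(label), num_hashes)
            for band in range(bands):
                key = (band, signature[band * rows:(band + 1) * rows])
                buckets[key][side].add(label)
    pairs = set()
    for left, right in buckets.values():
        pairs.update((a, b) for a in left for b in right)
    return pairs


if __name__ == "__main__":
    ontology_a = ["Person", "Organisation", "hasAuthor"]
    ontology_b = ["person", "Organization", "Quantity"]
    for pair in sorted(candidate_pairs(ontology_a, ontology_b)):
        print(pair)
```

Only the candidate pairs need a more expensive exact comparison afterwards, which is where the speed-up of an LSH-based matcher would come from; more bands with fewer rows per band yield more candidates and higher recall at the cost of more comparisons.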
Knowledge Representation on the Web Revisited: The Case for Prototypes
Recently, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we examine an alternative way to represent knowledge based on Prototypes. This Prototype-based representation has different properties, which we argue to be more suitable for data sharing and reuse on the Web. Prototypes avoid the distinction between classes and instances and provide a means for object-based data sharing and reuse.
Leveraging Knowledge Graph Embedding Techniques for Industry 4.0 Use Cases
Industry is evolving towards Industry 4.0, which holds the promise of increased flexibility in manufacturing, better quality and improved productivity. A core actor of this growth is using sensors, which must capture data that can be used in unforeseen ways to achieve a performance not achievable without them. However, the complexity of this improved setting is much greater than what is currently used in practice. Hence, it is imperative that the management is not performed by the human labor force alone, but that part of it is done by automated algorithms instead. A natural way to represent the data generated by this large number of sensors, which are not acting measuring independent variable…
Knowledge Representation on the Web Revisited: The Case for Prototypes
Recently, RDF and OWL have become the most common knowledge representation languages in use on the Web, propelled by the recommendation of the W3C. In this paper we examine an alternative way to represent knowledge based on Prototypes. This Prototype-based representation has different properties, which we argue to be more suitable for data sharing and reuse on the Web. Prototypes avoid the distinction between classes and instances and provide a means for object-based data sharing and reuse. In this paper we discuss the requirements and design principles for Knowledge Representation based on Prototypes on the Web, after which we propose a formal syntax and semantics. We further show how to e…
Use of a Semantic Language to Reduce the Indeterminacy in Agents Communication
In the field of agent communications uncertainty and vagueness in the message content and in the achievable results play a primordial role when two agents (human or artificial) communicate. Even though the importance of vagueness and uncertainty has been recognized long ago, only recently mechanisms related to the communications’ semantics that allow a practical approach have been designed; more specifically, the development of tools such as agent programming languages and frameworks, which is a field of intensive research. On the other hand, recent theoretical ideas, drawn from situation semantics theory and the works of Sutton on semantic information, support this work. This paper applies…
Linked Data in Enterprise Integration
Twister Tries
Many commonly used data-mining techniques utilized across research fields perform poorly when used for large data sets. Sequential agglomerative hierarchical non-overlapping clustering is one technique for which the algorithms’ scaling properties prohibit clustering of a large number of items. Besides the unfavorable time complexity of O(n^2), these algorithms have a space complexity of O(n^2), which can be reduced to O(n) if the time complexity is allowed to rise to O(n^2 log_2 n). In this paper, we propose the use of locality-sensitive hashing combined with a novel data structure called twister tries to provide an approximate clustering for average linkage. Our approach requires only lin…
Scalable Hierarchical Clustering: Twister Tries with a Posteriori Trie Elimination
Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity. This is a generally accepted fact and cannot be improved without using specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., the outcome of a hierarchical clustering) which resembles the one produced by AHC, while only needing linear space and time. However, twister tries are sensitive to rare, but still possible, hash evaluations. These might have a disastrous effect on the final outcome. We propose the use of a metaheuristic algor…
Taming big knowledge evolution
Information and its derived knowledge are not static. Instead, information is changing over time and our understanding of it evolves with our ability and willingness to consume the information. When compared to humans, current computer systems seem very limited in their ability to really understand the meaning of things. On the other hand, they are very powerful when it comes down to performing exact computations. One aspect which sets humans apart from machines when trying to understand the world is that we will often make mistakes, forget information, or choose what to focus on. To put this in another perspective, it seems like humans can behave somehow more randomly and still outperform …
Mining Maximal Frequent Patterns in Transactional Databases and Dynamic Data Streams: A Spark-based Approach
Mining maximal frequent patterns (MFPs) in transactional databases (TDBs) and dynamic data streams (DDSs) is substantially important for business intelligence. MFPs, as the smallest set of patterns, help to reveal customers’ purchase rules and support market basket analysis (MBA). Although numerous studies have been carried out in this area, most of them extend the main-memory based Apriori or FP-growth algorithms. Therefore, these approaches are not only unscalable but also lack parallelism. Consequently, the requirements of ever-increasing big data sources cannot be met. In addition, mining performance in some existing approaches degrades drastically due to the presence of null transactions. We, therefo…
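As a reminder of what a maximal frequent pattern is, the following brute-force sketch may help. It is deliberately not the Spark-based approach of the paper: it runs in main memory on a toy basket list, and the function names and the `min_support` parameter (an absolute transaction count) are assumptions for illustration. An itemset is frequent if at least `min_support` transactions contain it, and maximal if no proper superset is also frequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets found in >= min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return frequent


def maximal_frequent_patterns(transactions, min_support):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other for other in frequent)
    }


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "milk", "eggs"},
        {"bread", "eggs"},
        {"milk"},
    ]
    for itemset, support in maximal_frequent_patterns(baskets, min_support=2).items():
        print(sorted(itemset), support)
```

Because every subset of a maximal pattern is itself frequent, the maximal patterns form a compact summary of all frequent itemsets, which is presumably why the abstract calls them the smallest set of patterns. The exhaustive enumeration here is exponential in the number of items and only meant for toy inputs.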
Issues with a course that emphasizes self-direction
In this paper, we examine a master's level course that emphasizes self-direction on the part of students. The course is run by weekly group assignments and requires independent work such that only one mandatory classroom session is arranged each week. Our specific research interests are how students responded to the setting of this kind and whether they demonstrated self-direction during the course. We surveyed the students' view of the course, their group work experience, and their study habits, and analyzed the resultant survey data for themes. The results suggest that while the pass rate was considerably high and the course was regarded as well-organized by the students, there were sever…
Indeterminacy Reduction in Agent Communication Using a Semantic Language
In recent years, the importance of vagueness and uncertainty in the messages exchanged between agents has been highlighted, mainly due to the ubiquitous nature of the (artificial or human) agents’ communication. The imprecision in the communication becomes more significant when the autonomy of the agents increases or the number of exchanged messages for a communicative goal is limited. In this paper we conjugate ideas drawn from situation semantics theory, human communication, and the multi-agent systems (MAS) field to reduce the impact of vagueness and uncertainty present in the communication. The main advances are achieved with the help of context information, collaboration and reinforcem…