Graph-based exploration and clustering analysis of semantic spaces

RESEARCH PRODUCT

Vladimir Boginski, Alexander Veremyev, Alexander Semenov, Eduardo L. Pasiliao

subject

Text corpus; Semantic spaces; Semantic network; Semantic web (semanttinen web); WordNet; Word2vec; Word2vec similarity networks; Thesaurus (information retrieval); Information retrieval; Cluster analysis; Cohesive clusters; Cliques; Clique relaxations; Graph theory; Network theory (verkkoteoria); Network science; Computer science; Computer Networks and Communications; Computational Mathematics; Multidisciplinary; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0211 other engineering and technologies; 021103 operations research; lcsh:T57-57.97; lcsh:Applied mathematics. Quantitative methods

description

Abstract. The goal of this study is to demonstrate how tools and concepts from network science and graph theory can be used to explore and compare the semantic spaces of word embeddings and lexical databases. Specifically, we construct two kinds of semantic networks: "machine-built" networks based on the word2vec representation of words, which is learnt from large text corpora (Google News, Amazon reviews), and "human-built" word networks derived from two well-known lexical databases, WordNet and Moby Thesaurus. We compare "global" characteristics (e.g., degrees, distances, clustering coefficients) and "local" characteristics (e.g., most central nodes and community-type dense clusters) of these networks. Our observations suggest that the human-built networks possess more intuitive global connectivity patterns, whereas the local characteristics of the machine-built networks, in particular their dense clusters, provide much richer information on the contextual usage and perceived meanings of words. This reveals structural differences between human-built and machine-built semantic networks. To our knowledge, this is the first study that uses graph theory and network science in this context; we therefore also provide examples and discuss potential research directions that may motivate further work on the synthesis of lexicographic and machine learning based tools.
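The kind of analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the cosine-similarity threshold of 0.9, and all function names below are assumptions made purely for illustration, and the abstract does not state how the similarity networks were actually constructed. The sketch links two words when their vectors are sufficiently similar, then computes one "global" quantity (the clustering coefficient of a node) and one "local" structure (maximal cliques, found with the basic Bron-Kerbosch procedure).

```python
import math
from itertools import combinations

# Hypothetical toy embeddings; real word2vec vectors have hundreds of dimensions.
VECTORS = {
    "cat": (0.9, 0.1, 0.0),
    "dog": (0.8, 0.2, 0.1),
    "pet": (0.85, 0.15, 0.05),
    "car": (0.1, 0.9, 0.1),
    "truck": (0.05, 0.95, 0.2),
    "idea": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_similarity_graph(vectors, threshold):
    """Connect two words when their cosine similarity reaches the threshold.

    The thresholding rule is an assumption of this sketch.
    """
    graph = {word: set() for word in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


if __name__ == "__main__":
    g = build_similarity_graph(VECTORS, threshold=0.9)
    for word in g:
        print(word, len(g[word]), clustering_coefficient(g, word))
    print(maximal_cliques(g))
```

On real embeddings the pairwise loop grows quadratically with vocabulary size, so a practical version would likely rely on an approximate nearest-neighbour search and a dedicated graph library. Clique relaxations, which the keywords mention, would need a different enumeration routine than the one sketched here.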

year

2019-11-13

http://urn.fi/URN:NBN:fi:jyu-202002041965