Search results for " similarity"
showing 10 items of 126 documents
Syntagmatic and Paradigmatic Associations in Information Retrieval
2003
It is shown that unconscious associative processes taking place in the memory of a searcher during the formulation of a search query in information retrieval — such as the production of free word associations and the generation of synonyms — can be simulated using statistical models that analyze the distribution of words in large text corpora. The free word associations as produced by subjects on presentation of stimulus words can be predicted by applying first-order statistics to the frequencies of word co-occurrences as observed in texts. The generation of synonyms can also be conducted on co-occurrence data but requires second-order statistics. Both approaches are compared and validated …
Graph-based exploration and clustering analysis of semantic spaces
2019
Abstract The goal of this study is to demonstrate how network science and graph theory tools and concepts can be effectively used for exploring and comparing semantic spaces of word embeddings and lexical databases. Specifically, we construct semantic networks based on word2vec representation of words, which is “learnt” from large text corpora (Google news, Amazon reviews), and “human built” word networks derived from the well-known lexical databases: WordNet and Moby Thesaurus. We compare “global” (e.g., degrees, distances, clustering coefficients) and “local” (e.g., most central nodes and community-type dense clusters) characteristics of considered networks. Our observations suggest that …
Movie Script Similarity Using Multilayer Network Portrait Divergence
2020
International audience; This paper addresses the question of movie similarity through multilayer graph similarity measures. Recent work has shown how to construct multilayer networks using movie scripts, and how they capture different aspects of the stories. Based on this modeling, we propose to rely on the multilayer structure and compute different similarities, so we may compare movies, not from their visual content, summary, or actors, but actually from their own storyboard. We propose to do so using “portrait divergence”, which has been recently introduced to compute graph distances from summarizing graph characteristics. We illustrate our approach on the series of six Star Wars movies.
Learning Similarity Scores by Using a Family of Distance Functions in Multiple Feature Spaces
2017
There exist a large number of distance functions that allow one to measure similarity between feature vectors and thus can be used for ranking purposes. When multiple representations of the same object are available, distances in each representation space may be combined to produce a single similarity score. In this paper, we present a method to build such a similarity ranking out of a family of distance functions. Unlike other approaches that aim to select the best distance function for a particular context, we use several distances and combine them in a convenient way. To this end, we adopt a classical similarity learning approach and face the problem as a standard supervised machine lea…
Improving structural similarity based virtual screening using background knowledge
2013
Background Virtual screening in the form of similarity rankings is often applied in the early drug discovery process to rank and prioritize compounds from a database. This similarity ranking can be achieved with structural similarity measures. However, their general nature can lead to insufficient performance in some application cases. In this paper, we provide a link between ranking-based virtual screening and fragment-based data mining methods. The inclusion of binding-relevant background knowledge into a structural similarity measure improves the quality of the similarity rankings. This background knowledge in the form of binding relevant substructures can either be derived by hand selec…
When WORDS with Higher-frequency Neighbours Become Words with No Higher-frequency Neighbour (Or How to Undress the Neighbourhood Frequency Effect)
2000
Abstract “SATOR AREPO TENET OPERA ROTAS” (The ploughman, with his plough, manages the work) The influence of lexical similarity on word recognition has been discussed not only because of its theoretical impact but also because it is difficult to replicate. Among the multiplicity of the causes of this inconsistency one reason can be that different words were used in comparing words with higher-frequency neighbours (HFN) and words without HFN. In this experiment we chose French words for which the neighbourhood changes when they are written in UPPER case or in lower case. For example ‘DEFI’ has one HFN (‘DEMI’) but when it is displayed in lower case ‘defi’ has no HFN because ‘demi’ has no acc…
Algebraic Properties to Optimize kNN Queries
2011
International audience; New applications that are being required to employ Database Management Systems (DBMSs), such as storing and retrieving complex data (images, sound, temporal series, genetic data, etc.) and analytical data processing (data mining, social networks analysis, etc.), increasingly impose the need for new ways of expressing predicates. Among the new most studied predicates are the similarity-based ones, where the two commonest are the similarity range and the k-nearest neighbor predicates. The k-nearest neighbor predicate is surely the most interesting for several applications, including Content-Based Image Retrieval (CBIR) and Data Mining (DM) tasks, yet it is also the mos…
Toward Approximate GML Retrieval Based on Structural and Semantic Characteristics
2010
International audience; GML is emerging as the new standard for representing geographic information in GISs on the Web, allowing the encoding of structurally and semantically rich geographic data in self describing XML-based geographic entities. In this study, we address the problem of approximate querying and ranked results for GML data and provide a method for GML query evaluation. Our method consists of two main contributions. First, we propose a tree model for representing GML queries and data collections. Then, we introduce a GML retrieval method based on the concept of tree edit distance as an efficient means for comparing semi-structured data. Our approach allows the evaluation of bo…
Qualifying semantic graphs using model checking
2011
International audience; Semantic interoperability problems have found their solutions using languages and techniques from the Semantic Web. The proliferation of ontologies and meta-information has improved the understanding of information and the relevance of search engine responses. However, the construction of semantic graphs is a source of numerous errors of interpretation or modeling and scalability remains a major problem. The processing of large semantic graphs is a limit to the use of semantics in current information systems. The work presented in this paper is part of a new research at the border of two areas: the semantic web and the model checking. This line of research concerns t…
A new approach based on NμSMV Model to query semantic graph
2011
International audience; The language most frequently used to represent the semantic graphs is the RDF (W3C standard for meta-modeling). The construction of semantic graphs is a source of numerous errors of interpretation. Processing of large semantic graphs can be a limit to use semantics in modern information systems. The work presented in this paper is part of a new research at the border between two areas: the semantic web and the model checking. For this, we developed a tool, RDF2NμSMV, which converts RDF graphs into NμSMV language. This conversion aims checking the semantic graphs with the model checker NμSMV in order to verify the consistency of the data. The data integration and shar…