Search results for " similarity"

showing 10 items of 126 documents

Syntagmatic and Paradigmatic Associations in Information Retrieval

2003

It is shown that unconscious associative processes taking place in the memory of a searcher during the formulation of a search query in information retrieval — such as the production of free word associations and the generation of synonyms — can be simulated using statistical models that analyze the distribution of words in large text corpora. The free word associations as produced by subjects on presentation of stimulus words can be predicted by applying first-order statistics to the frequencies of word co-occurrences as observed in texts. The generation of synonyms can also be conducted on co-occurrence data but requires second-order statistics. Both approaches are compared and validated …

Text corpus; Empirical data; Syntagmatic analysis; Information retrieval; Web search query; Semantic similarity; Computer science; Statistical model; Independent component analysis; Associative property

researchProduct

Graph-based exploration and clustering analysis of semantic spaces

2019

The goal of this study is to demonstrate how network science and graph theory tools and concepts can be effectively used for exploring and comparing semantic spaces of word embeddings and lexical databases. Specifically, we construct semantic networks based on word2vec representation of words, which is “learnt” from large text corpora (Google news, Amazon reviews), and “human built” word networks derived from the well-known lexical databases: WordNet and Moby Thesaurus. We compare “global” (e.g., degrees, distances, clustering coefficients) and “local” (e.g., most central nodes and community-type dense clusters) characteristics of considered networks. Our observations suggest that …

Text corpus; Semantic spaces; Computer Networks and Communications; Computer science; graph theory; 0211 other engineering and technologies; WordNet; Network science; 02 engineering and technology; semanttinen web; Semantic network; word2vec similarity networks; Word2vec similarity networks; Clique relaxations; cohesive clusters; 0202 electrical engineering electronic engineering information engineering; Word2vec; Cluster analysis; Thesaurus (information retrieval); 021103 operations research; Multidisciplinary; Information retrieval; verkkoteoria; lcsh:T57-57.97; Graph theory; cliques; clique relaxations; Computational Mathematics; Cliques; lcsh:Applied mathematics. Quantitative methods; semantic spaces; 020201 artificial intelligence & image processing; Cohesive clusters

researchProduct

Movie Script Similarity Using Multilayer Network Portrait Divergence

2020

This paper addresses the question of movie similarity through multilayer graph similarity measures. Recent work has shown how to construct multilayer networks using movie scripts, and how they capture different aspects of the stories. Based on this modeling, we propose to rely on the multilayer structure and compute different similarities, so we may compare movies, not from their visual content, summary, or actors, but actually from their own storyboard. We propose to do so using “portrait divergence”, which has been recently introduced to compute graph distances from summarizing graph characteristics. We illustrate our approach on the series of six Star Wars movies.

Theoretical computer science; Computer science; 02 engineering and technology; Star (graph theory); [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; computer.software_genre; 01 natural sciences; 010305 fluids & plasmas; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Similarity (network science); [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; 0103 physical sciences; 0202 electrical engineering electronic engineering information engineering; [INFO]Computer Science [cs]; Storyboard; Divergence (statistics); Structure (mathematical logic); Network portrait; Movies; Multilayer networks; Network similarity; [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; Construct (python library); Scripting language; Graph (abstract data type); 020201 artificial intelligence & image processing; computer; [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing

researchProduct

Learning Similarity Scores by Using a Family of Distance Functions in Multiple Feature Spaces

2017

There exist a large number of distance functions that allow one to measure similarity between feature vectors and thus can be used for ranking purposes. When multiple representations of the same object are available, distances in each representation space may be combined to produce a single similarity score. In this paper, we present a method to build such a similarity ranking out of a family of distance functions. Unlike other approaches that aim to select the best distance function for a particular context, we use several distances and combine them in a convenient way. To this end, we adopt a classical similarity learning approach and face the problem as a standard supervised machine lea…

Training set; business.industry; Feature vector; Similarity heuristic; Pattern recognition; 02 engineering and technology; Machine learning; computer.software_genre; Semantic similarity; Artificial Intelligence; 020204 information systems; Normalized compression distance; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; Jaro–Winkler distance; business; computer; Classifier (UML); Software; Similarity learning; Mathematics; International Journal of Pattern Recognition and Artificial Intelligence

researchProduct

Improving structural similarity based virtual screening using background knowledge

2013

Background: Virtual screening in the form of similarity rankings is often applied in the early drug discovery process to rank and prioritize compounds from a database. This similarity ranking can be achieved with structural similarity measures. However, their general nature can lead to insufficient performance in some application cases. In this paper, we provide a link between ranking-based virtual screening and fragment-based data mining methods. The inclusion of binding-relevant background knowledge into a structural similarity measure improves the quality of the similarity rankings. This background knowledge in the form of binding-relevant substructures can either be derived by hand selec…

Virtual screening; Enrichment; Physical and Theoretical Chemistry; Library and Information Sciences; Structural similarity; 004 Informatik; Computer Graphics and Computer-Aided Design; Data mining; Background knowledge; 004 Data processing; Computer Science Applications; Research Article

researchProduct

When WORDS with Higher-frequency Neighbours Become Words with No Higher-frequency Neighbour (Or How to Undress the Neighbourhood Frequency Effect)

2000

“SATOR AREPO TENET OPERA ROTAS” (The ploughman, with his plough, manages the work). The influence of lexical similarity on word recognition has been discussed not only because of its theoretical impact but also because it is difficult to replicate. Among the many possible causes of this inconsistency, one may be that different words were used in comparing words with higher-frequency neighbours (HFN) and words without HFN. In this experiment we chose French words for which the neighbourhood changes when they are written in UPPER case or in lower case. For example, ‘DEFI’ has one HFN (‘DEMI’), but when it is displayed in lower case ‘defi’ has no HFN because ‘demi’ has no acc…

Visual word recognition; Sator; biology; Word recognition; Lexical similarity; Lexical decision task; Frequency effect; Arithmetic; Psychology; biology.organism_classification; Linguistics

researchProduct

Algebraic Properties to Optimize kNN Queries

2011

New applications of Database Management Systems (DBMSs), such as storing and retrieving complex data (images, sound, time series, genetic data, etc.) and analytical data processing (data mining, social network analysis, etc.), increasingly require new ways of expressing predicates. Among the most studied new predicates are the similarity-based ones, of which the two most common are the similarity range and the k-nearest neighbor predicates. The k-nearest neighbor predicate is surely the most interesting for several applications, including Content-Based Image Retrieval (CBIR) and Data Mining (DM) tasks, yet it is also the mos…

[INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; similarity algebra; algebraic properties; unary similarity queries; query optimization

researchProduct

Toward Approximate GML Retrieval Based on Structural and Semantic Characteristics

2010

GML is emerging as the new standard for representing geographic information in GISs on the Web, allowing the encoding of structurally and semantically rich geographic data in self-describing XML-based geographic entities. In this study, we address the problem of approximate querying and ranked results for GML data and provide a method for GML query evaluation. Our method consists of two main contributions. First, we propose a tree model for representing GML queries and data collections. Then, we introduce a GML retrieval method based on the concept of tree edit distance as an efficient means for comparing semi-structured data. Our approach allows the evaluation of bo…

researchProduct

Qualifying semantic graphs using model checking

2011

Semantic interoperability problems have found their solutions using languages and techniques from the Semantic Web. The proliferation of ontologies and meta-information has improved the understanding of information and the relevance of search engine responses. However, the construction of semantic graphs is a source of numerous errors of interpretation or modeling, and scalability remains a major problem. The processing of large semantic graphs limits the use of semantics in current information systems. The work presented in this paper is part of a new line of research at the border of two areas: the Semantic Web and model checking. This line of research concerns t…

[INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation; [INFO.INFO-WB] Computer Science [cs]/Web; Computer science; 0102 computer and information sciences; 02 engineering and technology; computer.software_genre; 01 natural sciences; Social Semantic Web; temporal logic; Semantic similarity; Semantic computing; 0202 electrical engineering electronic engineering information engineering; Semantic analytics; Semantic integration; Semantic Web Stack; Information retrieval; business.industry; Semantic search; 020207 software engineering; Semantic interoperability; Model-checking; 010201 computation theory & mathematics; Semantic graph; TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS; Artificial intelligence; business; computer; Natural language processing; 2011 International Conference on Innovations in Information Technology

researchProduct

A new approach based on NμSMV Model to query semantic graph

2011

The language most frequently used to represent semantic graphs is RDF (a W3C standard for meta-modeling). The construction of semantic graphs is a source of numerous errors of interpretation. Processing large semantic graphs can limit the use of semantics in modern information systems. The work presented in this paper is part of a new line of research at the border between two areas: the Semantic Web and model checking. For this, we developed a tool, RDF2NμSMV, which converts RDF graphs into the NμSMV language. This conversion aims at checking the semantic graphs with the model checker NμSMV in order to verify the consistency of the data. The data integration and shar…

[INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation; [INFO.INFO-WB] Computer Science [cs]/Web; Computer science; NμSMV; Temporal logic; 02 engineering and technology; computer.software_genre; Query language; SPARQL; temporal logic query; RDF; Model Checking; Semantic similarity; 020204 information systems; Semantic computing; 0202 electrical engineering electronic engineering information engineering; Semantic Web; Graph database; Information retrieval; computer.file_format; Abstract semantic graph; Semantic graph; Query checking; 020201 artificial intelligence & image processing; computer

researchProduct