RESEARCH PRODUCT

Syntagmatic and Paradigmatic Associations in Information Retrieval

Reinhard Rapp

subject

Text corpus; Empirical data; Syntagmatic analysis; Information retrieval; Web search query; Semantic similarity; Computer science; Statistical model; Independent component analysis; Associative property

description

It is shown that unconscious associative processes that take place in a searcher's memory while formulating a search query in information retrieval, such as the production of free word associations and the generation of synonyms, can be simulated using statistical models that analyze the distribution of words in large text corpora. The free word associations that subjects produce when presented with stimulus words can be predicted by applying first-order statistics to the frequencies of word co-occurrences observed in texts. The generation of synonyms can also be based on co-occurrence data but requires second-order statistics. Both approaches are compared and validated on empirical data. For both tasks, the performance of the simulation turns out to be comparable to that of human subjects.
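
The distinction between first-order and second-order statistics can be illustrated with a minimal sketch. The description does not name the specific association or similarity measures used, so everything below is an assumption made for illustration: the toy corpus, the stop-word list, the window size, raw co-occurrence counts as the first-order measure, and cosine similarity between co-occurrence vectors as the second-order measure. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus and stop-word list: hypothetical, for illustration only.
CORPUS = [
    "the doctor treated the patient in the hospital",
    "the physician treated the patient in the clinic",
    "the doctor examined the patient",
    "the physician examined the patient",
    "the nurse helped the doctor",
    "the nurse helped the physician",
]
STOPWORDS = {"the", "in"}


def cooccurrence_counts(sentences, window=2):
    """Count how often two content words occur within `window` tokens of each other."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def first_order_associates(counts, word, k=3):
    """First-order (syntagmatic): rank words by direct co-occurrence with `word`.

    Assumption: raw counts stand in for whatever association measure is used.
    """
    ranked = sorted(counts.get(word, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def second_order_neighbours(counts, word, k=3):
    """Second-order (paradigmatic): rank words by similarity of their co-occurrence vectors.

    Assumption: cosine similarity stands in for whatever similarity measure is used.
    """
    target = counts.get(word, Counter())
    scores = {w: cosine(target, vec) for w, vec in counts.items() if w != word}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


if __name__ == "__main__":
    counts = cooccurrence_counts(CORPUS)
    print("first-order associates of 'doctor':", first_order_associates(counts, "doctor"))
    print("second-order neighbours of 'doctor':", second_order_neighbours(counts, "doctor"))
```

In this toy corpus, "doctor" and "physician" never occur in the same sentence, so a first-order measure cannot relate them: the strongest direct associate of "doctor" is "patient". Their co-occurrence vectors are identical, however, so the second-order measure ranks "physician" first. This mirrors the description's point that synonym generation requires second-order statistics.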

https://doi.org/10.1007/978-3-642-18991-3_54