RESEARCH PRODUCT

Clustering of textual networks: analysing open-ended questions in text data of the perception of minerality in wine

Laurent Gautier, Yves Le Fur, François Bavaud

subject

wine minerality; sensory analysis; clustering; [SHS] Humanities and Social Sciences; [SHS.LANGUE] Humanities and Social Sciences/Linguistics

description

Open-ended questions are commonly used in sensory analysis and are usually handled by correspondence analysis (CA) of the term-respondent matrix. CA is good at detecting strong associations between terms and groups of respondents, but less so when respondents interpret the questions differently and the answers therefore span a polysemic space. CA also offers little flexibility for filtering out irrelevant textual structure or for controlling the relative contribution of rare versus frequent terms to the overall analysis. This contribution presents methodological extensions of CA, together with an application to a survey of 1900 responses on the understanding of the term "minerality" in wine, whose ambiguity is well attested. Clusters of terms associated with different meanings of "minerality" are successfully retrieved and visualized. Technically, the term-respondent matrix generates a weighted undirected network whose edge weights between terms form a positive definite matrix, interpretable as Markov associativities between terms, whose marginals define the term weights. Its eigenstructure is intimately related to spectral clustering, as well as to K-means clustering and MDS visualization of chi-squared dissimilarities between terms. The associativities can be renormalized by multiplying edge weights by powers of the term weights, which lets the analyst control the contribution of term weights. Finally, modularity maximisation, popular for its efficient yet arguably unstable clustering properties, is shown to correspond to a variant of spectral clustering for some power of the renormalized associativities.
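
The pipeline outlined in the description can be illustrated with a minimal NumPy sketch. The abstract does not give the formulas, so everything specific here is an assumption: the associativities are built as a two-step random walk (term, respondent, term), the renormalization multiplies each edge weight by `(f_i * f_j) ** power` for a hypothetical exponent `power`, and the clustering is a plain two-way split on the sign of the second eigenvector. The function names and the toy count matrix are invented for illustration and are not taken from the paper.

```python
import numpy as np


def term_associativities(counts):
    """Symmetric term-term associativities from a term-respondent count matrix.

    Assumed construction (not spelled out in the abstract): a two-step
    random walk term -> respondent -> term, giving
    e[i, j] = sum_k n[i, k] * n[j, k] / (n_total * n_col[k]).
    Every respondent (column) must have used at least one term.
    """
    counts = np.asarray(counts, dtype=float)
    col_totals = counts.sum(axis=0)
    half = counts / np.sqrt(counts.sum() * col_totals)
    return half @ half.T  # Gram matrix: symmetric, positive semi-definite


def renormalize(assoc, power):
    """Multiply edge weights by (f_i * f_j) ** power, then rescale to sum to 1.

    `power` is a hypothetical tuning parameter; power = 0 leaves the
    associativities unchanged.
    """
    weights = assoc.sum(axis=1)  # term weights f_i (marginals)
    scaled = assoc * np.outer(weights, weights) ** power
    return scaled / scaled.sum()


def spectral_decomposition(assoc):
    """Eigenvalues (descending) and eigenvectors of f^(-1/2) E f^(-1/2)."""
    weights = assoc.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(weights)
    standardized = assoc * np.outer(inv_sqrt, inv_sqrt)
    eigenvalues, eigenvectors = np.linalg.eigh(standardized)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]


# Toy data (invented): 4 terms x 4 respondents, two loosely linked groups.
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 3],
])

assoc = term_associativities(counts)
eigenvalues, eigenvectors = spectral_decomposition(renormalize(assoc, power=0.0))

# The leading eigenvalue is the trivial one (equal to 1); the sign of the
# second eigenvector gives a two-way split of the terms.
labels = (eigenvectors[:, 1] > 0).astype(int)
print(labels)
```

On this toy matrix the split groups the first two terms apart from the last two; which group is labelled 0 or 1 depends on the arbitrary sign of the eigenvector. Changing `power` shifts the balance between frequent and rare terms before the decomposition, which is the kind of control the description attributes to the renormalization.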

https://halshs.archives-ouvertes.fr/halshs-01237789