RESEARCH PRODUCT

Part-of-Speech Induction by Singular Value Decomposition and Hierarchical Clustering

Reinhard Rapp

subject

Vocabulary, K-SVD, Computer science, Dimensionality reduction, Correlation clustering, Pattern recognition, Context (language use), Hierarchical clustering, Singular value decomposition, Artificial intelligence, Word (computer architecture)

description

Part-of-speech induction involves the automatic discovery of word classes and the assignment of each word of a vocabulary to one or several of these classes. The approach proposed here is based on the analysis of word distributions in a large collection of German newspaper texts. Its main advantage over other attempts is that it combines the hierarchical clustering of context vectors with a preceding dimensionality-reduction step that minimizes the effects of sampling errors.
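
The pipeline outlined in the description (context vectors, then dimensionality reduction, then hierarchical clustering) can be illustrated with a minimal sketch. It is not the author's implementation: the toy English corpus, the one-word neighbour window, the cosine distance, the average-linkage merge rule, the defaults `k=3` and `n_clusters=3`, and all function names are assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

# Toy English corpus (an assumption; the paper analyses German newspaper text).
CORPUS = [
    "the dog sees the cat",
    "the cat sees the dog",
    "a dog hears a cat",
    "a cat hears the dog",
    "the dog hears a cat",
    "a cat sees a dog",
]


def context_vectors(sentences):
    """Count the immediate left and right neighbours of every word type."""
    tokens = [s.split() for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # Columns 0..n-1 hold left-neighbour counts, n..2n-1 right-neighbour counts.
    counts = np.zeros((n, 2 * n))
    for sent in tokens:
        for pos, word in enumerate(sent):
            if pos > 0:
                counts[index[word], index[sent[pos - 1]]] += 1
            if pos < len(sent) - 1:
                counts[index[word], n + index[sent[pos + 1]]] += 1
    return vocab, counts


def reduce_dimensions(matrix, k):
    """Project each row onto the k strongest singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine_distances(vectors):
    """Pairwise cosine distance between row vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters."""
    dist = cosine_distances(vectors)
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters and drop the absorbed one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def induce_classes(sentences, k=3, n_clusters=3):
    """Context vectors -> SVD reduction -> hierarchical clustering."""
    vocab, counts = context_vectors(sentences)
    reduced = reduce_dimensions(counts, k)
    groups = agglomerate(reduced, n_clusters)
    return sorted(sorted(vocab[i] for i in g) for g in groups)


if __name__ == "__main__":
    for word_class in induce_classes(CORPUS):
        print(word_class)
```

On this toy corpus the three induced groups should correspond to determiners, nouns, and verbs. With realistic data the number of retained dimensions and the point at which the cluster hierarchy is cut would both need tuning.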

https://doi.org/10.1007/3-540-31314-1_51