RESEARCH PRODUCT

Improving Classification of Tweets Using Linguistic Information from a Large External Corpus

Hugo Lewi Hammer, Paal E. Engelstad, Anis Yazidi, Aleksander Bai

subject

Vocabulary; Information retrieval; Computer science; Representation (systemics); Rule-based machine translation; Bag-of-words model; Artificial intelligence; Natural language processing; Word (computer architecture)

description

The bag-of-words representation of documents is often unsatisfactory because it ignores relationships between important terms that do not co-occur literally. Improvements might be achieved by expanding the vocabulary with other relevant words, such as synonyms.
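To illustrate the idea, the following is a minimal sketch of vocabulary expansion on top of a bag-of-words representation. It is not the authors' method: the `SYNONYMS` table, the `weight` parameter, and the `overlap` measure are all hypothetical and chosen only for illustration. In practice the related terms would presumably be derived from a large external corpus rather than written by hand.

```python
from collections import Counter

# Hypothetical, hand-written synonym table (an assumption for illustration).
SYNONYMS = {
    "film": ["movie"],
    "movie": ["film"],
    "great": ["excellent"],
    "excellent": ["great"],
}


def bag_of_words(text):
    """Plain bag of words: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def expanded_bag_of_words(text, synonyms=SYNONYMS, weight=0.5):
    """Bag of words in which each term also adds a down-weighted count
    for every related term listed in the synonym table."""
    bow = Counter()
    for term, count in bag_of_words(text).items():
        bow[term] += count
        for related in synonyms.get(term, []):
            bow[related] += weight * count
    return bow


def overlap(a, b):
    """Sum of the smaller count over all terms the two bags share."""
    return sum(min(a[t], b[t]) for t in a.keys() & b.keys())


tweet_a = "great film"
tweet_b = "excellent movie"

plain = overlap(bag_of_words(tweet_a), bag_of_words(tweet_b))
expanded = overlap(expanded_bag_of_words(tweet_a), expanded_bag_of_words(tweet_b))
print(plain, expanded)
```

The two example tweets share no literal term, so their plain representations have no overlap, while the expanded representations do. Down-weighting the added terms (here by an assumed factor of 0.5) keeps the original words dominant.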

http://hdl.handle.net/11250/2436325