An exploration of semi-supervised text classification
Master's thesis in Information and Communication Technology (IKT590)

Obtaining labeled data to train machine learning models for natural language is often expensive and time-consuming, while unlabeled data is usually free and easy to obtain. Supervised learning frequently requires a large amount of labeled data to achieve good text classification performance. Semi-supervised learning (SSL) for text classification is therefore an active area of research. SSL exploits both labeled and unlabeled data to achieve better classification performance than labeled data alone can provide, and it is particularly useful when labeled data is limited. This thesis explores the impact of different parameters on …