Towards identifying drug side effects from social media using active learning and crowd sourcing.

Julia Siekiera, Stefan Kramer, Josua Glodde, Sophie Burkhardt, Miguel A. Andrade-Navarro

subject

0303 health sciences; Focus (computing); Information retrieval; Drug-Related Side Effects and Adverse Reactions; Process (engineering); business.industry; Active learning (machine learning); Computer science; Computational Biology; Crowdsourcing; 03 medical and health sciences; 0302 clinical medicine; Problem-based learning; Code (cryptography); Humans; Social media; 030212 general & internal medicine; business; Baseline (configuration management); 030304 developmental biology

description

Motivation: Social media is a largely untapped source of information on the side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is difficult because labeled training data is sparse and automatic labeling is error-prone. Crowdsourcing can help obtain more reliable labels in such a scenario, but it is comparatively expensive because workers have to be paid. To remedy this, semi-supervised active learning may reduce the amount of labeled data needed and focus the manual labeling effort on the most informative tweets.

Results: We extracted data from Twitter using the public API. We then used Amazon Mechanical Turk, in combination with a state-of-the-art semi-supervised active learning method, to label tweets with their associated drugs and side effects in two stages. Our results show that our method is an effective way of discovering side effects in tweets, improving the F-measure from 53% to 67% compared with a one-stage workflow. Additionally, we show that the active learning scheme reduces the labeling cost compared with a non-active baseline.

Availability: Code and data will be published on https://github.com/kramerlab.
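The abstract relies on active learning to decide which tweets are worth sending to paid crowd workers, and it reports results as F-measure (F = 2PR / (P + R), the harmonic mean of precision P and recall R). The sketch below illustrates that general idea only, under stated assumptions: it uses plain pool-based uncertainty sampling with a multinomial naive Bayes classifier, which is a swapped-in technique and not the paper's semi-supervised method. All function names (`train_nb`, `prob_positive`, `active_learning`, `f_measure`, `simulated_crowd_label`) and the toy tweets are hypothetical, and `simulated_crowd_label` is a keyword rule standing in for Mechanical Turk workers.

```python
import math
from collections import Counter

# Hypothetical toy setup: a tweet is a list of tokens; label 1 = mentions a side effect.
# Uncertainty sampling with naive Bayes is an illustrative stand-in, NOT the
# paper's semi-supervised active learning method.


def train_nb(docs, labels, alpha=1.0):
    """Fit a binary multinomial naive Bayes model with Laplace smoothing."""
    vocab = {tok for doc in docs for tok in doc}
    token_counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        token_counts[y].update(doc)
    class_counts = Counter(labels)
    n = len(labels)
    model = {"prior": {}, "loglik": {}, "unk": {}}
    for y in (0, 1):
        total = sum(token_counts[y].values())
        denom = total + alpha * (len(vocab) + 1)  # +1 slot for unseen tokens
        model["prior"][y] = math.log((class_counts[y] + 1) / (n + 2))
        model["loglik"][y] = {
            tok: math.log((token_counts[y][tok] + alpha) / denom) for tok in vocab
        }
        model["unk"][y] = math.log(alpha / denom)
    return model


def prob_positive(model, doc):
    """Return P(label = 1 | doc) under the naive Bayes model."""
    scores = {}
    for y in (0, 1):
        score = model["prior"][y]
        for tok in doc:
            score += model["loglik"][y].get(tok, model["unk"][y])
        scores[y] = score
    top = max(scores.values())  # subtract the max for numerical stability
    e0 = math.exp(scores[0] - top)
    e1 = math.exp(scores[1] - top)
    return e1 / (e0 + e1)


def active_learning(seed, pool, oracle, rounds, batch_size):
    """Pool-based active learning with uncertainty sampling.

    Each round retrains on the labeled set, sends the batch_size pool tweets
    whose predicted probability is closest to 0.5 to the oracle (the
    simulated crowd), and moves them into the labeled set.
    """
    labeled = list(seed)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train_nb([d for d, _ in labeled], [y for _, y in labeled])
        pool.sort(key=lambda doc: abs(prob_positive(model, doc) - 0.5))
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((doc, oracle(doc)) for doc in batch)
    final_model = train_nb([d for d, _ in labeled], [y for _, y in labeled])
    return final_model, labeled


def f_measure(y_true, y_pred):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def simulated_crowd_label(doc):
    """Hypothetical stand-in for a crowd worker: flags a few side-effect words."""
    return int(any(tok in {"headache", "nausea", "dizzy"} for tok in doc))


if __name__ == "__main__":
    seed = [(["headache", "after", "aspirin"], 1), (["bought", "aspirin", "today"], 0)]
    pool = [["nausea", "from", "ibuprofen"], ["ibuprofen", "on", "sale"],
            ["so", "dizzy", "now"], ["new", "pharmacy"]]
    model, labeled = active_learning(seed, pool, simulated_crowd_label,
                                     rounds=2, batch_size=1)
    print(len(labeled), round(prob_positive(model, ["nausea"]), 3))
```

The design choice behind uncertainty sampling is that each paid label is spent on a tweet the current model is least sure about, which is the general mechanism by which an active scheme can need fewer labels than picking tweets at random (a non-active baseline).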

https://pubmed.ncbi.nlm.nih.gov/31797607