AUTHOR

Sophie Burkhardt

showing 6 related works from this author

Multi-label Classification Using Stacked Hierarchical Dirichlet Processes with Reduced Sampling Complexity

2018

Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow the number of topics to be discovered automatically from the data. The computational complexity of standard Gibbs sampling techniques for model training is linear in the number of topics. Recently, this complexity was reduced to linear in the number of topics per word by combining a technique called alias sampling with Metropolis–Hastings (MH) sampling. We propose a different proposal distribution for the MH step, based on the observation that distributions on the upper hierarchy level change more slowly than the document-specific distributions at the lower level. This reduces the sampling complexity, making it linear i…

Topic model; Computational complexity theory; Computer science; 02 engineering and technology; Latent Dirichlet allocation; Dirichlet distribution; Artificial Intelligence; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Mathematics; Multi-label classification; Sampling (statistics); Pattern recognition; Human-Computer Interaction; Dirichlet process; Metropolis–Hastings algorithm; Hardware and Architecture; Test set; 020201 artificial intelligence & image processing; Algorithm; Software; Information Systems; Gibbs sampling
2017 IEEE International Conference on Big Knowledge (ICBK)
researchProduct

Towards identifying drug side effects from social media using active learning and crowd sourcing

2019

Motivation: Social media is a largely untapped source of information on side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is a difficult problem because labeled training data is sparse and automatic labeling is error-prone. Crowd sourcing can help in such a scenario to obtain more reliable labels, but it is comparatively expensive because workers have to be paid. To remedy this, semi-supervised active learning may reduce the amount of labeled data needed and focus the manual labeling process on important information. Results: We extracted data from Twitter using the public API. We subsequently use Ama…

0303 health sciences; Focus (computing); Information retrieval; Drug-Related Side Effects and Adverse Reactions; Process (engineering); Active learning (machine learning); Computer science; Computational Biology; Crowdsourcing; 03 medical and health sciences; 0302 clinical medicine; Problem-based learning; Code (cryptography); Humans; Social media; 030212 general & internal medicine; Baseline (configuration management); 030304 developmental biology
Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
researchProduct

Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models

2017

Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms converge quickly and are more efficient for inference on new documents. Gibbs sampling enables sparse updates, since each token is associated with only one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods, it is possible to reduce the bias of variational methods while …

Topic model; Hierarchical Dirichlet process; Speedup; Gibbs algorithm; Computer science; Nonparametric statistics; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Latent Dirichlet allocation; Bayes' theorem; ComputingMethodologies_PATTERNRECOGNITION; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Algorithm; 0105 earth and related environmental sciences; Gibbs sampling
researchProduct

A Survey of Multi-Label Topic Models

2019

Every day, an enormous amount of text data is produced. Sources of text data include news, social media, emails, text messages, medical reports, scientific publications and fiction. To keep track of this data, categories, keywords, tags or labels are assigned to each text. Automatically predicting such labels is the task of multi-label text classification. Often, however, we are interested in more than just the pure classification: we would like to understand which parts of a text belong to the label, which words are important for the label, or which labels occur together. Because of this, topic models may be used for multi-label classification as an interpretable mode…

Topic model; Information retrieval; Computer science; Geography Planning and Development; Flexibility (personality); 02 engineering and technology; Task (project management); ComputingMethodologies_PATTERNRECOGNITION; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Key (cryptography); General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; Social media; Water Science and Technology
ACM SIGKDD Explorations Newsletter
researchProduct

Focusing Knowledge-based Graph Argument Mining via Topic Modeling

2021

Decision-making usually takes five steps: identifying the problem, collecting data, extracting evidence, identifying pro and con arguments, and making decisions. Focusing on extracting evidence, this paper presents a hybrid model that combines latent Dirichlet allocation and word embeddings to obtain external knowledge from structured and unstructured data. We study the task of sentence-level argument mining, as arguments mostly require some degree of world knowledge to be identified and understood. Given a topic and a sentence, the goal is to classify whether a sentence represents an argument in regard to the topic. We use a topic model to extract topic- and sentence-specific evidence from…

FOS: Computer and information sciences; Computer Science - Machine Learning; Artificial Intelligence (cs.AI); Computer Science - Artificial Intelligence; Information Retrieval (cs.IR); Computer Science - Information Retrieval; Machine Learning (cs.LG)
researchProduct

Rule Extraction From Binary Neural Networks With Convolutional Rules for Model Validation

2020

Classification approaches that allow logical rules to be extracted, such as decision trees, are often considered more interpretable than neural networks. Logical rules are also comparatively easy to verify against any possible input. This is an important part of systems that aim to ensure correct operation of a given model. However, for high-dimensional input data such as images, the individual symbols, i.e. pixels, are not easily interpretable. Therefore, rule-based approaches are not typically used for this kind of high-dimensional data. We introduce the concept of first-order convolutional rules, which are logical rules that can be extracted using a convolutional neural network (CNN), and w…

FOS: Computer and information sciences; Computer Science - Machine Learning; stochastic local search; rule extraction; Computer Science - Artificial Intelligence; logical rules; QA75.5-76.95; 004 Informatik; Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Artificial Intelligence; Electronic computers. Computer science; convolutional neural networks; k-term DNF; interpretability; 004 Data processing; Original Research
Frontiers in Artificial Intelligence
researchProduct