Search results for "Topic Model"

showing 10 items of 23 documents

Analysing Tourist Destination Image through Topic Modeling

2019

Topic modeling has become one of the most widely used methods for analysing textual data, as it can “discover” the hidden dimensions (topics) that characterise a corpus. The methodology can be applied fruitfully to complex phenomena such as tourist destination image. With this aim, the paper discusses the application of topic modeling to TV commercials broadcast by four of the major cruise lines operating in Italy in recent years.

Settore SPS/08 - Sociologia Dei Processi Culturali E Comunicativi; Settore SECS-S/05 - Statistica Sociale; Topic modeling; Destination image; Television commercials; Cruise lines; Italy

Exploring topics in LDA models through Statistically Validated Networks: directed and undirected approaches

2022

Probabilistic topic models are machine learning tools for processing and understanding large collections of text documents. Among the models in the literature, Latent Dirichlet Allocation (LDA) has become the benchmark for the topic modelling community. The key idea is to represent text documents as random mixtures over latent semantic structures called topics. Each topic is a multinomial distribution over the words of the vocabulary. To interpret the output of a topic model, researchers usually select the top-n words (the essential words) with the highest probability given a topic and look for meaningful and interpretable semantic themes. This work proposes a new method …

Statistically Validated Network; LDA; Topic Model

Ranking coherence in topic models using statistically validated networks

2023

Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovery is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it remains an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-oc…

Statistically Validated Networks; Topic coherence; Text Mining; Probabilistic Topic model; Library and Information Sciences; Information Systems
Journal of Information Science

Multi-label Classification Using Stacked Hierarchical Dirichlet Processes with Reduced Sampling Complexity

2018

Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow the number of topics to be discovered automatically from the data. The computational complexity of standard Gibbs sampling techniques for model training is linear in the number of topics. Recently, it was reduced to be linear in the number of topics per word using a technique called alias sampling combined with Metropolis-Hastings (MH) sampling. We propose a different proposal distribution for the MH step based on the observation that distributions on the upper hierarchy level change more slowly than the document-specific distributions at the lower level. This reduces the sampling complexity, making it linear i…

Topic model; Computational complexity theory; Computer science; 02 engineering and technology; Latent Dirichlet allocation; Dirichlet distribution; Artificial Intelligence; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Mathematics; Multi-label classification; Sampling (statistics); Pattern recognition; Human-Computer Interaction; Dirichlet process; Metropolis–Hastings algorithm; Hardware and Architecture; Test set; 020201 artificial intelligence & image processing; Artificial intelligence; Algorithm; Software; Information Systems; Gibbs sampling
2017 IEEE International Conference on Big Knowledge (ICBK)

Comparison of MeSH terms and KeyWords Plus terms for more accurate classification in medical research fields. A case study in cannabis research

2021

KeyWords Plus and Medical Subject Headings (MeSH) are widely used in bibliometric studies for topic mapping. The objective of this study is to compare the two description systems in documents about cannabis research to find the concordance between systems and establish whether there is neutrality in topic mapping. A total of 25,593 articles from 1970 to 2019 were drawn from Web of Science's Core Collection and Medline and analyzed. The tidytext library, Zipf's law, topic modeling tools, the contingency coefficient, Cramer's V, and Cohen's kappa were used. The results included 10,107 MeSH terms and 28,870 KeyWords Plus terms. The Zipf distribution of the terms was different for each…

Topic model; Contingency table; Information retrieval; Zipf's law; Computer science; Concordance; MEDLINE; Subject (documents); Library and Information Sciences; Management Science and Operations Research; Computer Science Applications; Cohen's kappa; Media Technology; Kappa; Information Systems
Information Processing & Management

Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models

2017

Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms converge quickly and are more efficient for inference on new documents. Gibbs sampling enables sparse updates, since each token is associated with only one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods, it is possible to reduce the bias of variational methods while …

Topic model; Hierarchical Dirichlet process; Speedup; Gibbs algorithm; Computer science; Nonparametric statistics; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Latent Dirichlet allocation; Bayes' theorem; ComputingMethodologies_PATTERNRECOGNITION; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Algorithm; 0105 earth and related environmental sciences; Gibbs sampling

A Survey of Multi-Label Topic Models

2019

Every day, an enormous amount of text data is produced. Sources of text data include news, social media, emails, text messages, medical reports, scientific publications and fiction. To keep track of this data, categories, keywords, tags or labels are assigned to each text. Automatically predicting such labels is the task of multi-label text classification. Often, however, we are interested in more than just the pure classification: we would like to understand which parts of a text belong to the label, which words are important for the label or which labels occur together. Because of this, topic models may be used for multi-label classification as an interpretable mode…

Topic model; Information retrieval; Computer science; Geography Planning and Development; Flexibility (personality); 02 engineering and technology; Task (project management); ComputingMethodologies_PATTERNRECOGNITION; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Key (cryptography); General Earth and Planetary Sciences; 020201 artificial intelligence & image processing; Social media; Water Science and Technology
ACM SIGKDD Explorations Newsletter

Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis

2020

With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information in such content or to learn more about the topic being discussed. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online, including topic modeling techniques that have gained popularity in recent years. This paper investigates the topic modeling subject and its common application areas, methods, and tools. Also, we examine and compare five frequen…

Topic model; short text; Information retrieval; Social network; Latent semantic analysis; Computer science; Random projection; topic modeling; User-generated content; Subject (documents); Context (language use); Latent Dirichlet allocation; lcsh:QA75.5-76.95; Artificial Intelligence; online social networks; Methods; lcsh:Electronic computers. Computer science; natural language processing; user-generated content
Frontiers in Artificial Intelligence

Staying at the front line of literature: How can topic modelling help researchers follow recent studies?

2021

Staying at the front line in learning research is challenging because many fields are rapidly developing. One such field is research on the temporal aspects of computer-supported collaborative learning (CSCL). To obtain an overview of these fields, systematic literature reviews can capture patterns of existing research. However, conducting systematic literature reviews is time-consuming and does not reveal future developments in the field. This study proposes a machine learning method based on topic modelling that takes articles from a systematic literature review on the temporal aspects of CSCL (49 original articles published before 2019) as a starting point to describe the most recent devel…

computer-supported collaborative learning; koneoppiminen; literature review; temporal analysis; tietokoneavusteinen oppiminen; yhteisöllinen oppiminen; topic model; tiedonhaku; L; systemaattiset kirjallisuuskatsaukset; automatic content analysis; Education
Frontline Learning Research

Examining Competing Entrepreneurial Concerns in a Social Question and Answer (SQA) Platform

2021

This study aims to determine the competing concerns of people interested in startup development and entrepreneurship by using topic modeling and sentiment analysis on a social question-and-answer (SQA) website. Understanding the underlying concerns of startup entrepreneurs is critical to society and economic growth. Therefore, greater scientific support for entrepreneurship remains necessary, including data mining from virtual social communities. In this study, an SQA platform was used to identify the sentiment of thirty concerns of people interested in startup entrepreneurship. Based on topic modeling and sentiment analysis of 18,819 inquiries in various forums on an SQA, we identified addi…

haasteet (ongelmat); Topic model; Entrepreneurship; topic modeling; Social question; entrepreneurial concerns; 512 Business and management; challenges to startups; entrepreneurship; startup-yritykset; ideat; Disruptive innovation; Sociology; yrityksen perustaminen; Sentiment analysis; Foundation (evidence); huolestuneisuus; Ideation; Public relations; yrittäjyys; perustaminen; yritykset; innovaatiot; Variety (cybernetics); yritystoiminta; sentiment analysis
Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management