0000000000220211

AUTHOR

Andrea Simonetti

showing 7 related works from this author

Statistically Validated Networks for assessing topic quality in LDA models

2022

Probabilistic topic models have become one of the most widespread machine learning techniques for textual analysis. In this framework, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has gained increasing popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned by the model may be characterized only by a set of irrelevant or unrelated words, making them useless for interpretation. Although many topic-quality metrics were proposed (Newman et al., 2009; Alet…

Settore SECS-S/06 - Metodi Mat. dell'Economia e d. Scienze Attuariali e Finanz.; Settore SECS-S/01 - Statistica; Topic Model; Topic Coherence; LDA; Statistically Validated Networks
researchProduct

Measuring Topic Coherence through Statistically Validated Networks

2020

Topic models arise from the need to understand and explore large collections of text documents and to predict their underlying structure. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has quickly become one of the most popular text modelling techniques. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models give no guarantee of the interpretability of their outputs. The topics learned from texts may be characterized by a set of irrelevant or unrelated words. Therefore, topic models require validation of the coherence of the estimated topics. However, the automatic evaluation …

Settore SECS-S/06 - Metodi Mat. dell'Economia e d. Scienze Attuariali e Finanz.; Settore SECS-S/01 - Statistica; topic model; topic coherence; LDA; statistically validated networks
researchProduct

Ranking coherence in topic models using statistically validated networks

2023

Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovery is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it remains an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-oc…

Statistically Validated Networks; Topic coherence; Text Mining; Probabilistic Topic model; Library and Information Sciences; Information Systems; Journal of Information Science
researchProduct

Marked Hawkes processes for Twitter data

2022

In this paper, we propose to model retweet event sequences using a marked Hawkes process, which is a self-exciting point process where the occurrence of previous events in time increases the probability of further events. The aim is to analyse Twitter data by combining temporal point process theory and textual analysis. Since each retweet event carries a set of properties, we mark the process by different characteristics drawn from the textual analysis, finding that the tone of the description of the Twitter user is a good predictor of the number of retweets in a single cascade.
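
The self-exciting behaviour described in this abstract can be illustrated with a minimal sketch of a marked Hawkes conditional intensity. The exponential decay kernel, the multiplicative effect of the mark, the parameter values, and the name `hawkes_intensity` are all assumptions made for illustration; the abstract does not state which kernel or mark specification the authors use.

```python
import math


def hawkes_intensity(t, events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a marked Hawkes process at time t.

    events: list of (time, mark) pairs; only events strictly before t count.
    ASSUMPTION: exponential kernel alpha * exp(-beta * dt), scaled by the mark.
    mu is the baseline rate that applies when there is no past activity.
    """
    excitation = sum(
        alpha * mark * math.exp(-beta * (t - t_i))
        for t_i, mark in events
        if t_i < t
    )
    return mu + excitation


# Hypothetical retweet cascade: (time, mark) pairs, not real data.
retweets = [(0.0, 1.0), (0.5, 2.0), (0.6, 1.0)]

before = hawkes_intensity(0.4, retweets)  # only the first event contributes
after = hawkes_intensity(0.7, retweets)   # all three events contribute
```

In this sketch each past retweet raises the intensity by an amount proportional to its mark, and that contribution decays over time, so a burst of recent events makes further events more likely.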

Settore SECS-S/06 - Metodi Mat. dell'Economia e d. Scienze Attuariali e Finanz.; Settore SECS-S/01 - Statistica; Twitter data; self-exciting point processes; textual analysis; Hawkes models
researchProduct

Using Local Ecological Knowledge of Fishers to Reconstruct Abundance Trends of Elasmobranch Populations in the Strait of Sicily

2020

Fishers' “local ecological knowledge” (LEK) can be used to reconstruct long-term trends of species that are at very low biomass due to overfishing. In this study, we used the historical memories of Sicilian fishers to understand their perception of changes in the abundance of cartilaginous fish in the Strait of Sicily over the last decades. We conducted interviews with 27 retired fishers from Mazara del Vallo harbor (SW Sicily) who had worked in demersal fisheries, using a pre-defined questionnaire with a series of open and fixed questions related to the abundance of sharks and rays. The questionnaire included specific questions about the trends they perceived in catch or by-catch of cartilaginous fish abun…

0106 biological sciences; 010504 meteorology & atmospheric sciences; lcsh:QH1-199.5; Population; Ocean Engineering; Mustelus asterias; Aquatic Science; lcsh:General. Including nature conservation, geographical distribution; Oceanography; 01 natural sciences; Demersal zone; Abundance (ecology); sharks and batoids; Mediterranean Sea; 14. Life underwater; Squalidae; education; lcsh:Science; Relative species abundance; Chondrichthyes; fisheries sustainability; 0105 earth and related environmental sciences; Water Science and Technology; Global and Planetary Change; education.field_of_study; Overfishing; biology; Ecology; 010604 marine biology & hydrobiology; local ecological knowledge; biology.organism_classification; Centrophoridae; Geography; lcsh:Q; Frontiers in Marine Science
researchProduct

Statistically Validated Networks for evaluating coherence in topic models

2022

Probabilistic topic models have become one of the most widespread machine learning techniques for textual analysis. In this framework, Latent Dirichlet Allocation (LDA) has gained increasing popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned by the model may be characterized by a set of irrelevant or unrelated words, making them useless for interpretation. In the framework of topic quality evaluation, the pairwise semantic cohesion among the top-N most pr…

Settore SECS-S/06 - Metodi Mat. dell'Economia e d. Scienze Attuariali e Finanz.; Settore SECS-S/01 - Statistica; Text Mining; Probabilistic Topic Models; Topic coherence; Statistically Validated Networks
researchProduct

Development of statistical methods for the analysis of textual data

2022

Network Analysis; Text Mining; Statistics; Settore SECS-P/06 - Economia Applicata; NLP
researchProduct