AUTHOR
Mariangela Sciandra
Flexible modelling of serial correlation in GLMM
Un mese di Covid-19 in Italia: una guida alla lettura dei dati per bloccare la disinformazione [A month of Covid-19 in Italy: a guide to reading the data to stop disinformation]
and since the first death in Padua, we have been in the midst of an information overload about the new Coronavirus, with continuous updates on cases and endless streams of information. And as much information as reaches us, just as many new questions arise every day. Among these, the one that fills our long days, in which we see our freedom increasingly curtailed by ever stricter regulations, concerns how temporary these measures are: how long will the “temporary” last? We have been shown all kinds of charts, some simple, others more complex; we have heard about exponential and logarithmic trends, about peaks and sometimes…
Supervised vs Unsupervised Latent Dirichlet Allocation: topic detection in lyrics.
Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to classify text in a document to a particular topic. It builds a fixed number of topics starting from words in each document modeled according to a Dirichlet distribution. In this work we apply LDA to a set of songs from four famous Italian songwriters and split them into topics. This work studies the use of themes in lyrics using statistical analysis to detect topics. The aim of the work is to underline the main limits of the standard unsupervised LDA and to propose a supervised…
Regression diagnostics to analyze complex ecological systems through Generalized Linear Mixed Models
Random Forest Analysis: A New Approach for Classification of Beta Thalassemia
In recent years, Thalassemia care providers started classifying patients as transfusion-dependent-Thalassemia (TDT) or non-transfusion-dependent-Thalassemia (NTDT) owing to the established role of transfusion therapy in defining the clinical complication profile, although this classification was also based on expert opinion and is limited by reliance on patients’ current transfusion status. Starting from a vast set of variables indicating severity phenotype, through the use of both classification and clustering techniques we want to explore the presence of two (TDT vs NTDT) or more clusters, in order to approach a new definition for the classification of Beta-Thalassemia in Thalassemia…
Exploring topics in LDA models through Statistically Validated Networks: directed and undirected approaches
Probabilistic topic models are machine learning tools for processing and understanding large text document collections. Among the different models in the literature, Latent Dirichlet Allocation (LDA) has turned out to be the benchmark of the topic modelling community. The key idea is to represent text documents as random mixtures over latent semantic structures called topics. Each topic follows a multinomial distribution over the vocabulary words. In order to understand the result of a topic model, researchers usually select the top-n words (the essential words) with the highest probability given a topic and look for meaningful and interpretable semantic themes. This work proposes a new method …
A family of distances for preference–approvals
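Producción Científica [Scientific Production]
Stato di Qualità Ecologica (EcoQ) delle acque costiere in Mediterraneo mediante l’indice biotico POSIX (POSidonia IndeX) [Ecological Quality status (EcoQ) of Mediterranean coastal waters using the POSIX (POSidonia IndeX) biotic index]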
Model interpretation from the additive elements of the PWRSS in GLMMs
Generalized Linear Mixed Models (GLMMs) have rapidly become a widely used tool for modelling clustered and longitudinal data with non-Normal responses. Although a large amount of work has been done in the literature on likelihood-based inference on GLMMs, little seems to have been done on the decomposition of the total variability associated with the different components of a mixed model. In this work we try to generalize the idea of likelihood additive elements (Whittaker, 1984), proposed in the context of GLMs, to the case of GLMMs by using the Penalized Weighted Residual Sum of Squares (PWRSS). The proposal is illustrated by means of a real application.
A PCA Interpretation of the Glasgow Coma Scale in the Trauma Brain Injury PECARN Dataset
A CT scan is strongly recommended for a patient affected by head trauma, but it exposes the patient to a certain amount of radiation. For this reason, physicians try to avoid this practice for pediatric patients. Symptom analysis, visual/tactile inspection, and reactions to appropriate stimuli could lead the physician to place the patient under observation for a period instead of performing an immediate CT scan. As a consequence, the correct evaluation of those symptoms is a crucial task. For this reason, the Pediatric Glasgow Coma Scale (PGCS) plays a fundamental role, because it is a numeric scale regarding the patient’s mental status. It is computed as the sum of the score f…
A comparison of ensemble algorithms for item-weighted Label Ranking
Label Ranking (LR) is a non-standard supervised classification method with the aim of ranking a finite collection of labels according to a set of predictor variables. Traditional LR models assume indifference among alternatives. However, misassigning the ranking position of a highly relevant label is frequently regarded as more severe than failing to predict a trivial label. Moreover, switching two similar alternatives should be considered less severe than switching two different ones. Therefore, efficient LR classifiers should be able to take into account the similarities and individual weights of the items to be ranked. The contribution of this paper is to formulate and compare flexible i…
How to define deviance residuals in multinomial regression.
This work is devoted to the study of diagnostic tools for categorical data models with an emphasis on the presence of continuous covariates. In particular, the aim is to define a new class of residuals from the parametric multinomial family of models and to study their asymptotic properties. In logistic regression (as in generalized linear models), there are a few different kinds of residuals; we propose a generalization of deviance residuals as defined in logistic regression to the multinomial case and propose their use in order to identify inadequacies in a multinomial model.
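As a point of reference for the binary case that the abstract starts from, the sketch below computes the standard deviance residuals of a logistic regression. It is not the multinomial generalization proposed in the paper; the function name `deviance_residuals` and the fitted probabilities `p_hat` are hypothetical and chosen only for illustration.

```python
import math

def deviance_residuals(y, p):
    """Deviance residuals for binary outcomes y (0/1) and fitted probabilities p."""
    residuals = []
    for yi, pi in zip(y, p):
        # Log-likelihood contribution of observation i under the fitted model.
        loglik = math.log(pi) if yi == 1 else math.log(1.0 - pi)
        # sign(y - p): positive when y = 1, negative when y = 0.
        sign = 1.0 if yi == 1 else -1.0
        # Signed square root of the observation's share of the deviance.
        residuals.append(sign * math.sqrt(-2.0 * loglik))
    return residuals

y = [1, 0, 1, 0]
p_hat = [0.9, 0.2, 0.5, 0.5]  # hypothetical fitted probabilities, not real data
d = deviance_residuals(y, p_hat)
deviance = sum(r * r for r in d)  # squared residuals add up to the model deviance
```

Large absolute residuals flag observations that the fitted model explains poorly, which is the diagnostic use the abstract refers to.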
Weighted distance-based trees for ranking data
Within the framework of preference rankings, the interest can lie in finding which predictors and which interactions are able to explain the observed preference structures, because preference decisions will usually depend on the characteristics of both the judges and the objects being judged. This work proposes the use of a univariate decision tree for ranking data based on the weighted distances for complete and incomplete rankings, and considers the area under the ROC curve both for pruning and model assessment. Two real and well-known datasets, the SUSHI preference data and the University ranking data, are used to display the performance of the methodology.
Projection Clustering Unfolding: A New Algorithm for Clustering Individuals or Items in a Preference Matrix
In the framework of preference rankings, the interest can lie in clustering individuals or items in order to reduce the complexity of the preference space for an easier interpretation of collected data. Recent years have seen a remarkable flowering of works about the use of decision trees for clustering preference vectors. As a matter of fact, decision trees are useful and intuitive, but they are very unstable: small perturbations bring big changes. This is the reason why it could be necessary to use more stable procedures in order to cluster ranking data. In this work, a Projection Clustering Unfolding (PCU) algorithm for preference data will be proposed in order to extract useful info…
Modeling Posidonia oceanica growth data: from linear to generalized linear mixed models
The statistical analysis of annual growth of Posidonia oceanica is traditionally carried out through Gaussian linear models applied to untransformed, or log-transformed, data. In this paper, we claim that there are good reasons for re-considering this established practice, since real data on annual growth often violate the assumptions of Gaussian linear models, and show that the class of Generalized Linear Models (GLMs) represents a useful alternative for handling such violations. By analyzing Sicily PosiData-1, a real dataset on P. oceanica growth data gathered in the period 2000–2002 along the coasts of Sicily, we find that in the majority of cases Normality is rejected and the effect of …
Quantile regression via iterative least squares computations
We present an estimating framework for quantile regression where the usual L1-norm objective function is replaced by its smooth parametric approximation. An exact path-following algorithm is derived, leading to the well-known ‘basic’ solutions interpolating exactly a number of observations equal to the number of parameters being estimated. We discuss briefly possible practical implications of the proposed approach, such as early stopping for large data sets, confidence intervals, and additional topics for future research.
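To give a feel for how a quantile regression fit can be reached through repeated least squares computations, here is a minimal sketch using a generic iteratively reweighted least squares scheme for a straight line. It is an assumption-laden illustration, not the path-following algorithm of the paper: the function name `quantile_line_irls`, the smoothing constant `eps` and the toy data are all hypothetical.

```python
def quantile_line_irls(x, y, tau=0.5, n_iter=200, eps=1e-8):
    """Fit y ~ a + b*x at quantile level tau by iteratively reweighted least squares."""
    a, b = 0.0, 0.0
    weights = [1.0] * len(x)  # the first pass is ordinary least squares
    for _ in range(n_iter):
        # Closed-form weighted least squares for a straight line.
        sw = sum(weights)
        swx = sum(w * xi for w, xi in zip(weights, x))
        swy = sum(w * yi for w, yi in zip(weights, y))
        swxx = sum(w * xi * xi for w, xi in zip(weights, x))
        swxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
        b = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
        a = (swy - b * swx) / sw
        # The check loss |r| * (tau if r >= 0 else 1 - tau) equals w * r**2
        # with w = c / |r|; eps keeps the weights finite near zero residuals.
        weights = []
        for xi, yi in zip(x, y):
            r = yi - a - b * xi
            c = tau if r >= 0 else 1.0 - tau
            weights.append(c / max(abs(r), eps))
    return a, b

# Toy data: points on the line y = 2 + 3x, with one gross outlier at x = 3.
x = [0, 1, 2, 3, 4, 5, 6]
y = [2, 5, 8, 100, 14, 17, 20]
a_med, b_med = quantile_line_irls(x, y, tau=0.5)
```

On this toy example the median fit passes through the six points that lie on the line and ignores the outlier, in line with the abstract's remark that solutions interpolate as many observations as there are parameters.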
Parceling in Multilevel Structural Equation Models for the measure of a latent construct
When the variables of interest are measured by a set of items on units having a multilevel setting, conventional structural equation models cannot be used because the assumption of independence of all latent variables and indicators across units is violated due to the within-cluster dependence. In this work we propose the use of parcelling in defining the latent variables of a multilevel structural equation model (MSEM). The paper aims to address the problem of using categorical item response data when a multilevel SEM must be applied.
A two-stage LDA algorithm for ranking induced topic readability
Probabilistic topic models, such as LDA, are standard text analysis algorithms that provide predictive and latent topic representation for a corpus. However, due to the unsupervised training process, it is difficult to verify the assumption that the latent space discovered by these models is generally meaningful and valuable. This paper introduces a two-stage LDA algorithm to estimate latent topics in text documents and use readability scores to link the identified topics to a linguistically motivated latent structure. We define a new interpretative tool called induced topic readability, which is used to rank topics from the one with the most complex linguistic structure to the one with the…
Direct Organogenesis from Cotyledons in Cultivars of Citrus clementina Hort. ex Tan
An efficient protocol to induce shoot bud regeneration in Citrus clementina cultivars (“Monreal”, “SRA 63” and “SRA 64”) by direct organogenesis has been developed using cotyledons as explants. Cotyledons transversely cut in three segments and entire ones were cultured on Murashige and Skoog (1962) solidified medium containing vitamins, 500 mg·l−1 malt extract, 50 g·l−1 sucrose and supplemented with three different concentrations of BAP (8.8, 13.2 and 17.6 μM). In all three cultivars the entire cotyledons showed more shoot morphogenic potential than transversely cut ones and after 60 days of incubation the optimum BAP concentration was 17.6 μM in “Monreal” (50% ± 2.89% of frequency regenerati…
A model-based approach to Spotify data analysis: a Beta GLMM
Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of song audio features from a statistical point of view. In particular, it explores the data-capture mechanisms enabled by the Spotify Web API and suggests statistical tools for the analysis of these data. Special attention is devoted to song popularity, and a Beta model including random effects is proposed in order to give a first answer to questions like: which are the determinants of popularity? The identification of a model able to describe this relationship, the determination within the set of char…
A new position weight correlation coefficient for consensus ranking process without ties
Preference data represent a particular type of ranking data where a group of people gives their preferences over a set of alternatives. The traditional metrics between rankings do not take into account the importance of swapping elements similar among them (element weights) or elements belonging to the top (or to the bottom) of an ordering (position weights). Following the structure of the τx proposed by Emond and Mason and the class of weighted Kemeny–Snell distances, a proper rank correlation coefficient is defined for measuring the correlation among weighted position rankings without ties. The one‐to‐one correspondence between the weighted distance and the rank correlation coefficient ho…
Random forest analysis: a new approach for classification of Beta Thalassemia
In recent years, Thalassemia care providers started classifying patients as transfusion-dependent-Thalassemia (TDT) or non-transfusion-dependent-Thalassemia (NTDT) owing to the established role of transfusion therapy in defining the clinical complication profile, although this classification was also based on expert opinion and is limited by reliance on patients' current transfusion status. Starting from a vast set of variables indicating severity phenotype, through the use of both classification and clustering techniques we want to explore the presence of two (TDT vs NTDT) or more clusters, in order to approach a new definition for the classification of Beta-Thalassemia in Thalassemia Syndromes …
Variable selection in mixed models: a graphical approach
Model selection can be defined as the task of estimating the performance of different models in order to choose the (approximate) best one. The purpose of this article is to introduce an extension of the graphical representation of deviance proposed in the framework of classical and generalized linear models to the wider class of mixed models. The proposed plot is useful in determining which are the important explanatory variables conditioning on the random effects part. The applicability and the easy interpretation of the graph are illustrated with a real data example.
GAMLSS for Big Data: Roc Curve prediction using Twitter data
In recent years, Big Data has emerged as one of the most innovative and fastest-growing scientific areas of interest. In this field, finding reliable methods to make accurate predictions represents one of the most inspiring challenges. The prediction tool used in this paper is the ROC (Receiver Operating Characteristic) curve, a binary prediction tool often used for medical tests. The attention is focused in particular on the implementation of the ROC curve in GAMLSS (Generalized Additive Models for Location Scale and Shape), semi-parametric models suitable for huge and flexible datasets. An application will be shown where the class of GAMLSS is applied to Twitter data in order to pr…
Multiple smoothing parameters selection in additive regression quantiles
We propose an iterative algorithm to select the smoothing parameters in additive quantile regression, wherein the functional forms of the covariate effects are unspecified and expressed via B-spline bases with difference penalties on the spline coefficients. The proposed algorithm relies on viewing the penalized coefficients as random effects from the symmetric Laplace distribution, and it turns out to be very efficient and particularly attractive with multiple smooth terms. Through simulations we compare our proposal with some alternative approaches, including the traditional ones based on minimization of the Schwarz Information Criterion. A real-data analysis is presented to illustrate t…
The Neutrophil-to-Lymphocyte Ratio is Related to Disease Activity in Relapsing Remitting Multiple Sclerosis
Background: The role of the neutrophil-to-lymphocyte ratio (NLR) of peripheral blood has been investigated in relation to several autoimmune diseases. Limited studies have addressed the significance of the NLR in terms of being a marker of disease activity in multiple sclerosis (MS). Methods: This is a retrospective study in relapsing–…
Statistical analysis of P. oceanica growth data: from standard linear models to Generalized Linear Models
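Analisi delle performance di crescita di Posidonia oceanica attraverso l’uso di modelli lineari generalizzati misti (GLMM) [Analysis of the growth performance of Posidonia oceanica using generalized linear mixed models (GLMM)]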
Boosting for ranking data: an extension to item weighting
Decision trees are a particularly widespread predictive machine learning technique, used to predict discrete (classification) or continuous (regression) variables. The algorithms underlying these techniques are intuitive and interpretable, but also unstable. Indeed, to make classification more reliable, the output of several trees is usually combined. Several approaches have been proposed in the literature to classify ranking data through decision trees, but none of them takes into account either the importance or the similarity of the individual elements of each ranking. The aim of this article is to propose a weighted extension of the method …
Consensus among preference rankings: a new weighted correlation coefficient for linear and weak orderings
Preference data are a particular type of ranking data where some subjects (voters, judges,...) express their preferences over a set of alternatives (items). In most real life cases, some items receive the same preference by a judge, thus giving rise to a ranking with ties. An important issue involving rankings concerns the aggregation of the preferences into a “consensus”. The purpose of this paper is to investigate the consensus between rankings with ties, taking into account the importance of swapping elements belonging to the top (or to the bottom) of the ordering (position weights). By combining the structure of τx proposed by Emond and Mason (J Multi-Criteria Decis…
Ensemble methods for item-weighted label ranking: a comparison
Label Ranking (LR), an emerging non-standard supervised classification problem, aims at training preference models that order a finite set of labels based on a set of predictor features. Traditional LR models regard all labels as equally important. However, in many cases, failing to predict the ranking position of a highly relevant label can be considered more severe than failing to predict a trivial one. Moreover, an efficient LR classifier should be able to take into account the similarity between the items to be ranked. Indeed, swapping two similar elements should be less penalized than swapping two dissimilar ones. The contribution of the present paper is to formulate more flexible item…
Impact of the COVID-19 pandemic on music: a method for clustering sentiments
The outbreak of coronavirus disease 2019 (COVID-19) was highly stressful for people. In general, fear and anxiety about a disease can be overwhelming and cause strong emotions in adults and children. One way to cope with this stress consists in listening to music. The aim of this work is to understand whether the music listened to during the lockdown reflects the emotions generated by the pandemic in each of us. So, the primary goal of this work is to build two indices for measuring the anger and joy levels of the top streamed songs by Italian Spotify users (during the SARS-CoV-2 pandemic), and study their evolution over time. A Hierarchical Cluster Analysis has been applied in order to identify groups o…
Classification trees for multivariate ordinal response: an application to Student Evaluation Teaching
Data from multiple items on an ordinal scale are commonly collected when qualitative variables, such as feelings, attitudes and many other behavioral and health-related variables are observed. In this paper we introduce a method to derive a distance-based tree for multivariate ordinal response that allows, when subject-specific characteristics are available, to derive common profiles for respondents giving the same/similar multivariate ratings. Special attention will be paid to the performance comparison in terms of AUC, for three different distances used as splitting criteria. Simulated data and a dataset from a Student Evaluation of Teaching survey will be used as illustrative examples. Th…
DOES AIR POLLUTION MODIFY THE HEAT TOLERANCE? ESTIMATING THE THRESHOLD-LINE VIA SEGMENTED REGRESSION
Influenza del substrato su crescita dei rizomi e biometria fogliare di Posidonia oceanica [Influence of substrate on rhizome growth and leaf biometry of Posidonia oceanica]
Towards the definition of distance measures in the preference-approval structures
The task of combining preference rankings and approval voting is a relevant issue in social choice theory. The preference-approval voting (PAV) analyses the preferences of a group of individuals over a set of items. The main difference from the classical approaches for preference data consists in introducing, in addition to the ranking of candidates, a further distinction: candidates are divided into “acceptable” and “unacceptable”, or into a “good set” and a “bad set” (a way to express the approval/disapproval). This work introduces the definition of a new measure to quantify disagreement between preference-approval profiles. For each pair of alternatives, we consider the two possible disag…
Recursive partitioning: an approach based on the weighted Kemeny distance
In the framework of preference rankings, the interest can lie in finding which predictors and which interactions are able to explain the observed preference structures. The possibility to derive consensus measures using a classification tree represents a novelty and an important tool, given its easy interpretability. This work proposes the use of a univariate decision tree for ranking data based on the weighted Kemeny distance. The performance of the methodology will be shown by using a real dataset about university rankings.
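For readers unfamiliar with the distance underlying this approach, the sketch below computes the classical, unweighted Kemeny distance between two rankings, with ties allowed. The weighted version used in the paper is not reproduced here; the helper names `score` and `kemeny_distance` are hypothetical, and rankings are assumed to be lists of rank positions (1 = most preferred), one entry per item.

```python
def score(ranking, i, j):
    """Pairwise score: 1 if item i is preferred to item j, -1 if the reverse, 0 if tied."""
    if ranking[i] < ranking[j]:
        return 1
    if ranking[i] > ranking[j]:
        return -1
    return 0

def kemeny_distance(rank_a, rank_b):
    """Half the sum of absolute differences between the two pairwise score matrices."""
    n = len(rank_a)
    total = 0
    for i in range(n):
        for j in range(n):
            total += abs(score(rank_a, i, j) - score(rank_b, i, j))
    return total / 2

d_reversed = kemeny_distance([1, 2, 3], [3, 2, 1])  # every pair is inverted
d_swap = kemeny_distance([1, 2, 3], [2, 1, 3])      # one adjacent swap
d_tie = kemeny_distance([1, 1, 2], [1, 2, 3])       # a tie that gets broken
```

A tree-based method can use such a distance both to measure how far apart the rankings in a node are and to choose the split that makes the child nodes most homogeneous.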
Discontinuation of teriflunomide and dimethyl fumarate in a large Italian multicentre population: a 24-month real-world experience
Teriflunomide (TRF) and Dimethyl fumarate (DMF) are licensed drugs for relapsing-remitting Multiple Sclerosis (RRMS). We aimed to compare the rate and the time to discontinuation among persons with RRMS (pwRRMS) newly treated with TRF and DMF. A retrospective study on prospectively collected data was performed in nine tertiary MS centers in Italy. The 24-month discontinuation rate in the two cohorts was the primary study outcome. We also assessed the time to discontinuation and the reasons for therapy withdrawal. Discontinuation of TRF and DMF was defined as a gap of treatment ≥ 60 days. A cohort of 903 pwRRMS (316 on TRF and 587 on DMF) was analyzed. During 24 months of follow-up, pwRRMS on TR…
Dealing with the Pseudo-Replication Problem in Longitudinal Data from Posidonia Oceanica Surveys: Modeling Dependence vs. Subsampling
Posidonia oceanica represents the key species of the most important ecosystem in subtidal habitats of the Mediterranean Sea. Being sensitive to changes in the environment, it is considered a crucial indicator of the quality of coastal marine waters. A peculiarity of P. oceanica is the presence of reiterative modules characterizing its growth, which lend themselves to back-dating techniques, allowing for the reconstruction of past history of growth variables (annual rhizome elongation and diameter, primary production, etc.). Such back-dating techniques provide, for each sampled shoot, a longitudinal series of multivariate data; this is an instance of what Hurlbert (1984) in a seminal paper d…
Resilience of the seagrass Posidonia oceanica following pulse-type disturbance.
Understanding the response of species to disturbance and the ability to recover is crucial for preventing their potential collapse and ecosystem phase shifts. Explosive submarine activity, occurring in shallow volcanic vents, can be considered as a natural pulse disturbance, due to its suddenness and high intensity, potentially affecting nearby species and ecosystems. Here, we present the response of Posidonia oceanica, a long-lived seagrass, to an exceptional submarine volcanic explosion, which occurred in the Aeolian Archipelago (Italy, Mediterranean Sea) in 2002, and evaluate its resilience in terms of time required to recover after such a pulse event. The study was carried out in 2011 i…
A new multivariate Biotic Index to assess Ecological Quality status of Mediterranean coastal waters
Modelling the relationship between sexual reproduction and rhizome growth in Posidonia oceanica (L.) Delile
The relationship between flowering and growth performance of Posidonia oceanica (L.) Delile in meadows distributed along the south-eastern coast of Sicily (Italy) was investigated by means of a statistical model (generalized linear mixed model) combined with the lepidochronological analysis. Over a 28-year period, 67 floral stalk remains were observed. The highest flowering index was recorded in lepidochronological year 1998 (10.1%) and the Inflorescence Frequency per age showed a clear decrease corresponding to 15-year-old shoots. The sexual reproductive event had positive effects on rhizome elongation (cm year−1) and leaf production (no. leaves year−1) in the same flowering year, whilst n…
Reference growth charts for Posidonia oceanica seagrass: An effective tool for assessing growth performance by age and depth
Growth performance of rhizomes has become one of the most used descriptors for monitoring Posidonia oceanica seagrass dynamics and population status. However, the ability to detect any change of growth in space or in time is often confounded by natural age-induced decline. To overcome this problem, we have produced reference growth charts, which in other areas are universally recognized as a very powerful tool for comparing growth of living beings during their ontogeny. Reference growth charts involving different P. oceanica growth performance measures (speed of growth and primary production of rhizomes) have been built using proper statistical frameworks (GLMM, Segmented and Quantile R…
Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology.
We discuss a practical and effective framework to estimate reference growth charts via regression quantiles. Inequality constraints are used to ensure both monotonicity and non-crossing of the estimated quantile curves, and penalized splines are employed to model the nonlinear growth patterns with respect to age. A companion R package is presented and relevant code discussed to encourage the dissemination and application of the proposed methods.
ENSEMBLE METHODS FOR RANKING DATA
Recent years have seen a remarkable flowering of works about the use of decision trees for ranking data. As a matter of fact, decision trees are useful and intuitive, but they are very unstable: small perturbations bring big changes. This is the reason why it could be necessary to use more stable procedures, such as ensemble methods, in order to find which predictors are able to explain the preference structure. In this work ensemble methods such as bagging and Random Forest are proposed, from both a theoretical and computational point of view, for deriving classification trees when ranking data are observed. The advantages of these procedures are shown through an example on the SUSHI data set.
GAMLSS for high-variability data: an application to liver fibrosis case
In this paper, we propose to handle the problem caused by overdispersed data by applying the generalized additive model for location, scale and shape framework (GAMLSS) as introduced by Rigby and Stasinopoulos (2005). The idea of using a GAMLSS approach for handling our problem comes from the idea of Aitkin (1996), consisting in the use of an EM maximum likelihood estimation algorithm (Dempster, Laird, and Rubin, 1977) to deal with overdispersed generalized linear models (GLM). As in the GLM case, the algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution. The GAMLSS specification allows the extension of the Aitkin algorithm to probability d…
Dealing with dependence in retrospective ecological data through longitudinal models
Curve di crescita di riferimento nel monitoraggio di Posidonia oceanica: alcune stime preliminari [Reference growth curves in the monitoring of Posidonia oceanica: some preliminary estimates]
Effect of different substrata on rhizome growth, leaf biometry and shoot density of Posidonia oceanica
The effects of different substratum typologies on Posidonia oceanica growth and morphology were estimated in four Sicilian meadows using Generalized and Linear Mixed Models combined with retrodating and biometric analyses. Substratum exerted a multiple effect, resulting in different biometric features for P. oceanica shoots settled on rock from those growing on sand and matte. On rock, values for growth rate, leaf length and shoot surface were lower than those on other substrata, with maximum differences of 42%, 23% and 32%, respectively. The present study may have interesting methodological consequences for the comprehensive understanding of the causative variables potentially aff…
A model for the analysis of the temperature effects on mortality
Ho perso le parole: come ritrovarle con la sentiment analysis. Metodi statistici per l'analisi della produzione discografica di Luciano Ligabue. [Ho perso le parole: how to find the words again with sentiment analysis. Statistical methods for the analysis of Luciano Ligabue's discography.]
This book was born from the meeting of two people who share the same two passions in life: statistics and Ligabue. And who says that one cannot combine a passion for music with everyday working life? Karl Pearson said «Statistics is the grammar of Science», and who would not define music as Science? Music meeting science gives rise to creativity and beauty, and this book goes in search of that connection. Using sophisticated statistical techniques, we will travel through the musical characteristics of Luciano Ligabue's songs, studying how they change over time and the elements of greatest interest to those who listen to him but also to those who critic…
Effetto a breve termine dell'inquinamento sulla salute: Palermo 1997-2002 [Short-term effect of pollution on health: Palermo 1997-2002]
A graphical model selection tool for mixed models
Model selection can be defined as the task of estimating the performance of different models in order to choose the most parsimonious one, among a potentially very large set of candidate statistical models. We propose a graphical representation to be considered as an extension to the class of mixed models of the deviance plot proposed in the literature within the framework of classical and generalized linear models. This graphical representation allows, once a reduced number of models have been selected, to identify important covariates focusing only on the fixed effects component, assuming the random part properly specified. Nevertheless, we suggest also a standalone figure representing th…
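Confondimento dell’età nell’analisi degli effetti di perturbazioni antropiche su variabili biometriche di Posidonia oceanica (L.) Delile [Age confounding in the analysis of the effects of anthropogenic disturbances on biometric variables of Posidonia oceanica (L.) Delile]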
An improved detection of clusters in complex ecological systems by using the Ripley’s K-function
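Potenzialità e vantaggi dell’uso di Modelli Lineari Generalizzati e di Modelli Lineari Generalizzati Misti nell’analisi statistica della crescita di Posidonia oceanica. [Potential and advantages of using Generalized Linear Models and Generalized Linear Mixed Models in the statistical analysis of Posidonia oceanica growth.]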
A weighted distance-based approach with boosted decision trees for label ranking
Label Ranking (LR) is an emerging non-standard supervised classification problem with practical applications in different research fields. The Label Ranking task aims at building preference models that learn to order a finite set of labels based on a set of predictor features. One of the most successful approaches to tackling the LR problem consists of using decision tree ensemble models, such as bagging, random forest, and boosting. However, these approaches, coming from the classical unweighted rank correlation measures, are not sensitive to label importance. Nevertheless, in many settings, failing to predict the ranking position of a highly relevant label should be considered more seriou…
Subject-specific odds ratios in binomial GLMMs with continuous response
In a regression context, the dichotomization of a continuous outcome variable is often motivated by the need to express results in terms of the odds ratio, as a measure of association between the response and one or more risk factors. Starting from the work of Moser and Coombs (Odds ratios for a continuous outcome variable without dichotomizing, Statistics in Medicine, 2004, 23, 1843-1860), in this article we explore, in a mixed model framework, the possibility of obtaining odds ratio estimates from a linear regression model without the need to dichotomize the response variable. It is shown that the odds ratio estimators derived from a linear mixed model outperform those from a binom…
Classification trees for preference data: a distance-based approach
In the framework of preference rankings, when the interest lies in explaining which predictors and which interactions among predictors are able to explain the observed preference structures, the possibility of deriving consensus measures using a classification tree represents a novelty and an important tool given its easy interpretability. In this work we propose the use of a multivariate decision tree where a weighted Kemeny distance is used both to evaluate the distances between rankings and to define an impurity measure to be used in the recursive partitioning. The proposed approach also allows high distances between rankings to be weighted differently in the top and in the bottom alternatives.
New Flexible Probability Distributions for Ranking Data
Recently, several models have been proposed in the literature for analyzing ranks assigned by people to some object. These models summarize the degree of liking for this object, possibly also with respect to a set of explanatory variables. Some recent works have suggested the use of the Shifted Binomial and of the Inverse Hypergeometric distribution for modelling the approval rate, while mixture models have been developed to take into account the uncertainty of the ranking process. We propose two new probabilistic models, based on the Discrete Beta and the Shifted-Beta Binomial distributions, that offer considerable flexibility and allow the joint modelling of the scale (approval rate) and the shape …
Shoot age as a confounding factor on detecting the effect of human-induced disturbance on Posidonia oceanica growth performance
The response of orthotropic rhizome elongation and primary production of Posidonia oceanica to anthropogenic perturbations and potential confounding effects of shoot age were assessed using a Linear Multilevel Model (LMM). This model examined the confounding effect of age by comparing the estimates of impact and variance components obtained by excluding and including Age as an explanatory variable. Age had a negative effect on rhizome elongation and primary production with an annual decrease of 0.6 mm y⁻¹ and 7 mg dw y⁻¹, respectively. According to the LMM when age effect was omitted, the differences between disturbed and control locations in rhizome elongation and primary produ…
Diagnostic tools for GAMLSS fitted objects
In recent years, GAMLSS models have been applied in many research fields, offering a good solution for analyzing data with large variability. In this paper we propose a new approach to diagnostics in GAMLSS as an alternative to the classical worm plot. An application is shown in which the GAMLSS class is used to detect the presence of liver fibrosis as a function of patients' risk factors.
Warming-related shifts in the distribution of two competing coastal wrasses
13 pages, 5 figures, 1 table, 1 appendix with three tables and one figure
A recap on Linear Mixed Models and their hat-matrices
This working paper has a twofold goal. On one hand, it provides a recap of Linear Mixed Models (LMMs): far from trying to be exhaustive, this first part of the working paper focusses on the derivation of theoretical results on estimation of LMMs that are scattered in the literature or whose mathematical derivation is sometimes missing or too quickly sketched. On the other hand, it discusses various definitions that are available in the literature for the hat-matrix of Linear Mixed Models, showing their limitations and proving their equivalence.
Variazioni spaziali e temporali delle performance di crescita nelle praterie di Posidonia oceanica (L.) Delile: fattore endogeno vs. fattori esogeni
Dimethyl fumarate vs Teriflunomide: an Italian time-to-event data analysis
The introduction of oral disease-modifying therapies (DMTs) for relapsing-remitting multiple sclerosis (RRMS) changed the therapeutic landscape and algorithms of RRMS treatment (1). In Europe, dimethyl fumarate (DMF) and teriflunomide (TRF) are approved as first-line agents and are often used as the initial therapeutic choice (2, 3). Pivotal trials showed the efficacy of both DMTs on controlling clinical relapses, disability accrual and magnetic resonance imaging (MRI) activity (4-8). Both DMTs had overall good tolerability. There have been no head-to-head randomized trials to compare these two DMTs; however, several real-world evidence (RWE) studies have compared DMF and TRF and provided u…