0000000000458270

AUTHOR

Antonella Plaia

Element weighted Kemeny distance for ranking data

Preference data are a particular type of ranking data that arise when several individuals express their preferences over a finite set of items. Within this framework, the main issue concerns the aggregation of the preferences to identify a compromise or a “consensus”, defined as the closest ranking (i.e. with the minimum distance or maximum correlation) to the whole set of preferences. Many approaches have been proposed, but they are not sensitive to the importance of items: i.e. changing the rank of a highly-relevant element should result in a higher penalty than changing the rank of a negligible one. The goal of this paper is to investigate the consensus between rankings taking into accou…

research product

A Multisite-Multipollutant Air Quality Index

Abstract In this paper, starting from a multivariate spatio-temporal array containing air pollution data collected for the main pollutants at different monitoring sites over a 1-year period, a new approach is proposed to obtain a Multipollutant-Multisite Air Quality Index (AQI) time series. A two-step aggregation, over space and over pollutants, is considered. For the first aggregation (spatial synthesis), a PCA is performed on the suitably rearranged data array, while the index I2, proposed in Ruggieri and Plaia (2011), is used for the second aggregation (pollutant synthesis), yielding the new index I2MS. Daily data of four air pollutants from the city of Palermo (Italy) are analyzed…

research product

Influence diagnostics for generalized linear mixed models: a gradient-like statistic

In the literature, many influence measures proposed for Generalized Linear Mixed Models (GLMMs) require the information matrix, which can be difficult to calculate. In the present paper, a known influence measure is approximated to obtain a simpler form, for which the information matrix is no longer necessary. The proposed measure is shown to have a form similar to the recently introduced gradient statistic. Good performance has been obtained in simulation studies.

research product

Robustness of air quality indicators: a study of PM10 levels in Scotland

research product

Association between the interleukin-1beta polymorphisms and Alzheimer's disease: a systematic review and meta-analysis.

Abstract The pro-inflammatory cytokine interleukin (IL)-1β is a main component in inflammatory pathways and is overexpressed in the brain of Alzheimer's disease (AD) patients. Several studies report associations between IL-1β polymorphisms and AD, but findings from different studies are controversial. Our aim was to verify the correlation between the single nucleotide polymorphisms (SNPs) of the IL-1β, at sites −511 and +3953, and AD by meta-analysis. Computerized bibliographic searches of PUBMED and AlzGene database (http://www.alzgene.org) were supplemented with manual searches of reference lists. There is evidence for association between IL-1β +3953 SNP and AD, with an OR = 1.60 (95%…

research product

Long-term effects of contrasting tillage systems on soil C and N pools and on main microbial groups differ by crop sequence

Abstract Determining the best conservation agriculture practices for increasing soil organic carbon (C) and hence soil quality is of paramount importance in the semi-arid Mediterranean environment, where soils are experiencing a continuous decline in organic matter. Therefore, the aim of this long-term study was to assess the combined effects of tillage system and crop sequence on soil organic C and biochemical properties of soil generally used as indicators of soil quality. After 23 years of continuous application of contrasting tillage systems (conventional tillage [CT], vs. no tillage [NT]) and crop sequences (wheat monoculture vs. wheat-faba bean rotation), soil samples were collected f…

research product

The Performance of the Gradient-Like Influence Measure in Generalized Linear Mixed Models

A gradient-like statistic, recently introduced as an influence measure, has been proven to work well in large samples, thanks to its asymptotic properties. In this work, through small-scale simulation schemes, the performance of this diagnostic measure is further investigated in terms of concordance with the main influence measures used for outlier identification. The simulation studies are performed using generalized linear mixed models (GLMMs).

research product

Outlier detection to hierarchical and mixed effects models

Hierarchical and mixed effects models are models where a varying number of coefficients may be random at different levels of the hierarchy. The purpose of outlier analysis for these models is to determine whether an outlying unit at a higher level is entirely outlying, or outlying due to the effect of one or a few aberrant lower-level units. Most work on diagnostics for these complex models has focused on the mixed model rather than on the hierarchical model, obscuring some relevant aspects of the hierarchical model. In this paper we will present an approach to influence analysis and outlier detection for mixed and hierarchical models, focusing on the special structure of nested data that these…

research product

The effect of allergen immunotherapy in the onset of new sensitizations: a meta-analysis

Background Although the preventive efficacy of allergen immunotherapy (AIT) in the onset of new allergen sensitizations has been asserted by many reviews, position papers, and consensus conferences, the evidence available is from only 3 studies. The objective of this work was a systematic review to evaluate the preventive efficacy of AIT in the onset of new allergen sensitizations. The end-point was the risk difference (RD) in the onset of new allergen sensitizations between patients treated with AIT and pharmacotherapy. Methods Computerized bibliographic searches of MEDLINE, EMBASE, and the Cochrane Library (until November 30th, 2016) were done. Random-effects and fixed-effects model meta-…

research product

Comparing Boosting and Bagging for Decision Trees of Rankings

Abstract Decision tree learning is among the most popular and most traditional families of machine learning algorithms. While these techniques excel in being quite intuitive and interpretable, they also suffer from instability: small perturbations in the training data may result in big changes in the predictions. The so-called ensemble methods combine the output of multiple trees, which makes the decision more reliable and stable. They have been primarily applied to numeric prediction problems and to classification tasks. In recent years, some attempts to extend the ensemble methods to ordinal data can be found in the literature, but no concrete methodology has been provided for preference…

research product

Exploring topics in LDA models through Statistically Validated Networks: directed and undirected approaches

Probabilistic topic models are machine learning tools for processing and understanding large text document collections. Among the different models in the literature, Latent Dirichlet Allocation (LDA) has turned out to be the benchmark of the topic modelling community. The key idea is to represent text documents as random mixtures over latent semantic structures called topics. Each topic follows a multinomial distribution over the vocabulary words. In order to understand the result of a topic model, researchers usually select the top-n words (the essential words) with the highest probability given a topic and look for meaningful and interpretable semantic themes. This work proposes a new method …

research product

Filling in long gap sequences by performing jointly EOF and FDA

In this paper the EOF methodology is performed jointly with the FDA approach on a spatiotemporal multivariate data set with the aim of filling in missing values as accurately as possible when long gap sequences occur. Simulated data sets, containing “artificial” gaps, are considered in order to test the performance of two proposed procedures; in the first one, observed data are reconstructed by EOF and then converted into functional ones; in the second one, observed data are transformed into functional ones and then EOF reconstruction is applied. By comparing some performance indicators computed for the two procedures, it is shown that a pre-processing of data by FDA, followed by the EOF, may …

research product

A stochastic imputation method for air quality spatio-temporal datasets with missing values

research product

Regression imputation for Space-Time datasets with missing values

Data consisting of repeated observations on a series of fixed units are very common in different contexts, such as the biological, environmental and social sciences, and different terminology is often used to indicate this kind of data: panel data, longitudinal data, time series-cross section data (TSCS), spatio-temporal data. Missing information is inevitable in longitudinal studies, and can produce biased estimates and loss of power. The aim of this paper is to propose a new regression (single) imputation method that, considering the particular structure and characteristics of the data set, creates a “complete” data set that can be analyzed by any researcher on different occasions and using diff…

research product

Conservation tillage in a semiarid Mediterranean environment: results of 20 years of research

Conservation tillage techniques are becoming increasingly popular worldwide as they have the potential to generate environmental, agronomic, and economic benefits. In Mediterranean areas, studies performed on the effects of conservation tillage [in comparison with the conventional tillage technique (CT)] on grain yield of cereal crops have reported contradictory results as well as considerable year-to-year variation, demonstrating how the impact of different soil tillage techniques on crop productivity is strongly site-specific. The present paper summarises the main results from a set of experiments carried out in Sicily during the last 20 years in which we compared no tillage (NT) to CT in…

research product

A family of distances for preference–approvals

Scientific production

research product

Air quality and integration of short-term and long-term pollutant data

Modelling PM10 is an important problem in statistical methodology, above all to explain the behaviour of PM10 in space and time, since it has been linked to many adverse effects on human and environmental health. But the large spatial variability of the main traffic-related pollutants, and of PM10 in particular, makes it impossible to obtain a complete picture of atmospheric pollution in urban areas from fixed-station data alone. Information from fixed monitoring stations (long-term measurements) is therefore integrated with that from mobile stations (short-term measurements). Short-term measurements are incomplete and so it is necessary to integrate …

research product

Ranking coherence in topic models using statistically validated networks

Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovery is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it is an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-oc…

research product

Efficacy of subcutaneous and sublingual immunotherapy with grass allergens for seasonal allergic rhinitis: a meta-analysis–based comparison

Background: Subcutaneous (SCIT) and sublingual (SLIT) immunotherapy are the 2 most prescribed routes for administering allergen-specific immunotherapy. They were shown to be effective in control of symptoms and in reducing rescue medication use in patients with allergic diseases, but their effectiveness has to be balanced against side effects. In recent years, SLIT has been increasingly prescribed, instead of SCIT, because of improved safety and easy administration. Objective: We assessed which route is the most effective in the treatment of patients with seasonal allergic rhinitis to grass pollen. Methods: An indirect meta-analysis–based comparison between SCIT and SLIT was performed. Trea…

research product

Extending Functional kriging to a multivariate context

Environmental data usually have a spatio-temporal structure; pollutant concentrations, for example, are recorded over time and space. Generalized Additive Models (GAMs) are a suitable tool for modelling spatial and/or temporal trends in this kind of data, which can be treated as functional although they are collected as discrete observations. Frequently, attention is focused on the prediction of a single pollutant at an unmonitored site and, to this end, we extend kriging for functional data to a multivariate context by exploiting the correlation with the other pollutants. In particular, we propose two procedures: the first one (FKED) combines the regression of a variable (pollutant),…

research product

A comparison of ensemble algorithms for item-weighted Label Ranking

Label Ranking (LR) is a non-standard supervised classification method with the aim of ranking a finite collection of labels according to a set of predictor variables. Traditional LR models assume indifference among alternatives. However, misassigning the ranking position of a highly relevant label is frequently regarded as more severe than failing to predict a trivial label. Moreover, switching two similar alternatives should be considered less severe than switching two different ones. Therefore, efficient LR classifiers should be able to take into account the similarities and individual weights of the items to be ranked. The contribution of this paper is to formulate and compare flexible i…

research product

Weighted distance-based trees for ranking data

Within the framework of preference rankings, the interest can lie in finding which predictors and which interactions are able to explain the observed preference structures, because preference decisions will usually depend on the characteristics of both the judges and the objects being judged. This work proposes the use of a univariate decision tree for ranking data based on the weighted distances for complete and incomplete rankings, and considers the area under the ROC curve both for pruning and model assessment. Two real and well-known datasets, the SUSHI preference data and the University ranking data, are used to display the performance of the methodology.

research product

Projection Clustering Unfolding: A New Algorithm for Clustering Individuals or Items in a Preference Matrix

In the framework of preference rankings, the interest can lie in clustering individuals or items in order to reduce the complexity of the preference space for an easier interpretation of collected data. Recent years have seen a remarkable flowering of works on the use of decision trees for clustering preference vectors. As a matter of fact, decision trees are useful and intuitive, but they are very unstable: small perturbations bring big changes. For this reason, more stable procedures may be needed to cluster ranking data. In this work, a Projection Clustering Unfolding (PCU) algorithm for preference data will be proposed in order to extract useful info…

research product

Hierarchy of factors impacting grape berry mass: separation of direct and indirect effects on major berry metabolites

Final berry mass, a major quality factor in wine production, is determined by the integrated effect of biotic and abiotic factors that can also influence berry composition. Under field conditions, interactions between these factors complicate study of the variability of berry mass and composition. Depending on the observation scale, the hierarchy of the impact degree of these factors can vary. The present work examines the simultaneous effects of the major factors influencing berry mass and composition to create a hierarchy by impact degree. A second objective was to separate the possible direct effects of factors on berry composition from an indirect effect mediated through their impact on…

research product

A gradient-based deletion diagnostic measure for generalized linear mixed models

ABSTRACT A gradient-statistic-based diagnostic measure is developed in the context of generalized linear mixed models. Its performance is assessed by some real examples and simulation studies, in terms of its ability to detect influential data structures and its concordance with the most widely used influence measures.

research product

Multidimensional scaling and stock location assignment in a warehouse: an application

In the present paper, the suitability of a multivariate statistical methodology, such as multidimensional scaling (MDS), for solving an optimization problem is shown by means of an application. In particular, considering the stock location assignment problem in the warehouse of a supermarket chain, the solution obtained by applying MDS to a set of seven variables is compared with the one obtainable with the usual techniques applied in this context. An extensive discussion of the results is reported. Copyright © 1999 John Wiley & Sons, Ltd.

research product

Le carriere universitarie degli studenti negli atenei statali e non statali in Italia

In recent years there has been increasing competition among universities to “grab” students, along with ever greater promotion and student recruitment activity by non-state universities (online and otherwise). Non-state universities, also called “free universities” (“libere Università”), are promoted both by private-law bodies and by public bodies (regions, provinces, municipalities). They are legally recognized by the Ministry of Education, University and Research, and are authorized to award academic degrees, within the university system, with the same legal value as those awarded by state universities. The literature has …

research product

Indicators and measures for the assessment of university students’ careers

In the Italian University System the problem of student failure and of delayed graduation is causing increasing concern both for universities and for stakeholders. In this paper we compare the teaching performance of the Italian universities, and individual cohort data of three Faculties of the University of Palermo, to extract information useful to policy makers.

research product

Hierarchical linear models to analyze daily ambient PM10 in Palermo

research product

An aggregate air quality index considering interactions among pollutants

Several countries provide an Air Quality Index (AQI) to communicate air pollution, but there is no unique and internationally accepted methodology for constructing it. Most of the proposed indices are based on the USA AQI by EPA and are defined by the value of the pollutant with the highest concentration. For each pollutant, a sub-index is computed by linear interpolation according to the grid in a table, but the breakpoints of such a table may differ from one country to another, as well as the descriptors of each category, the air quality standards, the functions chosen as daily synthesis to aggregate hourly values at each site for each pollutant, and so on. Anyway the main drawback …

research product

Modeling confidential data via modified hurdle mixed models

research product

Modelling students' mobility in Italy: an analysis of the determinants by combining individual and aggregated data

The migratory phenomenon consisting in the enrolment of some students in a University different from the one nearest to their place of origin goes under the name of students’ mobility (SM). This phenomenon can be studied for different territorial aggregations, depending on the aims and the availability of data. In Italy several analyses have been carried out to detect the determinants of such a phenomenon. Two approaches are mostly followed. The former considers the analysis of aggregate data (AD) flows, and it is aimed at detecting the characteristics of the Universities and the territories in which they are situated, the latter analyzes the determinants of SM on the grounds of the student…

research product

Comparing air quality indices aggregated by pollutant

In this paper a new aggregate Air Quality Index (AQI) useful for describing the global air pollution situation for a given area is proposed. The index, unlike most currently used AQIs, takes into account the combined effects of all the considered pollutants on human health. Its good performance, tested by means of a simulation plan, is confirmed in an application to real data by a comparison with two other indices proposed in the literature, one of which is based on the Relative Risk of daily mortality.

research product

Efficacy of Grass Pollen Allergen Sublingual Immunotherapy Tablets for Seasonal Allergic Rhinoconjunctivitis: A Systematic Review and Meta-analysis.

IMPORTANCE: Randomized clinical trials (RCTs) and meta-analyses of sublingual immunotherapy (SLIT) for the treatment of seasonal allergic rhinoconjunctivitis (SARC) have shown a modest clinical benefit compared with placebo. Furthermore, indirect comparison by meta-analyses showed that subcutaneous immunotherapy is more effective than SLIT. Despite these data, SLIT has become the most prescribed treatment of SARC in Europe in recent years, and it was approved by the US Food and Drug Administration for the treatment of SARC to grass pollen in the United States on April 1, 2014. OBJECTIVE: To assess the efficacy and safety of the grass pollen sublingual tablets licensed as drugs in the treatm…

research product

From a multivariate spatio-temporal array to a multipollutant - multisite Air Quality Index

AQIs are computed on air pollution data that are usually collected according to time, space and type of pollutant: in a given town/region, data consisting of hourly levels of K pollutants recorded at S monitoring sites are usually organized in a three-mode array. A first aggregation step usually concerns time, and makes it possible to pass from hourly data to a daily synthesis: in this paper data will be aggregated by time according to the guidelines provided by the national agencies, producing the three-mode array X. Here we will propose a new approach to get a Multipollutant-Multisite Air Quality Index time series from a multivariate spatio-temporal array. This implies a two-step aggregation, accord…

research product

A new position weight correlation coefficient for consensus ranking process without ties

Preference data represent a particular type of ranking data where a group of people gives their preferences over a set of alternatives. The traditional metrics between rankings do not take into account the importance of swapping elements similar among them (element weights) or elements belonging to the top (or to the bottom) of an ordering (position weights). Following the structure of the τx proposed by Emond and Mason and the class of weighted Kemeny–Snell distances, a proper rank correlation coefficient is defined for measuring the correlation among weighted position rankings without ties. The one‐to‐one correspondence between the weighted distance and the rank correlation coefficient ho…

research product

Air quality assessment via functional principal component analysis

Knowledge of the global urban air quality situation represents the first step in facing air pollution issues. In recent decades many urban areas have been able to rely on a monitoring network recording hourly data for the main pollutants. Such data need to be aggregated according to different dimensions, such as time, space and type of pollutant, in order to provide a synthetic air quality index which takes into account interactions among pollutants and correlation among monitoring sites. This paper focuses on Functional Principal Component techniques for the statistical analysis of a set of environmental data x(spt), where s stands for the monitoring site, p for the pollutant and t for time, usuall…

research product

Personalized cost-effectiveness of boceprevir-based triple therapy for untreated patients with genotype 1 chronic hepatitis C

Abstract Background We assessed the cost-effectiveness of boceprevir-based triple therapy compared to peginterferon alpha and ribavirin dual therapy in untreated patients with genotype 1 chronic hepatitis C; patients were discriminated according to the combination of baseline plus on-treatment predictors of boceprevir-based triple therapy. Methods Cost-effectiveness analysis performed according to data from the available published literature. The target population was composed of untreated Caucasian patients, aged 50 years, with genotype 1 chronic hepatitis C, and these were evaluated over a lifetime horizon by Markov model. The study was carried out from the perspective of the Italian Nati…

research product

Functional Principal Component Analysis for the explorative analysis of multisite-multivariate air pollution time series with long gaps

Knowledge of urban air quality represents the first step in facing air pollution issues. In recent decades many cities have been able to rely on a network of monitoring stations recording concentration values for the main pollutants. This paper focuses on functional principal component analysis (FPCA) to investigate multiple pollutant datasets measured over time at multiple sites within a given urban area. Our purpose is to extend what has been proposed in the literature to data that are multisite and multivariate at the same time. The approach proves effective in highlighting some relevant statistical features of the time series, giving the opportunity to identify significant pollutants and…

research product

Causal Models for Monitoring University Ordinary Financing Fund

Repeated recent decreases in government transfers, together with an increasing proportion of the budget allotted on the basis of competitive performance, have led Italian universities to struggle with competition for funds, in particular for the University Ordinary Financing Fund (FFO). The aim of this paper is to monitor the variables responsible for FFO indicators, where monitoring means describing, analysing retrospectively, predicting and intervening on the variables responsible for the indicators. All these aims can be achieved by statistical techniques that should be theoretically equipped with the distinction between predicting under observation and predicting under intervention, in order to provide correct answers …

research product

HLA and killer cell immunoglobulin-like receptors influence the natural course of CMV infection.

Background. Natural killer (NK) cells provide a major defense against cytomegalovirus (CMV) infection through the interaction of their surface receptors, including the activating and inhibitory killer immunoglobulin-like receptors (KIRs), and human leukocyte antigens (HLA) class I molecules. This study assessed whether the KIR and HLA repertoire may influence the risk of developing symptomatic or asymptomatic disease after primary CMV infection in the immunocompetent host. Methods. Sixty immunocompetent patients with primary symptomatic CMV infection were genotyped for KIR and their HLA ligands, along with 60 subjects with a previous asymptomatic infection as controls. Results. The frequency…

research product

Statistical Multivariate Techniques for the Stock Location Assignment Problem

In previous papers we proposed to apply multivariate statistical methodologies, like Multidimensional Scaling (MDS) and Seriation to the stock location assignment problem of a warehouse, often solved by considering the Cube per Order Index (COI). In this paper we compare the results by MDS, Seriation, a COI based method and the Maximum Path criterion, considering the data of a whole year of a Sicilian supermarket chain warehouse. The comparison is based on the simulated times to satisfy a sample of real orders.

research product

A New Approach to the Stock Location Assignment Problem by Multidimensional Scaling and Seriation

The problem of the best stock location assignment in a warehouse plays a fundamental role in optimising picking activities. In the present paper, this problem has been addressed by considering seven variables to compute similarity between items. In this context, the problem of the choice of the most adequate similarity (or dissimilarity) measure between units when applying Multidimensional Scaling (MDS) has been examined. Besides the right metric, the possibility of applying a Seriation algorithm has also been considered. By using both MDS and seriation, not just a single target can be considered, but we are able to handle a large number of variables; on the contrary with techniques used in li…

research product

Knowledge of the literature is crucial for meta-analyses

research product

Tecniche conservative di gestione del suolo in ambiente mediterraneo: risultati di un ventennio di sperimentazioni.

research product

Boosting for ranking data: an extension to item weighting

Decision trees are a particularly widespread predictive machine learning technique, used to predict discrete (classification) or continuous (regression) variables. The algorithms underlying these techniques are intuitive and interpretable, but also unstable. Indeed, to make the classification more reliable, the output of several trees is usually combined. Several approaches have been proposed in the literature for classifying ranking data through decision trees, but none of them takes into account either the importance or the similarity of the individual elements of each ranking. The aim of this article is to propose a weighted extension of the method …

research product

Consensus among preference rankings: a new weighted correlation coefficient for linear and weak orderings

Abstract Preference data are a particular type of ranking data where some subjects (voters, judges, ...) express their preferences over a set of alternatives (items). In most real life cases, some items receive the same preference by a judge, thus giving rise to a ranking with ties. An important issue involving rankings concerns the aggregation of the preferences into a “consensus”. The purpose of this paper is to investigate the consensus between rankings with ties, taking into account the importance of swapping elements belonging to the top (or to the bottom) of the ordering (position weights). By combining the structure of τx proposed by Emond and Mason (J Multi-Criteria Decis…

research product

Efficacy of sublingual immunotherapy with grass allergens for seasonal allergic rhinitis: A systematic review and meta-analysis

Background The benefit of sublingual immunotherapy (SLIT) with grass allergens for seasonal allergic rhinitis has been extensively studied, but data on efficacy are still equivocal. Objective To assess the effectiveness of SLIT with grass allergens in the reduction of symptoms and medication in patients with seasonal allergic rhinitis to grass pollen. Methods Computerized bibliographic searches of MEDLINE (1995-2010) were supplemented by hand searches of reference lists. Studies were included if they were double-blind randomized controlled trials (RCTs) comparing SLIT to placebo and if they included patients with history of allergy to grass pollen treated with natural grass pollen extracts.…

research product

Urban PM10 air quality indicator sensitivity

research product

The effect of allergen immunotherapy in the onset of new sensitizations: a meta-analysis.

Background Although the preventive efficacy of allergen immunotherapy (AIT) in the onset of new allergen sensitizations has been asserted by many reviews, position papers, and consensus conferences, the evidence available is from only 3 studies. The objective of this work was a systematic review to evaluate the preventive efficacy of AIT in the onset of new allergen sensitizations. The end-point was the risk difference (RD) in the onset of new allergen sensitizations between patients treated with AIT and pharmacotherapy. Methods Computerized bibliographic searches of MEDLINE, EMBASE, and the Cochrane Library (until November 30th, 2016) were done. Random-effects and fixed-effects model meta-…

research product

Ensemble methods for item-weighted label ranking: a comparison

Label Ranking (LR), an emerging non-standard supervised classification problem, aims at training preference models that order a finite set of labels based on a set of predictor features. Traditional LR models regard all labels as equally important. However, in many cases, failing to predict the ranking position of a highly relevant label can be considered more severe than failing to predict a trivial one. Moreover, an efficient LR classifier should be able to take into account the similarity between the items to be ranked. Indeed, swapping two similar elements should be less penalized than swapping two dissimilar ones. The contribution of the present paper is to formulate more flexible item…

research product

Totally laparoscopic liver resections for primary and metastatic cancer in the elderly: safety, feasibility and short-term outcomes.

Standard oncologic liver resections performed on elderly patients (≥70 years old) have been shown to be safe and effective. The aim of this study was to analyze operative and oncologic short-term outcomes of totally laparoscopic liver resections (TLLR) performed on elderly patients for malignancies. We performed a retrospective statistical analysis of prospectively recorded data of TLLR performed from October 2008 to February 2012 by a single hepato-pancreato-biliary (HPB) surgeon. Patients were divided into two groups according to age (<70 vs. ≥70 years old) and perioperative outcomes were compared. A total of 60 TLLR for malignancies were identified of which 25 patients (42 %) were aged ≥…

research product

Functional principal component analysis of quantile curves

The literature on functional data analysis is mainly focused on estimation of individual curves and characterization of average dynamics. The idea underlying this proposal is to focus attention on other particular features of the distribution of the observed data, moving from mean functions towards functional quantiles. The motivating examples are functional data sets that are collections of high frequency data recorded along time. As quantiles provide information on various aspects of a time series, we propose a modelling framework for the joint estimation of functional quantiles, varying along time, and functional principal components, summarizing some common dynamics shared by the functiona…

research product

Efficacy of allergen immunotherapy in reducing the likelihood of developing new allergen sensitizations: a systematic review

Background Guidelines and position papers indicate that allergen immunotherapy (AIT) is the only disease-modifying treatment, including prevention of the onset of new allergen sensitizations. However, this preventive effect was shown by only a few observational studies. Our aim was to systematically review the efficacy of AIT in preventing the onset of new allergen sensitizations. Methods Computerized bibliographic searches of MEDLINE, EMBASE, and the Cochrane Library (through June 2015) were supplemented with manual searches of reference lists. Observational studies or randomized controlled trials with a long-term observation period were included. Paired reviewers extracted data about stud…

research product

Air quality indices: a review

National directives on air quality oblige nations to monitor and report on their air quality, allowing the public to be informed about ambient pollution levels. This is the reason for the ever-increasing interest in air quality/pollution indices, demonstrated by the number of publications on this topic in recent years: since the concentrations of individual pollutants can be confusing, concentration measurements are conveniently transformed into an air quality index. In this way, complex situations are summarized in a single figure, making comparisons in time and space possible. In this paper we give an overview of the Air Quality/Pollution Indices proposed in lite…

research product

Application and assessment of IDEF3‐process flow description capture method

The IDEF techniques have been developed in projects sponsored by the US Air Force in order to describe, specify and model manufacturing systems in a structured graphical form. These techniques can be classified in two categories: the “modelling” and the “descriptive” varieties. Compares two IDEF methods (one of the modelling type and one of the descriptive type) in order to represent (model or describe) two different aspects of an industrial organization. The methods compared are IDEF0, function modelling method, and IDEF3, process flow description capture method. Concludes that when considering the sequencing of the activities in process, aiming at highlighting their eventual simultaneity,…

research product

Impact of the COVID-19 pandemic on music: a method for clustering sentiments

The outbreak of coronavirus disease 2019 (COVID-19) was highly stressful for people. In general, fear and anxiety about a disease can be overwhelming and cause strong emotions in adults and children. One way to cope with this stress is to listen to music. The aim of this work is to understand whether the music listened to during the lockdown reflects the emotions generated by the pandemic in each of us. So, the primary goal of this work is to build two indices for measuring the anger and joy levels of the top streamed songs by Italian Spotify users (during the SARS-CoV-2 pandemic), and study their evolution over time. A Hierarchical Cluster Analysis has been applied in order to identify groups o…

research product

Classification trees for multivariate ordinal response: an application to Student Evaluation Teaching

Data from multiple items on an ordinal scale are commonly collected when qualitative variables, such as feelings, attitudes and many other behavioral and health-related variables, are observed. In this paper we introduce a method to derive a distance-based tree for multivariate ordinal response that makes it possible, when subject-specific characteristics are available, to derive common profiles for respondents giving the same/similar multivariate ratings. Special attention will be paid to the performance comparison in terms of AUC, for three different distances used as splitting criteria. Simulated data and a dataset from a Student Evaluation of Teaching survey will be used as illustrative examples. Th…

research product

Long gaps in multivariate spatio-temporal data: an approach based on functional data analysis

The main aim of this paper is to perform Functional Principal Component Analysis (FPCA) taking into account spatio-temporal correlation structures, in order to fill in missing values in spatio-temporal multivariate data sets. A spatial and a spatio-temporal variant of the classical temporal FPCA are considered; in other words, FPCA is carried out after modeling data with respect to more than one dimension: space (long, lat) or space+time. Moreover, multidimensional FPCA is extended to the multivariate context (more than one variable). Information on spatial or spatio-temporal structures is efficiently extracted by applying Generalized Additive Models (GAMs). Both simulation studies and some perfo…

research product

Functional principal component analysis for multivariate multidimensional environmental data

Data with spatio-temporal structure can arise in many contexts, therefore a considerable interest in modelling these data has been generated, but the complexity of spatio-temporal models, together with the size of the dataset, results in a challenging task. The modelization is even more complex in presence of multivariate data. Since some modelling problems are more natural to think through in functional terms, even if only a finite number of observations is available, treating the data as functional can be useful (Berrendero et al. in Comput Stat Data Anal 55:2619–2634, 2011). Although in Ramsay and Silverman (Functional data analysis, 2nd edn. Springer, New York, 2005) the case of multiva…

research product

Reply to

research product

A new OLS-based procedure for clusterwise linear regression

Data heterogeneity, within a (linear) regression framework, often suggests the use of a Clusterwise Linear Regression (CLR) procedure, which implies, among other things, the estimate of the appropriate number of clusters as well as the cluster membership of each unit. The approaches to the estimation of a CLR model are essentially based on the Ordinary Least Squares (OLS) criterion or the likelihood criterion. In this paper, in the context of the OLS approach, we propose an estimation of the model making use of an algorithm based on a threshold criterion for the determination coefficient of each cluster, to identify the appropriate number of clusters, and of a modified Späth's algorithm, to estima…

research product

Missing value imputation methods for multilevel data

research product

Towards the definition of distance measures in the preference-approval structures

The task of combining preference rankings and approval voting is a relevant issue in social choice theory. The preference-approval voting (PAV) analyses the preferences of a group of individuals over a set of items. The main difference with the classical approaches for preference data consists in introducing, in addition to the ranking of candidates, a further distinction; candidates are subsetted in “acceptable” and “unacceptable”, or also in “good set” and “bad set” (a way to express the approval/disapproval). This work introduces the definition of a new measure to quantify disagreement between preference-approval profiles. For each pair of alternatives, we consider the two possible disag…

research product

Single imputation method of missing values in environmental pollution data sets

Abstract Missing data represent a general problem in many scientific fields, above all in environmental research. Several methods have been proposed in the literature for handling missing data, and the choice of an appropriate method depends, among other things, on the missing-data pattern and on the missing-data mechanism. One approach to the problem is to impute them to yield a complete data set. The goal of this paper is to propose a new single imputation method and to compare its performance to other single and multiple imputation methods known in the literature. Considering a data set of PM10 concentration measured every 2 h by eight monitoring stations distributed over the metropolitan area of Paler…

research product

Spatial misaligned data in environmental processes

research product

Model selection in linear mixed-effect models

Linear mixed-effects models are a class of models widely used for analyzing different types of data: longitudinal, clustered and panel data. Many fields, in which a statistical methodology is required, involve the employment of linear mixed models, such as biology, chemistry, medicine, finance and so forth. One of the most important processes, in a statistical analysis, is given by model selection. Hence, since there are a large number of linear mixed model selection procedures available in the literature, a pressing issue is how to identify the best approach to adopt in a specific case. We outline mainly all approaches focusing on the part of the model subject to selection (fixed and/or ra…

research product

Short-term and long-term pollutant data: an integration for decision policy makers

research product

Influence Diagnostics for Meta-Analysis of Individual Patient Data Using Generalized Linear Mixed Models

In meta-analysis, generalized linear mixed models (GLMMs) are usually used when heterogeneity is present and individual patient data (IPD) are available, while accepting binary, discrete as well as continuous response variables. In the present paper some measures of influence diagnostics based on log-likelihood are suggested and discussed. A known measure is approximated to get a simpler form, for which the information matrix is no longer necessary. The performance of the proposed measure is assessed through a diagnostic analysis on simulated data reproducing a possible meta-analytical context of IPD with influential outliers. The proposed measure is shown to work well and to have a form sim…

research product

Recursive partitioning: an approach based on the weighted Kemeny distance

In the framework of preference rankings, the interest can lie in finding which predictors and which interactions are able to explain the observed preference structures. The possibility to derive consensus measures using a classification tree represents a novelty and an important tool, given its easy interpretability. This work proposes the use of a univariate decision tree for ranking data based on the weighted Kemeny distance. The performance of the methodology will be shown by using a real dataset about university rankings.

research product

Aggregate air pollution indices: a new proposal

A new aggregate Air Quality Index (I2) to represent the global air pollution situation for a given city/region is proposed. Accounting for simultaneous exposure to common pollutants and their effects on human health, this index overcomes existing AQIs. Its goodness and utility are shown by a simulation plan and by an application to a real dataset on main pollutants.

research product

Missing Data in Space-time: Long Gaps Imputation Based On Functional Data Analysis

High dimensional data with spatio-temporal structures are of great interest in many fields of research, but their exhibited complexity leads to practical issues when formulating statistical models. Functional data analysis through smoothing methods is a proper framework for incorporating space-time structures: extending the basic methodology to the multivariate spatio-temporal setting, we refer to Generalized Additive Models for estimating functional data taking the spatial and temporal dependences into account, and to Functional Principal Component Analysis as a classical dimension reduction technique to cope with the high dimensionality and with the number of estimated effects. Since spatial …

research product

Cost-effectiveness of sofosbuvir-based triple therapy for untreated patients with genotype 1 chronic hepatitis C

We assessed the cost-effectiveness of sofosbuvir (SOF)-based triple therapy (TT) compared with boceprevir (BOC)- and telaprevir (TVR)-based TT in untreated genotype 1 (G1) chronic hepatitis C (CHC) patients discriminated according to IL28B genotype, severity of liver fibrosis, and G1 subtype. The available published literature provided the data source. The target population was made up of untreated Caucasian patients, aged 50 years, with G1CHC and these were evaluated over a lifetime horizon by Markov model. The study was carried out from the perspective of the Italian National Health Service. Outcomes included discounted costs (in euros at 2013 value), life-years gained (LYG), quality-adju…

research product

Constrained Clusterwise Linear Regression

In market segmentation, Conjoint Analysis is often used to estimate the importance of a product's attributes at the level of each single customer, and subsequently to cluster the customers whose behavior can be considered similar. The preference model parameters are estimated considering the data (usually opinions) of a single customer at a time, but these data are usually very few, as each customer is asked to express an opinion about a small number of different products (in order to simplify his/her work). In the present paper a Constrained Clusterwise Linear Regression algorithm is presented that allows parameters to be estimated and customers to be clustered simultaneously, using, for the estimati…

research product

Long-term experiments and strip plot designs

In a long-term experiment the experimenter usually needs to know whether the effect of a treatment varies over time. But time usually has both a fixed and a random effect on the output, and the difficulty of the analysis depends on the particular design considered and the availability of covariates. Actually, as shown in the paper, the presence of covariates can be very useful to model the random effect of time. In this paper a model to analyze data from a long-term strip plot design with covariates is proposed. Its effectiveness will be tested using both simulated and real data from a crop rotation experiment.

research product

An informal procedure to detect outliers in multilevel models

research product

Wirksamkeit von sublingualen Gräserpollentabletten bei saisonaler allergischer Rhinitis – eine systematische Übersicht und Metaanalyse

immunology

research product

Principal components for multivariate spatiotemporal functional data

Multivariate spatio-temporal data consist of a three-way array in which two of the dimensions have structured domains, one temporal and one spatial; think, for example, of a set of different pollutant levels recorded for a month/year at different sites. In this kind of dataset we can recognize time series along one dimension, spatial series along another and multivariate data along the third dimension. Statistical techniques aiming at handling huge amounts of information are very important in this context, and classical dimension reduction techniques, such as Principal Components, are relevant, allowing the information to be compressed without much loss. Although time series, as well as spatial series, are recor…

research product

Weed seedbank size and composition in a long-term tillage and crop sequence experiment

Summary Knowledge of the effects of agricultural practices on weed seedbank dynamics is essential for predicting future problems in weed management. This article reports data relative to weed seedbank structure after 18 years of continuous application of conventional tillage (CT, based on mouldboard ploughing) or no tillage (NT) within three crop sequences (continuous wheat, WW; wheat–faba bean, WF; and wheat–berseem clover, WB). Tillage system did not affect the size of the total weed seedbank, but altered both its composition and the distribution of seeds within the soil profile. In particular, the adoption of CT favoured some species (mainly Polygonum aviculare), whereas the continuous u…

research product

Weather variables and air pollution via hierarchical linear models

research product

Can the Students’ Career be Helpful in Predicting an Increase in Universities Income?

The students’ academic failure and the delay in obtaining their final degree are a significant issue for the Italian universities and their stakeholders. Based on indicators proposed by the Italian Ministry of University, the Italian universities are awarded a financial incentive if they reduce the students’ attrition and failure. In this paper we analyze the students’ careers performance using: (1) aggregate data; (2) individual data. The first compares the performances of the Italian universities using the measures and the indicators proposed by the Ministry. The second analyzes the students’ careers through an indicator based on credit earned by each student in seven academic years. The …

research product

Long-term tillage and crop sequence effects on wheat grain yield and quality.

Much research around the world has compared the performance of cereals grown under conventional and conservation tillage systems; however, relatively few long-term experiments have been conducted in Mediterranean areas, and little attention has been given to interactions among tillage techniques and other system components across space and time. In this study, we investigated the effects of the long-term (18-yr) use of three tillage techniques (conventional tillage, CT; reduced tillage, RT; and no-till, NT) on wheat (Triticum durum Desf.) grain yield and quality within three crop sequences: continuous wheat, faba bean (Vicia faba L.)–wheat, and berseem clover (Trifolium alexandrinum L.)–whe…

research product

ENSEMBLE METHODS FOR RANKING DATA

Recent years have seen a remarkable flowering of works about the use of decision trees for ranking data. As a matter of fact, decision trees are useful and intuitive, but they are very unstable: small perturbations bring big changes. This is the reason why it could be necessary to use more stable procedures, such as ensemble methods, in order to find which predictors are able to explain the preference structure. In this work ensemble methods such as bagging and Random Forest are proposed, from both a theoretical and computational point of view, for deriving classification trees when ranking data are observed. The advantages of these procedures are shown through an example on the SUSHI data set.

research product

Imputation of missing values in air quality data sets

research product

Comparing Spatial and Spatio-temporal FPCA to Impute Large Continuous Gaps in Space

Multivariate spatio-temporal data analysis methods usually assume fairly complete data, while a number of gaps often occur along time or in space. In air quality data long gaps may be due to instrument malfunctions; moreover, not all the pollutants of interest are measured in all the monitoring stations of a network. In literature, many statistical methods have been proposed for imputing short sequences of missing values, but most of them are not valid when the fraction of missing values is high. Furthermore, the limitation of the methods commonly used consists in exploiting temporal only, or spatial only, correlation of the data. The objective of this paper is to provide an approach based …

research product

Diagnostics for meta-analysis based on generalized linear mixed models

Meta-analysis is the method to combine data coming from multiple studies, with the aim to provide an overall event-risk measure of interest summarizing information coming from the studies. In meta-analysis generalized linear mixed models (GLMM) are particularly used for a number of measures of interest since they allow the true effect size to differ from study to study while accepting binary, discrete as well as continuous response variable. In the present paper some strategies of influence diagnostics based on log-likelihood are suggested and discussed. These are considered for Individual Patient Data, Aggregate Data and their compounding.

research product

Analyzing the effect of meteorological variables on daily average PM10 in Palermo by HLM

research product

Environmental misaligned data via HLM

research product

A graphical model selection tool for mixed models

Model selection can be defined as the task of estimating the performance of different models in order to choose the most parsimonious one, among a potentially very large set of candidate statistical models. We propose a graphical representation to be considered as an extension to the class of mixed models of the deviance plot proposed in the literature within the framework of classical and generalized linear models. This graphical representation allows, once a reduced number of models have been selected, to identify important covariates focusing only on the fixed effects component, assuming the random part properly specified. Nevertheless, we suggest also a standalone figure representing th…

research product

Switching from conventional tillage to no-tillage: Soil N availability, N uptake, 15N fertilizer recovery, and grain yield of durum wheat

Abstract This 2-year study, performed in a typical Mediterranean environment on three soil types (two Inceptisols and one Vertisol), aimed to improve understanding of the factors that play a major role in determining crop response when soil management shifts from conventional tillage (CT) to no-tillage (NT). The effects of NT on the soil nitrogen (N) availability, N uptake, 15N fertilizer recovery, and grain yield of durum wheat were evaluated in comparison to CT under five different N fertilization rates (0, 40, 80, 120, and 160 kg N ha−1). Compared to CT, NT negatively affected grain yield in one of the two years but only in the two Inceptisols. On average, a considerable grain yield adva…

research product

Statistically Validated Networks for evaluating coherence in topic models

Probabilistic topic models have become one of the most widespread machine learning techniques for textual analysis. In this framework, Latent Dirichlet Allocation (LDA) has gained more and more popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned from the model may be characterized by a set of irrelevant or unchained words, being useless for the interpretation. In the framework of topic quality evaluation, the pairwise semantic cohesion among the top-N most pr…

research product

Empirical Orthogonal Function and Functional Data Analysis Procedures to Impute Long Gaps in Environmental Data

Air pollution data sets are usually spatio-temporal multivariate data related to time series of different pollutants recorded by a monitoring network. To improve the estimate of functional data when missing values, and mainly long gaps, are present in the original data set, some procedures are here proposed considering jointly Functional Data Analysis and Empirical Orthogonal Function approaches. In order to compare and validate the proposed procedures, a simulation plan is carried out and some performance indicators are computed. The obtained results show that one of the proposed procedures works better than the others, providing a better reconstruction especially in presence of long gaps.

research product

Can the students' career performance be helpful in predicting an increase in universities income?

The students’ academic failure and the delay in obtaining their final degree are a significant issue for the Italian universities and their shareholders. Based on indicators proposed by the Italian Ministry of University, the Italian universities are awarded a financial incentive if they reduce the students’ attrition and failure. In this paper we analyze the students’ careers performance using: 1) aggregate data; 2) individual data. The first compares the performances of the Italian universities using the measures and the indicators proposed by the Ministry. The second analyzes the students’ careers through an indicator based on credit earned by each student in seven academic years. The pr…

research product

The pblm package: semiparametric regression for bivariate categorical responses in R

We present an R package to fit semiparametric regression models for two categorical responses. It works for both nominal and ordered responses and several types of logits can be specified. Proportional, non-proportional and partial proportional odds models can be fitted, with marginal and association parameters estimated in a parametric or semiparametric way, via penalized maximum likelihood estimation. An application to show the potential of the package is carried out on a data set of Italian university students.

research product

A weighted distance-based approach with boosted decision trees for label ranking

Label Ranking (LR) is an emerging non-standard supervised classification problem with practical applications in different research fields. The Label Ranking task aims at building preference models that learn to order a finite set of labels based on a set of predictor features. One of the most successful approaches to tackling the LR problem consists of using decision tree ensemble models, such as bagging, random forest, and boosting. However, these approaches, coming from the classical unweighted rank correlation measures, are not sensitive to label importance. Nevertheless, in many settings, failing to predict the ranking position of a highly relevant label should be considered more seriou…

research product

An aggregate AQI: comparing different standardizations and introducing a variability index

Many studies demonstrate a strong relationship between air pollution and respiratory and cardiovascular diseases. For this reason, assessing air pollution, and conveying information about its possible adverse health effects, may encourage population and policy makers to reduce those activities increasing pollution levels. In this paper a relative index of variability, to be associated with the aggregate Air Quality Index (AQI) among pollutants proposed by Ruggieri and Plaia (2011), is developed in order to better investigate air pollution conditions for the whole area of a city/region. The most widely-used and up to date pollution indices, based mainly on AQI computed by the US Environmenta…

research product

Classification trees for preference data: a distance-based approach

In the framework of preference rankings, when the interest lies in explaining which predictors and which interactions among predictors are able to explain the observed preference structures, the possibility to derive consensus measures using a classification tree represents a novelty and an important tool given its easy interpretability. In this work we propose the use of a multivariate decision tree where a weighted Kemeny distance is used both to evaluate the distances between rankings and to define an impurity measure to be used in the recursive partitioning. The proposed approach also allows one to weight differently high distances in rankings in the top and in the bottom alternatives.

research product

Comparing FPCA Based on Conditional Quantile Functions and FPCA Based on Conditional Mean Function

In this work functional principal component analysis (FPCA) based on quantile functions is proposed as an alternative to the classical approach, based on the functional mean. Quantile regression characterizes the conditional distribution of a response variable and, in particular, some features like the tails behavior; smoothing splines have also been usefully applied to quantile regression to allow for a more flexible modelling. This framework finds application in contexts involving multiple high frequency time series, for which the functional data analysis (FDA) approach is a natural choice. Quantile regression is then extended to the estimation of functional quantiles and our proposal exp…

research product

Nitrogen uptake and nitrogen fertilizer recovery in old and modern wheat genotypes grown in the presence or absence of interspecific competition

Choosing genotypes with a high capacity for taking up nitrogen (N) from the soil and the ability to efficiently compete with weeds for this nutrient is essential to increasing the sustainability of cropping systems that are less dependent on auxiliary inputs. This research aimed to verify whether differences exist in N uptake and N fertilizer recovery capacity among wheat genotypes and, if so, whether these differences are related to differences in the genotypes' competitive ability against weeds. To this end, 12 genotypes, varying widely in morphological traits and year of release, were grown in the presence or absence of interspecific competition (using Avena sativa L. as a surrogate wee…

research product

EOFs for gap filling in multivariate air quality data: a FDA approach

Missing values are a common concern in spatiotemporal data sets. In recent years a great number of methods have been developed for gap filling. One of the emerging approaches is based on the Empirical Orthogonal Function (EOF) methodology, applied mainly to raw and univariate data sets presenting irregular missing patterns. In this paper EOF is carried out on a multivariate space-time data set, related to concentrations of pollutants recorded at different sites, after denoising the raw data by an FDA approach. Some performance indicators are computed on simulated incomplete data sets that also contain long gaps, in order to show that the EOF reconstruction appears to be an improved procedure especially…

research product

GAMs and functional kriging for air quality data

Data having a spatio-temporal structure are often observed in environmental sciences. They may be considered as discrete observations from curves along time and/or space and treated as functional. Generalized Additive Models (GAMs) represent a useful tool for modelling, for example, pollutant concentrations and describing their spatial and/or temporal trends. Usually, the prediction of a curve at an unmonitored site is necessary and, with this aim, we extend kriging for functional data to a multivariate context. Moreover, even if we are interested only in predicting a single pollutant, such as PM10, the estimation can be improved by exploiting its correlation with the other pollutants. Cross valid…

research product