AUTHOR

Simona Buscemi

Comparing Boosting and Bagging for Decision Trees of Rankings

Decision tree learning is among the most popular and most traditional families of machine learning algorithms. While these techniques excel in being quite intuitive and interpretable, they also suffer from instability: small perturbations in the training data may result in big changes in the predictions. The so-called ensemble methods combine the output of multiple trees, which makes the decision more reliable and stable. They have been primarily applied to numeric prediction problems and to classification tasks. In recent years, some attempts to extend ensemble methods to ordinal data have appeared in the literature, but no concrete methodology has been provided for preference…

research product

Hierarchy of factors impacting grape berry mass: separation of direct and indirect effects on major berry metabolites

Final berry mass, a major quality factor in wine production, is determined by the integrated effect of biotic and abiotic factors that can also influence berry composition. Under field conditions, interactions between these factors complicate the study of the variability of berry mass and composition. Depending on the observation scale, the ranking of these factors by degree of impact can vary. The present work examines the simultaneous effects of the major factors influencing berry mass and composition to create a hierarchy by degree of impact. A second objective was to separate the possible direct effects of factors on berry composition from an indirect effect mediated through their impact on…

research product

Consensus measures for preference rankings with ties: an approach based on position weighted Kemeny distance

Preference data are a particular type of ranking data where some subjects (voters, judges, ...) give their preferences over a set of alternatives (items). In most real cases, some items receive the same preference from a judge, giving rise to a ranking with ties. The purpose of our paper is to investigate the consensus between rankings with ties, taking into account the importance of swapping elements belonging to the top (or to the bottom) of the ordering (position weights). Combining the structure of the τx proposed by Emond and Mason and the class of weighted Kemeny-Snell distances, we propose a position weighted rank correlation coefficient to compare rankings…

research product

A new position weight correlation coefficient for consensus ranking process without ties

Preference data represent a particular type of ranking data where a group of people gives their preferences over a set of alternatives. The traditional metrics between rankings do not take into account the importance of swapping elements similar among them (element weights) or elements belonging to the top (or to the bottom) of an ordering (position weights). Following the structure of the τx proposed by Emond and Mason and the class of weighted Kemeny–Snell distances, a proper rank correlation coefficient is defined for measuring the correlation among weighted position rankings without ties. The one‐to‐one correspondence between the weighted distance and the rank correlation coefficient ho…

research product

Consensus among preference rankings: a new weighted correlation coefficient for linear and weak orderings

Preference data are a particular type of ranking data where some subjects (voters, judges, ...) express their preferences over a set of alternatives (items). In most real life cases, some items receive the same preference by a judge, thus giving rise to a ranking with ties. An important issue involving rankings concerns the aggregation of the preferences into a “consensus”. The purpose of this paper is to investigate the consensus between rankings with ties, taking into account the importance of swapping elements belonging to the top (or to the bottom) of the ordering (position weights). By combining the structure of τx proposed by Emond and Mason (J Multi-Criteria Decis…

research product

Model selection in linear mixed-effect models

Linear mixed-effects models are a class of models widely used for analyzing different types of data: longitudinal, clustered and panel data. Many fields that require statistical methodology, such as biology, chemistry, medicine and finance, make use of linear mixed models. One of the most important steps in a statistical analysis is model selection. Since a large number of linear mixed model selection procedures are available in the literature, a pressing issue is how to identify the best approach to adopt in a specific case. We outline most of the approaches, focusing on the part of the model subject to selection (fixed and/or ra…

research product

Ensemble methods for ranking data with and without position weights

The main goal of this Thesis is to build suitable Ensemble Methods for ranking data with weights assigned to the items' positions, in the cases of rankings with and without ties. The Thesis begins with the definition of a new rank correlation coefficient, able to take into account the importance of the items' positions. Inspired by the rank correlation coefficient τx, proposed by Emond and Mason (2002) for unweighted rankings, and the weighted Kemeny distance proposed by García-Lapresta and Pérez-Román (2010), this work proposes τx^w, a new rank correlation coefficient corresponding to the weighted Kemeny distance. The new coefficient is analyzed analytically and empirically and represents the main…

research product