RESEARCH PRODUCT
ei.Datasets: Real Data Sets for Assessing Ecological Inference Algorithms
Jose M. Pavía
subject
Split-ticket voting; Computer science; Ecology; Voting; General Social Sciences; Inference; Aggregate data; Library and Information Sciences; Law; Computer Science Applications
description
Ecological inference models aim to infer individual-level relationships using aggregate data. They are routinely used to estimate voter transitions between elections, reveal split-ticket voting behavior, or infer racial voting patterns in U.S. elections. Many procedures have been proposed in the literature to solve these problems, so an assessment and comparison of them is overdue. The secret ballot, however, makes this a difficult endeavor, since real individual-level data are usually not accessible. The most recent work on ecological inference has assessed methods using a very small number of data sets with ground truth, combined with artificial, simulated data. This article dramatically increases the number of real instances by presenting a unique database (available in the R package ei.Datasets) composed of data from more than 550 elections where the true inner-cell values of the global cross-classification tables are known. The article describes how the data sets are organized, details the data curation and data wrangling processes performed, and analyzes the main features characterizing the different data sets.
year | journal | country | edition | language |
---|---|---|---|---|
2021-09-06 | Social Science Computer Review | | | |