
AUTHOR

Franco Peracchi

Sampling properties of the Bayesian posterior mean with an application to WALS estimation

Many statistical and econometric learning methods rely on Bayesian ideas, often applied or reinterpreted in a frequentist setting. Two leading examples are shrinkage estimators and model averaging estimators, such as weighted-average least squares (WALS). In many instances, the accuracy of these learning methods in repeated samples is assessed using the variance of the posterior distribution of the parameters of interest given the data. This may be permissible when the sample size is large because, under the conditions of the Bernstein--von Mises theorem, the posterior variance agrees asymptotically with the frequentist variance. In finite samples, however, things are less clear. In this pa…

research product

Regression with imputed covariates: A generalized missing-indicator approach

A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper, we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may the…

research product

Estimating Engel curves under unit and item nonresponse

This paper estimates food Engel curves using data from the first wave of the Survey on Health, Aging and Retirement in Europe (SHARE). Our statistical model simultaneously takes into account selectivity due to unit and item nonresponse, endogeneity problems, and issues related to flexible specification of the relationship of interest. We estimate both parametric and semiparametric specifications of the model. The parametric specification assumes that the unobservables in the model follow a multivariate Gaussian distribution, while the semiparametric specification avoids distributional assumptions about the unobservables. Copyright © 2011 John Wiley & Sons, Ltd.

research product

Regression with Imputed Covariates: A Generalized Missing Indicator Approach

A common problem in applied regression analysis is that covariate values may be missing for some observations but imputed values may be available. This situation generates a trade-off between bias and precision: the complete cases are often disarmingly few, but replacing the missing observations with the imputed values to gain precision may lead to bias. In this paper we formalize this trade-off by showing that one can augment the regression model with a set of auxiliary variables so as to obtain, under weak assumptions about the imputations, the same unbiased estimator of the parameters of interest as complete-case analysis. Given this augmented model, the bias-precision trade-off may then…

research product

Comments on “Unobservable Selection and Coefficient Stability”

We establish a link between the approaches proposed by Oster (2019) and Pei, Pischke, and Schwandt (2019) which contribute to the development of inferential procedures for causal effects in the challenging and empirically relevant situation where the unknown data-generation process is not included in the set of models considered by the investigator. We use the general misspecification framework recently proposed by De Luca, Magnus, and Peracchi (2018) to analyze and understand the implications of the restrictions imposed by the two approaches.

research product

A Generalized Missing-Indicator Approach to Regression with Imputed Covariates

We consider estimation of a linear regression model using data where some covariate values are missing but imputations are available to fill in the missing values. This situation generates a tradeoff between bias and precision when estimating the regression parameters of interest. Using only the subsample of complete observations does not cause bias but may imply a substantial loss of precision because the complete cases may be too few. On the other hand, filling in the missing values with imputations may cause bias. We provide the new Stata command gmi, which handles this tradeoff by using either model reduction or Bayesian model averaging techniques in the context of the generalized miss…

research product

Posterior moments and quantiles for the normal location model with Laplace prior

We derive explicit expressions for arbitrary moments and quantiles of the posterior distribution of the location parameter η in the normal location model with Laplace prior, and use the results to approximate the posterior distribution of sums of independent copies of η.

research product

Model averaging estimation of generalized linear models with imputed covariates

We address the problem of estimating generalized linear models when some covariate values are missing but imputations are available to fill in the missing values. This situation generates a bias-precision trade-off in the estimation of the model parameters. Extending the generalized missing-indicator method proposed by Dardanoni et al. (2011) for linear regression, we handle this trade-off as a problem of model uncertainty using Bayesian averaging of classical maximum likelihood estimators (BAML). We also propose a block model averaging strategy that incorporates information on the missing-data patterns and is computationally simple. An empirical application illustrates our…

research product

Ranking Scientific Journals Via Latent Class Models for Polytomous Item Response Data

We propose a model-based strategy for ranking scientific journals starting from a set of observed bibliometric indicators that represent imperfect measures of the unobserved ‘value’ of a journal. After discretizing the available indicators, we estimate an extended latent class model for polytomous item response data and use the estimated model to cluster journals. We illustrate our approach by using the data from the Italian research evaluation exercise that was carried out for the period 2004–2010, focusing on the set of journals that are considered relevant for the subarea statistics and financial mathematics. Using four bibliometric indicators (IF, IF5, AIS and the h-index), some…

research product

Weighted-Average Least Squares (WALS): Confidence and Prediction Intervals

We extend the results of De Luca et al. (2021) to inference for linear regression models based on weighted-average least squares (WALS), a frequentist model averaging approach with a Bayesian flavor. We concentrate on inference about a single focus parameter, interpreted as the causal effect of a policy or intervention, in the presence of a potentially large number of auxiliary parameters representing the nuisance component of the model. In our Monte Carlo simulations we compare the performance of WALS with that of several competing estimators, including the unrestricted least-squares estimator (with all auxiliary regressors) and the restricted least-squares estimator (with no auxiliary reg…

research product

Balanced variable addition in linear models

This paper studies what happens when we move from a short regression to a long regression in a setting where both regressions are subject to misspecification. In this setup, the least-squares estimator in the long regression may have larger inconsistency than the least-squares estimator in the short regression. We provide a simple interpretation for the comparison of the inconsistencies and study under which conditions the additional regressors in the long regression represent a “balanced addition” to the short regression.

research product

A Sample Selection Model for Unit and Item Nonresponse in Cross-Sectional Surveys

We consider a general sample selection model where unit and item nonresponse simultaneously affect a regression relationship of interest, and both types of nonresponse are potentially correlated. We estimate both parametric and semiparametric specifications of the model. The parametric specification assumes that the errors in the latent regression equations follow a trivariate Gaussian distribution. The semiparametric specification avoids distributional assumptions about the underlying regression errors. In our empirical application, we estimate Engel curves for consumption expenditure using data from the first wave of SHARE (Survey on Health, Aging and Retirement in Europe).

research product

Asymptotic properties of the weighted-average least squares (WALS) estimator

research product

On the ambiguous consequences of omitting variables

This paper studies what happens when we move from a short regression to a long regression (or vice versa), when the long regression is shorter than the data-generation process. In the special case where the long regression equals the data-generation process, the least-squares estimators have smaller bias (in fact zero bias) but larger variances in the long regression than in the short regression. But if the long regression is also misspecified, the bias may not be smaller. We provide bias and mean squared error comparisons and study the dependence of the differences on the misspecification parameter.

research product

Weighted-average least squares estimation of generalized linear models

The weighted-average least squares (WALS) approach, introduced by Magnus et al. (2010) in the context of Gaussian linear models, has been shown to enjoy important advantages over other strictly Bayesian and strictly frequentist model averaging estimators when accounting for problems of uncertainty in the choice of the regressors. In this paper we extend the WALS approach to deal with uncertainty about the specification of the linear predictor in the wider class of generalized linear models (GLMs). We study the large-sample properties of the WALS estimator for GLMs under a local misspecification framework that allows the development of asymptotic model averaging theory. We also investigate t…

research product

Comments on “Unobservable Selection and Coefficient Stability: Theory and Evidence” and “Poorly Measured Confounders are More Useful on the Left Than on the Right”

We establish a link between the approaches proposed by Oster (2019) and Pei, Pischke, and Schwandt (2019) which contribute to the development of inferential procedures for causal effects in the challenging and empirically relevant situation where the unknown data-generation process is not included in the set of models considered by the investigator. We use the general misspecification framework recently proposed by De Luca, Magnus, and Peracchi (2018) to analyze and understand the implications of the restrictions imposed by the two approaches.

research product