AUTHOR

Giuseppe De Luca

0000-0002-1411-2543

Sampling properties of the Bayesian posterior mean with an application to WALS estimation

Many statistical and econometric learning methods rely on Bayesian ideas, often applied or reinterpreted in a frequentist setting. Two leading examples are shrinkage estimators and model averaging estimators, such as weighted-average least squares (WALS). In many instances, the accuracy of these learning methods in repeated samples is assessed using the variance of the posterior distribution of the parameters of interest given the data. This may be permissible when the sample size is large because, under the conditions of the Bernstein–von Mises theorem, the posterior variance agrees asymptotically with the frequentist variance. In finite samples, however, things are less clear. In this pa…

research product

Sampling design and weighting strategies in SHARE Wave 6

research product

Estimating Engel curves under unit and item nonresponse

This paper estimates food Engel curves using data from the first wave of the Survey on Health, Aging and Retirement in Europe (SHARE). Our statistical model simultaneously takes into account selectivity due to unit and item nonresponse, endogeneity problems, and issues related to flexible specification of the relationship of interest. We estimate both parametric and semiparametric specifications of the model. The parametric specification assumes that the unobservables in the model follow a multivariate Gaussian distribution, while the semiparametric specification avoids distributional assumptions about the unobservables. Copyright © 2011 John Wiley & Sons, Ltd.

research product

Sampling Design in SHARE Wave 8 and Recruitment of Refreshment Samples until the Suspension of Fieldwork

The aim of the SHARE survey design is to be able to draw inferences about the population of people aged 50 years or older across countries by using probability-based sampling. This chapter documents the sampling design adopted in the eighth wave of SHARE that had to be suspended due to COVID-19 in March 2020. Starting with a definition of the SHARE target population, we describe the protocol for harmonizing and documenting the sampling procedure and present the sampling frames used by the countries that recruited a baseline or refreshment sample in Wave 8. We then discuss some important aspects of the SHARE sampling designs, such as stratification, clustering, variation in selection probabi…

research product

Item nonresponse and imputation strategies in SHARE Wave 5

This chapter focuses on item nonresponse in the fifth wave of SHARE and the imputation strategies adopted to fill in the missing values.

research product

Comments on “Unobservable Selection and Coefficient Stability

We establish a link between the approaches proposed by Oster (2019) and Pei, Pischke, and Schwandt (2019) which contribute to the development of inferential procedures for causal effects in the challenging and empirically relevant situation where the unknown data-generation process is not included in the set of models considered by the investigator. We use the general misspecification framework recently proposed by De Luca, Magnus, and Peracchi (2018) to analyze and understand the implications of the restrictions imposed by the two approaches.

research product

A Generalized Missing-Indicator Approach to Regression with Imputed Covariates

We consider estimation of a linear regression model using data where some covariate values are missing but imputations are available to fill in the missing values. This situation generates a tradeoff between bias and precision when estimating the regression parameters of interest. Using only the subsample of complete observations does not cause bias but may imply a substantial loss of precision because the complete cases may be too few. On the other hand, filling in the missing values with imputations may cause bias. We provide the new Stata command gmi, which handles this tradeoff by using either model reduction or Bayesian model averaging techniques in the context of the generalized miss…

research product

Posterior moments and quantiles for the normal location model with Laplace prior

We derive explicit expressions for arbitrary moments and quantiles of the posterior distribution of the location parameter η in the normal location model with Laplace prior, and use the results to approximate the posterior distribution of sums of independent copies of η.

research product

WEIGHTS AND IMPUTATIONS

This chapter provides a description of the weighting and imputation strategies used to address problems of unit nonresponse, sample attrition and item nonresponse in the seventh wave of SHARE.

research product

Insight into mechanisms of creatinine optical sensing using fluorescein-gold complex

Creatinine level in biological fluids is a clinically relevant parameter to monitor vital functions, and it is well established that measuring creatinine levels in the human body can be of great utility to evaluate renal, muscular, or thyroid dysfunctions. The accurate detection of creatinine levels may have a critical role in providing information on health status and represents a tool for the early diagnosis of severe pathologies. Among different methods for creatinine detection that have been introduced and that are evolving with increasing speed, fluorescence-based and colorimetric sensors represent one of the best alternatives, thanks to their affordability, sensitivity and easy r…

research product

SNP and SML estimation of univariate and bivariate binary–choice models

We discuss the semi-nonparametric approach of Gallant and Nychka (1987, Econometrica 55: 363–390), the semiparametric maximum likelihood approach of Klein and Spady (1993, Econometrica 61: 387–421), and a set of new Stata commands for semiparametric estimation of three binary-choice models. The first is a univariate model, while the second and the third are bivariate models without and with sample selection, respectively. The proposed estimators are root-n consistent and asymptotically normal for the model parameters of interest under weak assumptions on the distribution of the underlying error terms. Our Monte Carlo simulations suggest that the efficiency losses of the semi-nonparametric a…

research product

Weak versus strong dominance of shrinkage estimators

We consider the estimation of the mean of a multivariate normal distribution with known variance. Most studies consider the risk of competing estimators, that is, the trace of the mean squared error matrix. In contrast, we consider the whole mean squared error matrix, in particular its eigenvalues. We prove that there are only two distinct eigenvalues and apply our findings to the James–Stein and the Thompson class of estimators. It turns out that the famous Stein paradox is no longer a paradox when we consider the whole mean squared error matrix rather than only its trace.
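As an illustration only (not taken from the paper), the following minimal Python sketch approximates the mean squared error matrix of the James–Stein estimator by Monte Carlo and inspects both its trace (the risk) and its eigenvalues. The identity covariance, the dimension p = 5, the value of theta, the number of replications, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def james_stein(x):
    """James-Stein estimate for rows x ~ N(theta, I_p), assuming p >= 3."""
    p = x.shape[-1]
    shrink = 1.0 - (p - 2) / np.sum(x**2, axis=-1, keepdims=True)
    return shrink * x

def mse_matrix(theta, reps=20000, seed=0):
    """Monte Carlo approximation of E[(est - theta)(est - theta)']."""
    rng = np.random.default_rng(seed)
    x = theta + rng.standard_normal((reps, theta.size))
    err = james_stein(x) - theta
    return err.T @ err / reps

theta = np.ones(5)                 # hypothetical true mean
M = mse_matrix(theta)
print("risk (trace of MSE matrix):", np.trace(M))
print("eigenvalues of MSE matrix:", np.linalg.eigvalsh(M))
```

The trace can be compared with p, the risk of the unshrunk estimator, while the individual eigenvalues show how the error is distributed across directions; the simulated eigenvalues are subject to Monte Carlo noise.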

research product

In-Work Benefits for Married Couples: An Ex-Ante Evaluation of EITC and WTC Policies in Italy

This paper investigates labor supply and redistributive effects of in-work benefits for Italian married couples using a tax-benefit microsimulation model and a multi-sectoral discrete choice model of labor supply. We consider in-work benefits based on the Earned Income Tax Credit (EITC) and the Working Tax Credit (WTC) existing in the US and the UK, respectively. The standard design of these income support mechanisms is, however, augmented with a premium for two-earner households to avoid potential disincentive effects on secondary earners. Revenue-neutral policy simulations show that our reforms may greatly improve the current Italian tax-benefit system in terms of both incentive and redistr…

research product

Insights on amyloid spherulites structure at molecular level

research product

The History of European Infrastructure Finance: An Analytical Framework

How can socio-economic resources be mobilized to pay for works that offer benefits only in the future, often in the distant future? We discuss what we understand by infrastructure, a term that can have different meanings/semantic contents, and whose definition issues reveal some recurrent conceptual problems. Finance is here understood in the very broad sense of a set of mechanisms bringing to investment, and future benefits, the resources needed in advance to pay for it. We offer a brief discussion of technological and organizational change, as several of our examples and other literature that we cite show that investment and finance decisions are deeply interwoven with knowledge, manageme…

research product

Effects of alirocumab on types of myocardial infarction: insights from the ODYSSEY OUTCOMES trial

research product

Model averaging estimation of generalized linear models with imputed covariates

We address the problem of estimating generalized linear models when some covariate values are missing but imputations are available to fill in the missing values. This situation generates a bias-precision trade-off in the estimation of the model parameters. Extending the generalized missing-indicator method proposed by Dardanoni et al. (2011) for linear regression, we handle this trade-off as a problem of model uncertainty using Bayesian averaging of classical maximum likelihood estimators (BAML). We also propose a block model averaging strategy that incorporates information on the missing-data patterns and is computationally simple. An empirical application illustrates our…

research product

Shrinkage efficiency bounds: An extension

Hansen (2005) obtained the efficiency bound (the lowest achievable risk) in the p-dimensional normal location model when p≥3, generalizing an earlier result of Magnus (2002) for the one-dimensional case (p=1). The classes of estimators considered are, however, different in the two cases. We provide an alternative bound to Hansen's which is a more natural generalization of the one-dimensional case, and we compare the classes and the bounds.

research product

Bayesian Model Averaging and Weighted Average Least Squares: Equivariance, Stability, and Numerical Issues

This article is concerned with the estimation of linear regression models with uncertainty about the choice of the explanatory variables. We introduce the Stata commands bma and wals which implement, respectively, the exact Bayesian Model Averaging (BMA) estimator and the Weighted Average Least Squares (WALS) estimator developed by Magnus et al. (2010). Unlike standard pretest estimators which are based on some preliminary diagnostic test, these model averaging estimators provide a coherent way of making inference on the regression parameters of interest by taking into account the uncertainty due to both the estimation and the model selection steps. Special emphasis is given to a number pra…

research product

Bayesian model averaging and weighted-average least squares: Equivariance, stability, and numerical issues

In this article, we describe the estimation of linear regression models with uncertainty about the choice of the explanatory variables. We introduce the Stata commands bma and wals, which implement, respectively, the exact Bayesian model-averaging estimator and the weighted-average least-squares estimator developed by Magnus, Powell, and Prüfer (2010, Journal of Econometrics 154: 139–153). Unlike standard pretest estimators that are based on some preliminary diagnostic test, these model-averaging estimators provide a coherent way of making inference on the regression parameters of interest by taking into account the uncertainty due to both the estimation and the model selection steps. Spec…

research product

UV-induced modifications in collagen fibers molecular structure: a fluorescence spectroscopy and microscopy study

research product

Weighted-Average Least Squares (WALS): Confidence and Prediction Intervals

We extend the results of De Luca et al. (2021) to inference for linear regression models based on weighted-average least squares (WALS), a frequentist model averaging approach with a Bayesian flavor. We concentrate on inference about a single focus parameter, interpreted as the causal effect of a policy or intervention, in the presence of a potentially large number of auxiliary parameters representing the nuisance component of the model. In our Monte Carlo simulations we compare the performance of WALS with that of several competing estimators, including the unrestricted least-squares estimator (with all auxiliary regressors) and the restricted least-squares estimator (with no auxiliary reg…

research product

In-work benefits for married couples: an ex-ante evaluation of EITC and WTC policies in Italy

This paper investigates labor supply and redistributive effects of in-work benefits for Italian married couples using a tax-benefit microsimulation model and a multi-sectoral discrete choice model of labor supply. We consider two in-work benefit schemes following the key principles of the Earned Income Tax Credit (EITC) and the Working Tax Credit (WTC) existing in the US and the UK, respectively. The standard design of these in-work benefits is however augmented with a new benefit premium for two-earner households in order to overcome the well-known disincentive effects that these welfare instruments may generate on secondary earners. In simulation, the proposed in-work benefits are finance…

research product

Estimation of ordered response models with sample selection

We introduce two new Stata commands for the estimation of an ordered response model with sample selection. The opsel command uses a standard maximum-likelihood approach to fit a parametric specification of the model where errors are assumed to follow a bivariate Gaussian distribution. The snpopsel command uses the semi-nonparametric approach of Gallant and Nychka (1987, Econometrica 55: 363–390) to fit a semiparametric specification of the model where the bivariate density function of the errors is approximated by a Hermite polynomial expansion. The snpopsel command extends the set of Stata routines for semi-nonparametric estimation of discrete response models. Compared to the other semi-n…

research product

Sampling Design and Weighting Strategies in the Second Wave of SHARE

research product

BALANCED VARIABLE ADDITION IN LINEAR MODELS

This paper studies what happens when we move from a short regression to a long regression in a setting where both regressions are subject to misspecification. In this setup, the least-squares estimator in the long regression may have larger inconsistency than the least-squares estimator in the short regression. We provide a simple interpretation for the comparison of the inconsistencies and study under which conditions the additional regressors in the long regression represent a “balanced addition” to the short regression.

research product

Weighted-average least squares (WALS): A survey

Model averaging has become a popular method of estimation, following increasing evidence that model selection and estimation should be treated as one joint procedure. Weighted-average least squares (WALS) is a recent model-average approach, which takes an intermediate position between frequentist and Bayesian methods, allows a credible treatment of ignorance, and is extremely fast to compute. We review the theory of WALS and discuss extensions and applications.

research product

WEIGHTED-AVERAGE LEAST SQUARES (WALS): A SURVEY

Model averaging has become a popular method of estimation, following increasing evidence that model selection and estimation should be treated as one joint procedure. Weighted-average least squares (WALS) is a recent model-average approach, which takes an intermediate position between frequentist and Bayesian methods, allows a credible treatment of ignorance, and is extremely fast to compute. We review the theory of WALS and discuss extensions and applications.

research product

A Sample Selection Model for Unit and Item Nonresponse in Cross-Sectional Surveys

We consider a general sample selection model where unit and item nonresponse simultaneously affect a regression relationship of interest, and both types of nonresponse are potentially correlated. We estimate both parametric and semiparametric specifications of the model. The parametric specification assumes that the errors in the latent regression equations follow a trivariate Gaussian distribution. The semiparametric specification avoids distributional assumptions about the underlying regression errors. In our empirical application, we estimate Engel curves for consumption expenditure using data from the first wave of SHARE (Survey on Health, Aging and Retirement in Europe).

research product

SAMPLING DESIGN IN SHARE WAVE 7

This chapter documents the sampling design adopted in SHARE. Starting with a definition of the SHARE target population, we describe the protocol that is followed to harmonise and document the sampling procedure and present the sampling frames used by the countries that recruited a baseline or refreshment sample in Wave 7. We then discuss some important aspects of the SHARE sampling design, such as stratification, clustering, variation in selection probabilities and sample composition. Finally, we provide additional information about the sampling variables included in the released SHARE dataset.

research product

Asymptotic properties of the weighted-average least squares (WALS) estimator

research product

On the ambiguous consequences of omitting variables

This paper studies what happens when we move from a short regression to a long regression (or vice versa), when the long regression is shorter than the data-generation process. In the special case where the long regression equals the data-generation process, the least-squares estimators have smaller bias (in fact zero bias) but larger variances in the long regression than in the short regression. But if the long regression is also misspecified, the bias may not be smaller. We provide bias and mean squared error comparisons and study the dependence of the differences on the misspecification parameter.
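A small numerical sketch may make the point concrete. It is an illustration under stated assumptions, not the paper's own derivation: regressors are taken to be zero-mean with a hypothetical covariance matrix `sigma`, the error is uncorrelated with all regressors, and the coefficient vector `beta` is invented for the example. The helper `plim_ols` (a name introduced here) returns the probability limit of the least-squares coefficients when only a subset of regressors is included.

```python
import numpy as np

def plim_ols(sigma, beta, included):
    """Probability limit of OLS on the `included` regressors.

    sigma: covariance matrix of all regressors in the data-generation process
    beta:  true coefficients in the data-generation process
    """
    s = list(included)
    sigma_ss = sigma[np.ix_(s, s)]     # covariance among included regressors
    cov_sy = sigma[s, :] @ beta        # covariance of included regressors with y
    return np.linalg.solve(sigma_ss, cov_sy)

# Hypothetical data-generation process with three regressors.
sigma = np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.0],
                  [0.5, 0.0, 1.0]])
beta = np.array([1.0, 1.0, -1.0])

short = plim_ols(sigma, beta, [0])       # y on x1 only
long_ = plim_ols(sigma, beta, [0, 1])    # y on x1 and x2, still omitting x3
print("short-regression limit for x1:", short[0])
print("long-regression limit for x1: ", long_[0])
```

In this constructed case the two omitted-variable biases offset each other in the short regression, so adding x2 while still omitting x3 moves the coefficient on x1 away from its true value of 1.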

research product

Weighted-average least squares estimation of generalized linear models

The weighted-average least squares (WALS) approach, introduced by Magnus et al. (2010) in the context of Gaussian linear models, has been shown to enjoy important advantages over other strictly Bayesian and strictly frequentist model averaging estimators when accounting for problems of uncertainty in the choice of the regressors. In this paper we extend the WALS approach to deal with uncertainty about the specification of the linear predictor in the wider class of generalized linear models (GLMs). We study the large-sample properties of the WALS estimator for GLMs under a local misspecification framework that allows the development of asymptotic model averaging theory. We also investigate t…

research product

Probing ensemble polymorphism and single aggregate structural heterogeneity in insulin amyloid self-assembly

Ensembles of protein aggregates are characterized by a nano- and micro-scale heterogeneity of the species. This diversity translates into a variety of effects that protein aggregates may have in biological systems, both in connection to neurodegenerative diseases and immunogenic risk of protein drug products. Moreover, this naturally occurring variety offers unique opportunities in the field of protein-based biomaterials. In the above-mentioned fields, the isolation and structural analysis of the different amyloid types within the same ensemble remain a priority, still representing a significant experimental challenge. Here we address such complexity in the case of insulin for its relevance…

research product

Comments on “Unobservable Selection and Coefficient Stability: Theory and Evidence” and “Poorly Measured Confounders are More Useful on the Left Than on the Right”

We establish a link between the approaches proposed by Oster (2019) and Pei, Pischke, and Schwandt (2019) which contribute to the development of inferential procedures for causal effects in the challenging and empirically relevant situation where the unknown data-generation process is not included in the set of models considered by the investigator. We use the general misspecification framework recently proposed by De Luca, Magnus, and Peracchi (2018) to analyze and understand the implications of the restrictions imposed by the two approaches.

research product