
RESEARCH PRODUCT

Selection of the Best Subset of Variables in Regression and Time Series Models

Maris Purgailis, Nicholas A. Nechval, Konstantin N. Nechval, Uldis Rozevskis

subject

Series (mathematics), Statistics, Design matrix, Errors-in-variables models, Regression analysis, Cross-sectional regression, Selection (genetic algorithm), Regression, Mathematics

description

The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. Several papers have dealt with various aspects of the problem, but it appears that the typical regression user has not benefited appreciably. One reason the problem remains unresolved is that it has not been well defined. Indeed, it is apparent that there is not a single problem, but rather several problems for which different answers might be appropriate. The intent of this chapter is not to give specific answers but merely to present a new, simple multiplicative variable selection criterion, based on the parametrically penalized residual sum of squares, that addresses the subset selection problem in multiple linear regression analysis, where the objective is to select a minimal subset of predictor variables without sacrificing any explanatory power. The variables that optimize this criterion are chosen as the best variables. The authors find that the proposed criterion performs consistently well across a wide variety of variable selection problems. The practical utility of this criterion is demonstrated by numerical examples.
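
The general idea of choosing a subset by minimizing a multiplicatively penalized residual sum of squares can be sketched as follows. This description does not state the authors' criterion, so the penalty used here, RSS * exp(alpha * k / n), is an assumed stand-in chosen only for illustration, and the function names `rss`, `penalized_rss`, and `best_subset` and the parameter `alpha` are hypothetical. With alpha = 2 this form ranks subsets the same way as the Gaussian AIC.

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def penalized_rss(rss_value, n, k, alpha):
    # ASSUMPTION: illustrative multiplicative penalty, not the chapter's formula.
    # The RSS is inflated by a factor that grows with the number of predictors k.
    return rss_value * np.exp(alpha * k / n)


def best_subset(X, y, alpha=2.0):
    """Exhaustively search all subsets of columns and return the minimizer."""
    n, p = X.shape
    best_cols, best_score = (), np.inf
    for k in range(p + 1):
        for cols in combinations(range(p), k):
            score = penalized_rss(rss(X, y, cols), n, k, alpha)
            if score < best_score:
                best_cols, best_score = cols, score
    return best_cols, best_score
```

Exhaustive search evaluates 2^p subsets, so a sketch like this is only practical for a small number of candidate predictors. Larger values of `alpha` penalize extra predictors more heavily and favor smaller subsets.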

https://doi.org/10.4018/978-1-61520-668-1.ch016