AUTHOR
David Cox
A Comment on the Coefficient of Determination for Binary Responses
Linear logistic or probit regression can be closely approximated by an unweighted least squares analysis of the regression linear in the conditional probabilities, provided that these probabilities for success and failure are not too extreme. It is shown how this restriction on the probabilities translates into a restriction on the range of the coefficient of determination R², so that, as a consequence, R² is not suitable to judge the effectiveness of linear regressions with binary responses even if an important relation is present.
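The restriction on R² can be illustrated numerically. The sketch below is not taken from the paper: it assumes a linear probability model in which the success probability rises from 0.2 to 0.8 as a hypothetical explanatory variable x runs uniformly over [0, 1], and it uses the fact that for a binary response with marginal success probability π, var(Y) = π(1 − π). All function names are illustrative.

```python
import numpy as np

def population_r2(p):
    """Population R^2 when a binary response is regressed on its own
    success probabilities p: var(p) / (pi * (1 - pi)), with pi = mean(p)."""
    p = np.asarray(p, dtype=float)
    pi = p.mean()
    return p.var() / (pi * (1.0 - pi))

def simulated_r2(n=200_000, low=0.2, high=0.8, seed=0):
    """R^2 of an unweighted least squares fit of simulated binary y on x,
    where P(y = 1 | x) = low + (high - low) * x (assumed model)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    p = low + (high - low) * x
    y = (rng.uniform(size=n) < p).astype(float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

r2_pop = population_r2(np.linspace(0.2, 0.8, 100_001))
r2_sim = simulated_r2()
print(f"population R^2 ~ {r2_pop:.3f}, simulated R^2 ~ {r2_sim:.3f}")
```

Under these assumptions the success probability triples across the range of x, yet R² stays close to 0.12, which is the kind of behaviour the abstract describes.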
On Association Models Defined over Independence Graphs
Conditions on joint distributions are given under which two variables will be conditionally associated whenever an independence graph does not imply a corresponding conditional independence statement. To this end the notions of parametric cancellation, of stable paths and of quasi-linear models are discussed in some detail.
An approximation to maximum likelihood estimates in reduced models
An approximation to the maximum likelihood estimates of the parameters in a model can be obtained from the corresponding estimates and information matrices in an extended model, i.e. a model with additional parameters. The approximation is close provided that the data are consistent with the first model. Applications are described to log linear models for discrete data, to models for multivariate normal distributions with special covariance matrices and to mixed discrete-continuous models.
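One way to see how such an approximation can arise is through a quadratic expansion of the log-likelihood. The notation below is assumed for illustration and is not taken from the paper: the extended model has parameters (θ, ψ) with maximum likelihood estimates (θ̂, ψ̂) and information matrix I partitioned conformably, and the reduced model sets ψ = 0.

```latex
% Quadratic approximation to the log-likelihood around the extended-model MLE
\ell(\theta,\psi) \approx \ell(\hat\theta,\hat\psi)
  - \tfrac{1}{2}
  \begin{pmatrix} \theta-\hat\theta \\ \psi-\hat\psi \end{pmatrix}^{\!T}
  \begin{pmatrix} I_{\theta\theta} & I_{\theta\psi} \\ I_{\psi\theta} & I_{\psi\psi} \end{pmatrix}
  \begin{pmatrix} \theta-\hat\theta \\ \psi-\hat\psi \end{pmatrix}

% Set \psi = 0 and differentiate with respect to \theta:
I_{\theta\theta}\,(\tilde\theta-\hat\theta) - I_{\theta\psi}\,\hat\psi = 0
\quad\Longrightarrow\quad
\tilde\theta \approx \hat\theta + I_{\theta\theta}^{-1} I_{\theta\psi}\,\hat\psi
```

The correction term is small when ψ̂ is close to zero, which matches the condition that the data be consistent with the reduced model.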
Derived variables calculated from similar joint responses: some characteristics and examples
A technique (Cox and Wermuth, 1992) is reviewed for finding linear combinations of a set of response variables having special relations of linear conditional independence with a set of explanatory variables. A theorem in linear algebra is used both to examine conditions in which the derived variables take a specially simple form and to reduce the computations. Examples are discussed of medical and psychological investigations in which the method has aided interpretation.
Tests of Linearity, Multivariate Normality and the Adequacy of Linear Scores
After some discussion of the purposes of testing multivariate normality, the paper concentrates on two different approaches to testing linearity: on repeated regression tests of non-linearity and on exploiting properties of a dichotomized normal distribution. Regression tests of linearity are used to examine the adequacy of linear scoring systems for explanatory variables initially recorded on an ordinal scale. Examples from recent psychological and medical research are given in which the methods have led to some insight into the subject matter.
On the calculation of derived variables in the analysis of multivariate responses
The multivariate regression of a p × 1 vector Y of random variables on a q × 1 vector X of explanatory variables is considered. It is assumed that linear transformations of the components of Y can be the basis for useful interpretation whereas the components of X have strong individual identity. When p ≥ q a transformation is found to a new q × 1 vector of responses Y∗ such that in the multiple regression of, say, Y1∗ on X, only the coefficient of X1 is nonzero, i.e. such that Y1∗ is conditionally independent of X2, …, Xq, given X1. Some associated inferential procedures are sketched. An illustrative example is described in which the resulting transformation has aided interpretation.
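The idea of such a transformation can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's procedure: the data are simulated, the p × q coefficient matrix is estimated by ordinary least squares, and the Moore-Penrose pseudoinverse is used as one possible left inverse (the paper may choose the transformation differently when p > q). All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 500, 3, 2

# Simulated data: Y = B X + noise, with B a p x q coefficient matrix (assumed).
X = rng.normal(size=(n, q))
B_true = np.array([[1.0, 0.5], [0.3, 2.0], [-1.0, 1.0]])
Y = X @ B_true.T + rng.normal(scale=0.5, size=(n, p))

# Center both sets of variables so that no intercept is needed.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Least squares estimate of the regression of Y on X.
C_hat, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # q x p
B_hat = C_hat.T                                  # p x q

# A left inverse A of B_hat (A @ B_hat = I_q) defines derived responses Y* = A Y.
A = np.linalg.pinv(B_hat)                        # q x p
Y_star = Yc @ A.T                                # n x q

# Regress each derived response on all of X: column k holds the
# coefficients of X_1, ..., X_q in the regression of Y*_k.
coef_star, *_ = np.linalg.lstsq(Xc, Y_star, rcond=None)  # q x q
print(np.round(coef_star, 6))
```

In this sketch the fitted coefficient matrix of Y∗ on X is the identity up to rounding, so each derived response has a nonzero fitted coefficient on exactly one explanatory variable.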
Response models for mixed binary and quantitative variables
A number of special representations are considered for the joint distribution of qualitative, mostly binary, and quantitative variables. In addition to the conditional Gaussian models and to conditional Gaussian regression chain models some emphasis is placed on models derived from an underlying multivariate normal distribution and on models in which discrete probabilities are specified linearly in terms of unknown parameters. The possibilities for choosing between the models empirically are examined, as well as the testing of independence and conditional independence and the estimation of parameters. Often the testing of independence is exactly or nearly the same for a number of di…
Graphical Models for Dependencies and Associations
The role of graphical representations is described in distinguishing various special forms of independency structure that can arise with multivariate data, especially in observational studies in the social sciences. Conventions for constructing the graphs and strategies for analysing three sets of data are summarized. Finally some directions for desirable future work are outlined.
Causal diagrams for empirical research
Causal Inference and Statistical Fallacies
Fallacies are defined as plausible-seeming arguments that give the wrong conclusion. The article concentrates on those with some connection with causality. The classical definition of causality involving a necessary and sufficient condition for an effect is rejected and three possible definitions discussed. The first is that of a statistical association that cannot be explained away as the effect of admissible alternative features. To make this more precise, Markov graphical representations are introduced and the important distinction between pairs of variables on an equal footing and those in a potential explanatory-response relation described. The roles of unobserved confounders and of ra…
Statistical Dependence and Independence
Statistical dependence is a type of relation between different characteristics measured on the same units. At one extreme is deterministic dependence; at the other is statistical independence, where the distribution of one variable is the same for all levels of the other. With more than two variables, an important distinction is between marginal and conditional dependence. In many contexts, the degree of dependence may be summarized by a suitable measure of association, perhaps as part of a general model. Reference is made to graphical models.
Keywords: association; correlation; marginal; conditional; exponential family; graphical Markov models