0000000000114406
AUTHOR
Nanny Wermuth
Kovarianzselektion als Explorative Methode [Covariance Selection as an Exploratory Method]
The theory of covariance selection, in particular that of the subclass of multiplicative models, is briefly described. It is shown in what way each multiplicative covariance selection model corresponds to a system of regression equations and to a model of path analysis. Finally, a given data set is used to illustrate how covariance selection can be used for data exploration.
Algorithm AS 105: Fitting a Covariance Selection Model to a Matrix
A Comment on the Coefficient of Determination for Binary Responses
Linear logistic or probit regression can be closely approximated by an unweighted least squares analysis of the regression linear in the conditional probabilities, provided that these probabilities for success and failure are not too extreme. It is shown how this restriction on the probabilities translates into a restriction on the range of the coefficient of determination R², so that, as a consequence, R² is not suitable to judge the effectiveness of linear regressions with binary responses even if an important relation is present.
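The restriction on R² described above can be illustrated numerically. The sketch below is not taken from the paper; it assumes that the population R² for a binary response Y equals Var(p(X)) / Var(Y), with p(x) = P(Y = 1 | x) and Var(Y) = m(1 - m) for m = E[p(X)], and it uses the fact that, for a fixed mean, a variable confined to an interval has the largest variance when all mass sits on the two endpoints. The function name `max_r2` and the interval [0.2, 0.8] are illustrative assumptions.

```python
def max_r2(p_low, p_high, steps=1000):
    """Largest attainable population R^2 when P(Y=1|x) stays in [p_low, p_high].

    Assumption: R^2 = Var(p(X)) / Var(Y). For each weight w placed on p_high
    (and 1 - w on p_low) the two-point distribution gives the largest Var(p)
    for its mean, so a search over w covers the extreme cases.
    """
    best = 0.0
    width = p_high - p_low
    for i in range(1, steps):              # skip w = 0 and w = 1 (no variation)
        w = i / steps
        mean = p_low + w * width           # m = E[p(X)]
        var_p = w * (1.0 - w) * width**2   # variance of the two-point distribution
        var_y = mean * (1.0 - mean)        # variance of a binary response
        best = max(best, var_p / var_y)
    return best


if __name__ == "__main__":
    print(max_r2(0.2, 0.8))  # about 0.36
```

Under these assumptions, probabilities confined to [0.2, 0.8] cannot give an R² above roughly 0.36, however strong the dependence on the explanatory variable is.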
On Association Models Defined over Independence Graphs
Conditions on joint distributions are given under which two variables will be conditionally associated whenever an independence graph does not imply a corresponding conditional independence statement. To this end the notions of parametric cancellation, of stable paths and of quasi-linear models are discussed in some detail.
Some determinants of the migration of professional manpower.
Determinants of migration of professional manpower are investigated using data from a 1970 survey of immigrants to the United States. From a respondent’s stated “intent to stay” in the United States and five other characteristics a six-dimensional contingency table is formed. We find a well-fitting log-linear model for this table. Thus, we establish the importance of selected determinants of migration and present a table of predicted rates of intent to stay in the United States.
An approximation to maximum likelihood estimates in reduced models
An approximation to the maximum likelihood estimates of the parameters in a model can be obtained from the corresponding estimates and information matrices in an extended model, i.e. a model with additional parameters. The approximation is close provided that the data are consistent with the first model. Applications are described to log-linear models for discrete data, to models for multivariate normal distributions with special covariance matrices and to mixed discrete-continuous models.
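A minimal sketch of this idea follows, for the simplest possible case. It assumes that the approximation takes the one-step form a_reduced ≈ a_hat + I_aa^{-1} I_ab b_hat, where (a_hat, b_hat) are the extended-model estimates and I_aa, I_ab are blocks of the information matrix; this formula is an assumption of the example and is not quoted from the abstract. For least squares with the information matrix taken as X'X (up to a variance factor), the relation holds exactly, which makes it easy to check. All variable names and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Design for the reduced model (intercept and one regressor).
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
# Additional regressor that only the extended model contains.
X2 = rng.normal(size=(n, 1)) + 0.5 * X1[:, [1]]
# Response generated from the reduced model, so the data are consistent with it.
y = X1 @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Extended-model least squares estimates and information matrix (up to 1/sigma^2).
X = np.hstack([X1, X2])
info = X.T @ X
theta_hat = np.linalg.solve(info, X.T @ y)
a_hat, b_hat = theta_hat[:2], theta_hat[2:]
I_aa, I_ab = info[:2, :2], info[:2, 2:]

# Assumed one-step approximation to the reduced-model estimates.
a_approx = a_hat + np.linalg.solve(I_aa, I_ab @ b_hat)

# Direct fit of the reduced model, for comparison.
a_reduced = np.linalg.solve(X1.T @ X1, X1.T @ y)

print(np.allclose(a_approx, a_reduced))
```

In non-linear settings such as log-linear models, the same calculation would give an approximation rather than an exact identity, which is the situation the abstract addresses.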
Anwendungen in der Medizin [Applications in Medicine]
Examples from medicine and psychology are used in the following to show how the theory of log-linear models and the theory of covariance selection can be applied to the analysis of associations. An important specific question with complex mutual relationships is whether confounding or background factors substantially influence or change the relationship that is the main object of study. Because of the importance of this question, it is treated in a separate chapter (Chap. 3.1).
Derived variables calculated from similar joint responses: some characteristics and examples
A technique (Cox and Wermuth, 1992) is reviewed for finding linear combinations of a set of response variables having special relations of linear conditional independence with a set of explanatory variables. A theorem in linear algebra is used both to examine conditions in which the derived variables take a specially simple form and to reduce computations. Examples are discussed of medical and psychological investigations in which the method has aided interpretation.
Tests of Linearity, Multivariate Normality and the Adequacy of Linear Scores
After some discussion of the purposes of testing multivariate normality, the paper concentrates on two different approaches to testing linearity: on repeated regression tests of non-linearity and on exploiting properties of a dichotomized normal distribution. Regression tests of linearity are used to examine the adequacy of linear scoring systems for explanatory variables, initially recorded on an ordinal scale. Examples from recent psychological and medical research are given in which the methods have led to some insight into subject-matter.
On the calculation of derived variables in the analysis of multivariate responses
The multivariate regression of a p × 1 vector Y of random variables on a q × 1 vector X of explanatory variables is considered. It is assumed that linear transformations of the components of Y can be the basis for useful interpretation whereas the components of X have strong individual identity. When p ≥ q a transformation is found to a new q × 1 vector of responses Y* such that in the multiple regression of, say, Y1* on X, only the coefficient of X1 is nonzero, i.e. such that Y1* is conditionally independent of X2, …, Xq, given X1. Some associated inferential procedures are sketched. An illustrative example is described in which the resulting transformation has aided interpretation.
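The transformation can be sketched for the special case p = q, where inverting the estimated coefficient matrix is enough. This is only an illustration under that assumption; the construction for p > q used in the paper is not reproduced here, and the coefficient matrix `B`, the simulated data and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 500, 2

X = rng.normal(size=(n, q))
B = np.array([[1.0, 0.5],
              [-0.3, 2.0]])                    # hypothetical p x q coefficients, p = q
Y = X @ B.T + 0.1 * rng.normal(size=(n, q))    # each response depends on both X1 and X2

# Least squares coefficients of Y on X (no intercept), arranged as a p x q matrix.
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

# Derived responses Y* = A Y with A = B_hat^{-1} (valid only because p = q here).
A = np.linalg.inv(B_hat)
Y_star = Y @ A.T

# Regressing the derived responses on X gives the identity coefficient matrix:
# Y1* depends only on X1 and Y2* only on X2.
B_star = np.linalg.lstsq(X, Y_star, rcond=None)[0].T
print(np.round(B_star, 6))
```

The off-diagonal coefficients vanish by construction, which mirrors the statement that Y1* is conditionally independent of X2, …, Xq given X1 in the linear sense.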
Response models for mixed binary and quantitative variables
A number of special representations are considered for the joint distribution of qualitative, mostly binary, and quantitative variables. In addition to the conditional Gaussian models and to conditional Gaussian regression chain models some emphasis is placed on models derived from an underlying multivariate normal distribution and on models in which discrete probabilities are specified linearly in terms of unknown parameters. The possibilities for choosing between the models empirically are examined, as well as the testing of independence and conditional independence and the estimation of parameters. Often the testing of independence is exactly or nearly the same for a number of di…
Graphical Models for Dependencies and Associations
The role of graphical representations is described in distinguishing various special forms of independency structure that can arise with multivariate data, especially in observational studies in the social sciences. Conventions for constructing the graphs and strategies for analysing three sets of data are summarized. Finally some directions for desirable future work are outlined.
Moderating effects of subgroups in linear models
Possibilities for moderating effects of a subgrouping variable on strength or direction of an association have been much discussed by social scientists but have not been given satisfactory statistical formulations. The results concern directed measures of associations in linear models containing just three variables. Some key words: Analysis of covariance; Analysis of variance; CG-distribution; Conditional independence; Graphical chain model; Parallel regressions; Yule-Simpson paradox. 1. Introduction. Linear models are commonly used as a framework to estimate and test how a continuous response variable depends on potential influencing variables. This paper is concerned with the situ…
Causal diagrams for empirical research
Finding condensed descriptions for multi-dimensional data.
We describe two programs that may be used to find condensed descriptions for data available in a contingency table or in a covariance matrix in the case that these data follow a multinomial or a multivariate normal distribution, respectively. The programs perform a stepwise model search among multiplicative models by computing appropriate likelihood-ratio test statistics.
Statistische Theorie und Rechenverfahren [Statistical Theory and Computational Procedures]
Both covariance selection and the fitting of log-linear models to a contingency table were initially regarded as methods of parameter reduction, that is, as methods that seek to remedy a mismatch between the number of observations and the number of parameters to be estimated. For example, a computational procedure for fitting log-linear models was proposed and programmed (Y.M.M. Bishop (1967)) that was needed in a study of deaths following the use of several anaesthetics (National Halothane Study). In this study, for eight different anaesthetics, the probabilities that, within 6 weeks after …
When can association graphs admit a causal interpretation?
We discuss essentially linear structures which are adequately represented by association graphs called covariance graphs and concentration graphs. These do not explicitly indicate a process by which data could be generated in a stepwise fashion. Therefore, on their own, they do not suggest a causal interpretation. By contrast, each directed acyclic graph describes such a process and may offer a causal interpretation whenever this process is in agreement with substantive knowledge about causation among the variables under study. We derive conditions and procedures to decide for any given covariance graph or concentration graph whether all their pairwise independencies can be implied by some …
Explicit, identical maximum likelihood estimates for some cyclic Gaussian and cyclic Ising models
Cyclic models are a subclass of graphical Markov models with simple, undirected probability graphs that are chordless cycles. In general, all currently known distributions require iterative procedures to obtain maximum likelihood estimates in such cyclic models. For exponential families, the relevant conditional independence constraint for a variable pair is given all remaining variables, and it is captured by vanishing canonical parameters involving this pair. For Gaussian models, the canonical parameter is a concentration, that is, an off-diagonal element in the inverse covariance matrix, while for Ising models, it is a conditional log-linear, two-factor interaction. We give conditions un…
Binary distributions of concentric rings
We introduce families of jointly symmetric, binary distributions that are generated over directed star graphs whose nodes represent variables and whose edges indicate positive dependences. The families are parametrized in terms of a single parameter. It is an outstanding feature of these distributions that joint probabilities relate to evenly spaced concentric rings. Kronecker product characterizations make them computationally attractive for a large number of variables. We study the behavior of different measures of dependence and derive maximum likelihood estimates when all nodes are observed and when the inner node is hidden.
Causal Inference and Statistical Fallacies
Fallacies are defined as plausible-seeming arguments that give the wrong conclusion. The article concentrates on those with some connection with causality. The classical definition of causality involving a necessary and sufficient condition for an effect is rejected and three possible definitions discussed. The first is that of a statistical association that cannot be explained away as the effect of admissible alternative features. To make this more precise, Markov graphical representations are introduced and the important distinction between pairs of variables on an equal footing and those in a potential explanatory-response relation described. The roles of unobserved confounders and of ra…
Linear Recursive Equations, Covariance Selection, and Path Analysis
By defining a reducible zero pattern and by using the concept of multiplicative models, we relate linear recursive equations that have been introduced by econometrician Herman Wold (1954) and path analysis as it was proposed by geneticist Sewall Wright (1923) to the statistical theory of covariance selection formulated by Arthur Dempster (1972). We show that a reducible zero pattern is the condition under which parameters as well as least squares estimates in recursive equations are one-to-one transformations of parameters and of maximum likelihood estimates, respectively, in a decomposable covariance selection model. As a consequence, (a) we can give a closed-form expression for t…
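The link between recursive equations and covariance selection can be illustrated with a small numerical sketch. It assumes a hypothetical three-variable chain, X1 depending on X2 and X2 depending on X3, chosen because a missing coefficient in the recursive system then coincides with a zero in the concentration matrix (the inverse covariance matrix). The paper's precise definition of a reducible zero pattern is not reproduced; the coefficients and residual variances below are made up for illustration.

```python
import numpy as np

# Hypothetical recursive system, written as A X = e with uncorrelated residuals e:
#   X1 = 0.8 * X2 + e1
#   X2 = -0.5 * X3 + e2
#   X3 = e3
A = np.array([[1.0, -0.8, 0.0],     # zero in position (1, 3): X1 does not depend on X3
              [0.0,  1.0, 0.5],
              [0.0,  0.0, 1.0]])
Delta = np.diag([1.0, 2.0, 1.5])    # residual variances (illustrative values)

# Covariance matrix implied by the system: Sigma = A^{-1} Delta A^{-T}.
A_inv = np.linalg.inv(A)
Sigma = A_inv @ Delta @ A_inv.T

# Concentration matrix; algebraically it equals A' Delta^{-1} A.
K = np.linalg.inv(Sigma)

print(np.round(K, 6))
# The (1, 3) concentration is zero, so X1 and X3 are conditionally
# independent given X2 under joint normality.
```

In this chain the zero regression coefficient and the zero concentration describe the same conditional independence, which is the kind of one-to-one correspondence the abstract refers to.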
Statistical Dependence and Independence
Statistical dependence is a type of relation between different characteristics measured on the same units. At one extreme is deterministic dependence; at the other is statistical independence, where the distribution of one variable is the same for all levels of the other. With more than two variables, an important distinction is between marginal and conditional dependence. In many contexts, the degree of dependence may be summarized by a suitable measure of association, perhaps as part of a general model. Reference is made to graphical models. Keywords: association; correlation; marginal; conditional; exponential family; graphical Markov models
Pairwise Markov properties for regression graphs
With a sequence of regressions, one may generate joint probability distributions. One starts with a joint, marginal distribution of context variables having possibly a concentration graph structure and continues with an ordered sequence of conditional distributions, named regressions in joint responses. The involved random variables may be discrete, continuous or of both types. Such a generating process specifies for each response a conditioning set that contains just its regressor variables, and it leads to at least one valid ordering of all nodes in the corresponding regression graph that has three types of edge: one for undirected dependences among context variables, another for undirect…