RESEARCH PRODUCT

Improving identification algorithms in causal inference

Santtu Tikka

subject

inference, R language, algorithms, causality, variables, models, probability, graphs

description

Causal models provide a formal approach to the study of causality. One of the most useful features of causal modeling is that, under suitable conditions, it enables one to make causal claims about a phenomenon using observational data alone. This feature enables the analysis of interventions that may be infeasible to conduct in the real world for practical or ethical reasons. The uncertainty associated with the variables of interest is taken into account by including a probability distribution in the causal model, making it possible to study the effects of external interventions by examining how this distribution is changed by the action. The probability distribution of a specific variable in a causal model perturbed by an outside intervention is the causal effect of that intervention on the variable. One of the most fundamental problems of causal inference is determining whether a causal effect can be uniquely expressed in terms of the joint probability distribution over the observed variables in a given causal model. Causal effects that can be expressed in this way are called identifiable, and they serve as the link between observational and experimental information. Complete solutions to the identifiability problem take the form of an algorithm that produces an expression in terms of observed quantities whenever the causal effect given as input is identifiable. However, completeness in this context refers only to the correctness and exhaustiveness of the methods. The formulas obtained as output from identifiability algorithms are often impractical and unnecessarily complicated. The thesis augments the pre-existing identifiability methodology by providing a simplification procedure that drastically improves the complicated outputs in many cases. Simplification also has practical benefits when statistical estimation is considered, if variables affected by bias or missing data no longer appear in the simplified expression.
The thesis also introduces a new method called pruning, which aims to eliminate variables that are unnecessary for the identification task from the causal model itself. Finally, a variety of identification algorithms are implemented for more complicated settings, such as when data are available from multiple domains. The methods are provided through the R package “causaleffect”.
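
As an illustration of what identifiability means in the description above, the following is a minimal Python sketch and not the thesis's algorithm or the “causaleffect” package. It assumes a hypothetical model with three binary variables and edges Z -> X, Z -> Y and X -> Y, with made-up probabilities. In this well-known special case the causal effect of X on Y is identifiable by back-door adjustment, so it can be computed from the observational joint distribution alone. All names (`joint`, `prob`, `backdoor`, `conditional`, `interventional_truth`) are introduced here for illustration only.

```python
from itertools import product

# Hypothetical model over binary variables with edges Z -> X, Z -> Y, X -> Y.
# All numbers below are made up for illustration.
p_z = {0: 0.6, 1: 0.4}                      # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.7}             # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y = 1 | X = x, Z = z)
                 (1, 0): 0.4, (1, 1): 0.8}


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Observational joint distribution P(x, y, z) implied by the model.
joint = {
    (x, y, z): p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_xz[(x, z)], y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(**fixed):
    """Marginal probability of the given assignments, read off the joint."""
    total = 0.0
    for (x, y, z), p in joint.items():
        values = {"x": x, "y": y, "z": z}
        if all(values[name] == v for name, v in fixed.items()):
            total += p
    return total


def backdoor(y, x):
    """P(y | do(x)) = sum_z P(y | x, z) P(z), using only the joint."""
    return sum(
        prob(x=x, y=y, z=z) / prob(x=x, z=z) * prob(z=z) for z in (0, 1)
    )


def conditional(y, x):
    """Plain observational conditional P(y | x), for comparison."""
    return prob(x=x, y=y) / prob(x=x)


def interventional_truth(y, x):
    """P(y | do(x)) computed from the model itself by removing the
    mechanism of X (truncated factorization)."""
    return sum(p_z[z] * bern(p_y1_given_xz[(x, z)], y) for z in (0, 1))


if __name__ == "__main__":
    print("back-door adjustment:", round(backdoor(1, 1), 4))
    print("interventional truth:", round(interventional_truth(1, 1), 4))
    print("plain conditional:   ", round(conditional(1, 1), 4))
```

With these numbers the adjusted quantity matches the interventional value (0.56), while the plain conditional P(Y = 1 | X = 1) is 0.68, which shows why an identifying formula is needed rather than a naive conditional. General identification algorithms handle arbitrary graphs with unobserved confounders; this sketch covers only one hand-derived case.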

http://urn.fi/URN:ISBN:978-951-39-7519-7