0000000001327694
AUTHOR
Santtu Tikka
The gallium anomaly reassessed using a Bayesian approach
The solar-neutrino detectors GALLEX and SAGE were calibrated using the electron-neutrino flux from $^{37}$Ar and $^{51}$Cr calibration sources. A deficit in the measured neutrino flux was recorded by counting the number of neutrino-induced conversions of $^{71}$Ga nuclei to $^{71}$Ge nuclei. This deficit was named the ``gallium anomaly'' and it has led to speculation about beyond-the-standard-model physics in the form of eV-mass sterile neutrinos. Notably, this anomaly has defied a final solution for more than 20 years. Here we reassess the statistical significance of this anomaly and improve the related statistical approaches by treating the neutrino experiments as repeated Bernoulli…
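As a rough illustration of what treating an experiment as repeated Bernoulli trials can look like in a Bayesian setting, the following minimal sketch performs a conjugate Beta update of a success probability. The abstract above is truncated, so this is not the authors' model: the Beta prior, the function names, and the counts (7 successes in 10 trials) are all assumptions made up for illustration.

```python
def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Conjugate update for a Bernoulli success probability.

    With a Beta(prior_a, prior_b) prior and `successes` out of `trials`
    independent Bernoulli trials, the posterior is
    Beta(prior_a + successes, prior_b + failures).
    The uniform Beta(1, 1) default prior is an assumption of this sketch.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    failures = trials - successes
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1.0))


# Hypothetical counts, not taken from GALLEX or SAGE.
a, b = beta_posterior(successes=7, trials=10)
posterior_mean = beta_mean(a, b)          # Beta(8, 4) has mean 2/3
posterior_variance = beta_variance(a, b)
```

The posterior summarizes uncertainty about the success probability directly, which is the usual motivation for preferring it over a single point estimate when the number of trials is small.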
Body weight and premature retirement: population-based evidence from Finland.
Background: Health status is a principal determinant of labour market participation. In this study, we examined whether excess weight is associated with withdrawal from the labour market owing to premature retirement. Methods: The analyses were based on nationally representative data from Finland over the period 2001–15 (N ∼ 2500). The longitudinal data included objective measures of body weight (i.e. body mass index and waist circumference) linked to register-based information on actual retirement age. The association between the body weight measures and premature retirement was modelled using cubic b-splines via logistic regression. The models accounted for other possible risk fact…
Surrogate outcomes and transportability
Identification of causal effects is one of the most fundamental tasks of causal inference. We consider an identifiability problem where some experimental and observational data are available but neither data alone is sufficient for the identification of the causal effect of interest. Instead of the outcome of interest, surrogate outcomes are measured in the experiments. This problem is a generalization of identifiability using surrogate experiments and we label it as surrogate outcome identifiability. We show that the concept of transportability provides a sufficient criterion for determining surrogate outcome identifiability for a large class of queries.
Sima – an Open-source Simulation Framework for Realistic Large-scale Individual-level Data Generation
We propose a framework for realistic data generation and the simulation of complex systems and demonstrate its capabilities in a health domain example. The main use cases of the framework are predicting the development of variables of interest, evaluating the impact of interventions and policy decisions, and supporting statistical method development. We present the fundamentals of the framework by using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R embraces efficient data structures, parallel computing, and fast random number gener…
Simplifying Probabilistic Expressions in Causal Inference
Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the …
Sublethal Pyrethroid Insecticide Exposure Carries Positive Fitness Effects Over Generations in a Pest Insect
Stress tolerance and adaptation to stress are known to facilitate species invasions. Many invasive species are also pests and insecticides are used to control them, which could shape their overall tolerance to stress. It is well known that heavy insecticide usage leads to selection of resistant genotypes but less is known about potential effects of mild sublethal insecticide usage. We studied whether stressful, sublethal pyrethroid insecticide exposure has within-generational and/or maternal transgenerational effects on fitness-related traits in the Colorado potato beetle (Leptinotarsa decemlineata) and whether maternal insecticide exposure affects insecticide tolerance of offspring…
The effects of short-term glyphosate-based herbicide exposure on insect gene expression profiles
Glyphosate-based herbicides (GBHs) are the most frequently used herbicides worldwide. The use of GBHs is intended to tackle weeds, but GBHs have been shown to affect the life-history traits and antioxidant defense system of invertebrates found in agroecosystems. Thus far, the effects of GBHs on detoxification pathways among invertebrates have not been sufficiently investigated. We performed two different experiments—1) the direct pure glyphosate and GBH treatment, and 2) the indirect GBH experiment via food—to examine the possible effects of environmentally relevant GBH levels on the survival of the Colorado potato beetle (Leptinotarsa decemlineata) and the expression profiles of their deto…
Do-search -- a tool for causal inference and study design with multiple data sources
Epidemiologic evidence is based on multiple data sources including clinical trials, cohort studies, surveys, registries, and expert opinions. Merging information from different sources opens up new possibilities for the estimation of causal effects. We show how causal effects can be identified and estimated by combining experiments and observations in real and realistic scenarios. As a new tool, we present do-search, a recently developed algorithmic approach that can determine the identifiability of a causal effect. The approach is based on do-calculus, and it can utilize data with nontrivial missing data and selection bias mechanisms. When the effect is identifiable, do-search outputs an i…
Enhancing identification of causal effects by pruning
Causal models communicate our assumptions about causes and effects in real-world phenomena. Often the interest lies in the identification of the effect of an action, which means deriving an expression from the observed probability distribution for the interventional distribution resulting from the action. In many cases an identifiability algorithm may return a complicated expression that contains variables that are in fact unnecessary. In practice this can lead to additional computational burden and increased bias or inefficiency of estimates when dealing with measurement error or missing data. We present graphical criteria to detect variables which are redundant in identifying causal effe…
Estimation of causal effects with small data in the presence of trapdoor variables
We consider the problem of estimating causal effects of interventions from observational data when well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias in small samples. This bias is related to variables that we call trapdoor variables. We use simulated data to study different strategies to account for trapdoor variables and suggest how the related trapdoor bias might be minimized. The importance of trapdoor variables in causal effect estimation is illustrated with rea…
Identifying Causal Effects via Context-specific Independence Relations
Causal effect identification considers whether an interventional probability distribution can be uniquely determined from a passively observed distribution in a given causal structure. If the generating system induces context-specific independence (CSI) relations, the existing identification procedures and criteria based on do-calculus are inherently incomplete. We show that deciding causal effect non-identifiability is NP-hard in the presence of CSIs. Motivated by this, we design a calculus and an automated search procedure for identifying causal effects in the presence of CSIs. The approach is provably sound and it includes standard do-calculus as a special case. With the approach we can …
Identifying Causal Effects with the R Package causaleffect
Do-calculus is concerned with estimating the interventional distribution of an action from the observed joint probability distribution of the variables in a given causal structure. All identifiable causal effects can be derived using the rules of do-calculus, but the rules themselves do not give any direct indication whether the effect in question is identifiable or not. Shpitser and Pearl constructed an algorithm for identifying joint interventional distributions in causal models, which contain unobserved variables and induce directed acyclic graphs. This algorithm can be seen as a repeated application of the rules of do-calculus and known properties of probabilities, and it ultimately eit…
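To make concrete what kind of expression an identification procedure returns, here is a minimal sketch that evaluates one well-known identifying formula, the back-door adjustment P(y | do(x)) = Σ_z P(y | x, z) P(z), on a small discrete joint distribution. It does not call the causaleffect package or implement the algorithm of Shpitser and Pearl; the assumed graph (Z → X, Z → Y, X → Y), the function names, and all probabilities are invented for illustration.

```python
from collections import defaultdict


def backdoor_adjust(joint, x, y):
    """Evaluate sum_z P(y | x, z) * P(z) from a joint table.

    `joint` maps (x_value, y_value, z_value) -> probability.
    Assumes Z satisfies the back-door criterion relative to (X, Y),
    which is an assumption of this sketch, not something checked here.
    """
    p_z = defaultdict(float)
    p_xz = defaultdict(float)
    for (xv, _yv, zv), p in joint.items():
        p_z[zv] += p
        p_xz[(xv, zv)] += p

    total = 0.0
    for zv, pz in p_z.items():
        pxz = p_xz.get((x, zv), 0.0)
        if pxz == 0.0:
            raise ValueError("P(x, z) = 0: positivity is violated")
        p_y_given_xz = joint.get((x, y, zv), 0.0) / pxz
        total += p_y_given_xz * pz
    return total


def make_example_joint():
    """Build a hypothetical joint P(x, y, z) for binary X, Y, Z."""
    p_z = {0: 0.5, 1: 0.5}
    p_x1_given_z = {0: 0.2, 1: 0.8}
    p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}
    joint = {}
    for z in (0, 1):
        for x in (0, 1):
            px = p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]
            for y in (0, 1):
                py1 = p_y1_given_xz[(x, z)]
                py = py1 if y == 1 else 1.0 - py1
                joint[(x, y, z)] = p_z[z] * px * py
    return joint


example_joint = make_example_joint()
effect = backdoor_adjust(example_joint, x=1, y=1)  # 0.5*0.5 + 0.9*0.5
```

In this made-up example the adjusted quantity differs from the plain conditional P(y | x), which is exactly the gap between observing X and intervening on X that identification is meant to close.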
Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach
Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to the generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…
Simulation Framework for Realistic Large-scale Individual-level Data Generation with an Application in the Health Domain
We propose a framework for realistic data generation and simulation of complex systems and demonstrate its capabilities in the health domain. The main use cases of the framework are predicting the development of risk factors and disease occurrence, evaluating the impact of interventions and policy decisions, and statistical method development. We present the fundamentals of the framework using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R embraces efficient data structures, parallel computing and fast random number generation which…
dynamite: An R Package for Dynamic Multivariate Panel Models
dynamite is an R package for Bayesian inference of intensive panel (time series) data comprising multiple measurements per multiple individuals measured in time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, latent factors, and customization of prior distributions of the model parameters. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods. The package enables efficient computation of …
Improving identification algorithms in causal inference
Causal models provide a formal approach to the study of causality. One of the most useful features of causal modeling is that it enables one to make causal claims about a phenomenon using observational data alone under suitable conditions. This feature enables the analysis of interventions that may be infeasible to conduct in the real world for practical or ethical reasons. The uncertainty associated with the variables of interest is taken into account by including a probability distribution in the causal model, making it possible to study the effects of external interventions by examining how this distribution is changed by the action. The probability distribution of a specific variable i…
Self-study material: Fundamentals of causal modeling in statistics (Itseopiskelumateriaalia: Kausaalimallintamisen perusteet tilastotieteessä)
This handout is intended as self-study material for master's-level students of statistics (or readers with equivalent knowledge). In particular, familiarity with probability theory and generalized linear models is required. The purpose of the material is to explain to the reader the basics of the causal modeling and causal calculus developed by Judea Pearl. The material is based on Judea Pearl's book Causality [Pearl, 2009]. For each theorem and definition, the section of the book in which it can be found is given. Not peer reviewed.