0000000001169956

AUTHOR

Santtu Tikka

The gallium anomaly reassessed using a Bayesian approach

The solar-neutrino detectors GALLEX and SAGE were calibrated using the electron-neutrino flux from $^{37}$Ar and $^{51}$Cr calibration sources. A deficit in the measured neutrino flux was recorded by counting the number of neutrino-induced conversions of $^{71}$Ga nuclei to $^{71}$Ge nuclei. This deficit was named the ``gallium anomaly'', and it has led to speculation about beyond-the-standard-model physics in the form of eV-mass sterile neutrinos. Notably, this anomaly has defied a final resolution for more than 20 years. Here we reassess the statistical significance of this anomaly and improve the related statistical approaches by treating the neutrino experiments as repeated Bernoulli…

research product

Body weight and premature retirement: population-based evidence from Finland.

Background Health status is a principal determinant of labour market participation. In this study, we examined whether excess weight is associated with withdrawal from the labour market owing to premature retirement. Methods The analyses were based on nationally representative data from Finland over the period 2001–15 (N ∼ 2500). The longitudinal data included objective measures of body weight (i.e. body mass index and waist circumference) linked to register-based information on actual retirement age. The association between the body weight measures and premature retirement was modelled using cubic b-splines via logistic regression. The models accounted for other possible risk fact…

research product

Surrogate outcomes and transportability

Identification of causal effects is one of the most fundamental tasks of causal inference. We consider an identifiability problem where some experimental and observational data are available but neither data alone is sufficient for the identification of the causal effect of interest. Instead of the outcome of interest, surrogate outcomes are measured in the experiments. This problem is a generalization of identifiability using surrogate experiments and we label it as surrogate outcome identifiability. We show that the concept of transportability provides a sufficient criterion for determining surrogate outcome identifiability for a large class of queries.

research product

Sima – an Open-source Simulation Framework for Realistic Large-scale Individual-level Data Generation

We propose a framework for realistic data generation and the simulation of complex systems and demonstrate its capabilities in a health domain example. The main use cases of the framework are predicting the development of variables of interest, evaluating the impact of interventions and policy decisions, and supporting statistical method development. We present the fundamentals of the framework by using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R embraces efficient data structures, parallel computing, and fast random number gener…

research product

Simplifying Probabilistic Expressions in Causal Inference

Obtaining a non-parametric expression for an interventional distribution is one of the most fundamental tasks in causal inference. Such an expression can be obtained for an identifiable causal effect by an algorithm or by manual application of do-calculus. Often we are left with a complicated expression which can lead to biased or inefficient estimates when missing data or measurement errors are involved. We present an automatic simplification algorithm that seeks to eliminate symbolically unnecessary variables from these expressions by taking advantage of the structure of the underlying graphical model. Our method is applicable to all causal effect formulas and is readily available in the …

research product

Sublethal Pyrethroid Insecticide Exposure Carries Positive Fitness Effects Over Generations in a Pest Insect

Stress tolerance and adaptation to stress are known to facilitate species invasions. Many invasive species are also pests and insecticides are used to control them, which could shape their overall tolerance to stress. It is well-known that heavy insecticide usage leads to selection of resistant genotypes but less is known about potential effects of mild sublethal insecticide usage. We studied whether stressful, sublethal pyrethroid insecticide exposure has within-generational and/or maternal transgenerational effects on fitness-related traits in the Colorado potato beetle (Leptinotarsa decemlineata) and whether maternal insecticide exposure affects insecticide tolerance of offspring…

research product

The effects of short-term glyphosate-based herbicide exposure on insect gene expression profiles

Glyphosate-based herbicides (GBHs) are the most frequently used herbicides worldwide. The use of GBHs is intended to tackle weeds, but GBHs have been shown to affect the life-history traits and antioxidant defense system of invertebrates found in agroecosystems. Thus far, the effects of GBHs on detoxification pathways among invertebrates have not been sufficiently investigated. We performed two different experiments—1) the direct pure glyphosate and GBH treatment, and 2) the indirect GBH experiment via food—to examine the possible effects of environmentally relevant GBH levels on the survival of the Colorado potato beetle (Leptinotarsa decemlineata) and the expression profiles of their deto…

research product

Body weight and premature retirement: population-based evidence from Finland

Background Health status is a principal determinant of labour market participation. In this study, we examined whether excess weight is associated with withdrawal from the labour market owing to premature retirement. Methods The analyses were based on nationally representative data from Finland over the period 2001–15 (N ∼ 2500). The longitudinal data included objective measures of body weight (i.e. body mass index and waist circumference) linked to register-based information on actual retirement age. The association between the body weight measures and premature retirement was modelled using cubic b-splines via logistic regression. The models accounted for other possible risk factors and p…

research product

Do-search -- a tool for causal inference and study design with multiple data sources

Epidemiologic evidence is based on multiple data sources including clinical trials, cohort studies, surveys, registries, and expert opinions. Merging information from different sources opens up new possibilities for the estimation of causal effects. We show how causal effects can be identified and estimated by combining experiments and observations in real and realistic scenarios. As a new tool, we present do-search, a recently developed algorithmic approach that can determine the identifiability of a causal effect. The approach is based on do-calculus, and it can utilize data with nontrivial missing data and selection bias mechanisms. When the effect is identifiable, do-search outputs an i…

research product

Enhancing identification of causal effects by pruning

Causal models communicate our assumptions about causes and effects in real-world phenomena. Often the interest lies in the identification of the effect of an action which means deriving an expression from the observed probability distribution for the interventional distribution resulting from the action. In many cases an identifiability algorithm may return a complicated expression that contains variables that are in fact unnecessary. In practice this can lead to additional computational burden and increased bias or inefficiency of estimates when dealing with measurement error or missing data. We present graphical criteria to detect variables which are redundant in identifying causal effe…

research product

Estimation of causal effects with small data in the presence of trapdoor variables

We consider the problem of estimating causal effects of interventions from observational data when well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias in small samples. This bias is related to variables that we call trapdoor variables. We use simulated data to study different strategies to account for trapdoor variables and suggest how the related trapdoor bias might be minimized. The importance of trapdoor variables in causal effect estimation is illustrated with rea…

research product

Identifying Causal Effects via Context-specific Independence Relations

Causal effect identification considers whether an interventional probability distribution can be uniquely determined from a passively observed distribution in a given causal structure. If the generating system induces context-specific independence (CSI) relations, the existing identification procedures and criteria based on do-calculus are inherently incomplete. We show that deciding causal effect non-identifiability is NP-hard in the presence of CSIs. Motivated by this, we design a calculus and an automated search procedure for identifying causal effects in the presence of CSIs. The approach is provably sound and it includes standard do-calculus as a special case. With the approach we can …

research product

Identifying Causal Effects with the R Package causaleffect

Do-calculus is concerned with estimating the interventional distribution of an action from the observed joint probability distribution of the variables in a given causal structure. All identifiable causal effects can be derived using the rules of do-calculus, but the rules themselves do not give any direct indication whether the effect in question is identifiable or not. Shpitser and Pearl constructed an algorithm for identifying joint interventional distributions in causal models, which contain unobserved variables and induce directed acyclic graphs. This algorithm can be seen as a repeated application of the rules of do-calculus and known properties of probabilities, and it ultimately eit…

research product

Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach

Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge on the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with an arbitrary type of both observational and…

research product

Simulation Framework for Realistic Large-scale Individual-level Data Generation with an Application in the Health Domain

We propose a framework for realistic data generation and simulation of complex systems and demonstrate its capabilities in the health domain. The main use cases of the framework are predicting the development of risk factors and disease occurrence, evaluating the impact of interventions and policy decisions, and statistical method development. We present the fundamentals of the framework using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R embraces efficient data structures, parallel computing and fast random number generation which…

research product

dynamite: An R Package for Dynamic Multivariate Panel Models

dynamite is an R package for Bayesian inference of intensive panel (time series) data comprising multiple measurements per individual, for multiple individuals, measured over time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, latent factors, and customization of prior distributions of the model parameters. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods. The package enables efficient computation of …

research product

Improving identification algorithms in causal inference

Causal models provide a formal approach to the study of causality. One of the most useful features of causal modeling is that it enables one to make causal claims about a phenomenon using observational data alone under suitable conditions. This feature enables the analysis of interventions that may be infeasible to conduct in the real world for practical or ethical reasons. The uncertainty associated with the variables of interest is taken into account by including a probability distribution in the causal model, making it possible to study the effects of external interventions by examining how this distribution is changed by the action. The probability distribution of a specific variable i…

research product

Self-study material: Basics of causal modelling in statistics

This handout is intended as self-study material for master's-level students of statistics (or those with equivalent knowledge). In particular, familiarity with probability calculus and generalized linear models is required. The purpose of the material is to explain to the reader the basics of the causal modelling and causal calculus developed by Judea Pearl. The material is based on Judea Pearl's book Causality [Pearl, 2009]. For each theorem and definition, the section of the book in which it can be found is given. nonPeerReviewed

research product