AUTHOR

Jouni Helske

showing 24 related works from this author

Comparison of Attention Behaviour Across User Sets through Automatic Identification of Common Areas of Interest

2020

Eye tracking is used to analyze and compare user behaviour within numerous domains, but long duration eye tracking experiments across multiple users generate millions of eye gaze samples, making th ...

Identification (information); InformationSystems_MODELSANDPRINCIPLES; business.industry; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Eye tracking; Computer vision; Artificial intelligence; business; Hidden Markov model; Proceedings of the Annual Hawaii International Conference on System Sciences
researchProduct

Can visualization alleviate dichotomous thinking? Effects of visual representations on the cliff effect

2021

Common reporting styles for statistical results in scientific articles, such as $p$-values and confidence intervals (CI), have been reported to be prone to dichotomous interpretations, especially with respect to the null hypothesis significance testing framework. For example, when the $p$-value is small enough or the CIs of the mean effects of a studied drug and a placebo are not overlapping, scientists tend to claim significant differences while often disregarding the magnitudes and absolute differences in the effect sizes. This type of reasoning has been shown to be potentially harmful to science. Techniques relying on the visual estimation of the strength of evidence have been recom…

FOS: Computer and information sciences; visualisointi; Bayesian inference; tilastomenetelmät; Computer Science - Human-Computer Interaction; tulkinta; 02 engineering and technology; Bayesian inference; luottamustasot; Human-Computer Interaction (cs.HC); cliff effect; Data visualization; hypothesis testing; 0202 electrical engineering electronic engineering information engineering; Statistical inference; visualization; confidence intervals; Statistical hypothesis testing; päättely; business.industry; bayesilainen menetelmä; Other Statistics (stat.OT); Multilevel model; 020207 software engineering; tilastografiikka; Computer Graphics and Computer-Aided Design; Confidence interval; Statistics - Other Statistics; Signal Processing; Computer Vision and Pattern Recognition; business; Psychology; Null hypothesis; Value (mathematics); Software; Cognitive psychology; statistical inference
researchProduct

Estimating the causal effect of timing on the reach of social media posts

2022

Modern companies regularly use social media to communicate with their customers. In addition to the content, the reach of a social media post may depend on the season, the day of the week, and the time of the day. We consider optimizing the timing of Facebook posts by a large Finnish consumers’ cooperative using historical data on previous posts and their reach. The content and the timing of the posts reflect the marketing strategy of the cooperative. These choices affect the reach of a post via a dynamic process where the reactions of users make the post more visible to others. We describe the causal relations of the social media publishing in the form of a directed acyclic graph, …

Statistics and Probability; Facebook; optimointi; bayesilainen menetelmä; ajoitus (suunnittelu); kausaliteetti; sosiaalinen media; Statistics Probability and Uncertainty; tilastolliset mallit; markkinointiviestintä; Statistical Methods & Applications
researchProduct

Estimating aggregated nutrient fluxes in four Finnish rivers via Gaussian state space models

2013

Reliable estimates of the nutrient fluxes carried by rivers from land-based sources to the sea are needed for efficient abatement of marine eutrophication. Although nutrient concentrations in rivers generally display large temporal variation, sampling and analysis for nutrients, unlike flow measurements, are rarely performed on a daily basis. The infrequent data calls for ways to reliably estimate the nutrient concentrations of the missing days. Here, we use the Gaussian state space models with daily water flow as a predictor variable to predict missing nutrient concentrations for four agriculturally impacted Finnish rivers. Via simulation of Gaussian state space models, we are able to esti…

Statistics and Probability; Hydrology; Water flow; Ecological Modeling; Gaussian; Phosphorus; Monte Carlo method; Sampling (statistics); chemistry.chemical_element; symbols.namesake; Nutrient; chemistry; symbols; State space; Environmental science; Eutrophication; Environmetrics
researchProduct

Graphical model inference : Sequential Monte Carlo meets deterministic approximations

2019

Approximate inference in probabilistic graphical models (PGMs) can be grouped into deterministic methods and Monte-Carlo-based methods. The former can often provide accurate and rapid inferences, but are typically associated with biases that are hard to quantify. The latter enjoy asymptotic consistency, but can suffer from high computational costs. In this paper we present a way of bridging the gap between deterministic and stochastic inference. Specifically, we suggest an efficient sequential Monte Carlo (SMC) algorithm for PGMs which can leverage the output from deterministic inference methods. While generally applicable, we show explicitly how this can be done with loopy belief propagati…

FOS: Computer and information sciences; Computer Science - Machine Learning; koneoppiminen; machine learning; Statistics - Machine Learning; Machine Learning (stat.ML); statistical models; tilastolliset mallit; Computer Science::Databases; Machine Learning (cs.LG)
researchProduct

Estimating aggregated nutrient fluxes in four Finnish rivers via Gaussian state space models

2013

Reliable estimates of the nutrient fluxes carried by rivers from land-based sources to the sea are needed for efficient abatement of marine eutrophication. Although nutrient concentrations in rivers generally display large temporal variation, sampling and analysis for nutrients, unlike flow measurements, are rarely performed on a daily basis. The infrequent data calls for ways to reliably estimate the nutrient concentrations of the missing days. Here, we use the Gaussian state space models with daily water flow as a predictor variable to predict missing nutrient concentrations for four agriculturally impacted Finnish rivers. Via simulation of Gaussian state space models, we are able to esti…

sparse data; harva aineisto; PHOSPHORUS LOAD; Oceanografi hydrologi och vattenresurser; FINLAND; Kalmanin tasoitin; simulation; SERIES; interpolation; Oceanography Hydrology and Water Resources; Kalmanin suodin; Kalman smoother; STREAMS; simulointi; Kalman filter; interpolointi
researchProduct

Analysing Complex Life Sequence Data with Hidden Markov Modelling

2016

When analysing complex sequence data with multiple channels (dimensions) and long observation sequences, describing and visualizing the data can be a challenge. Hidden Markov models (HMMs) and their mixtures (MHMMs) offer a probabilistic model-based framework where the information in such data can be compressed into hidden states (general life stages) and clusters (general patterns in life courses). We studied two different approaches to analysing clustered life sequence data with sequence analysis (SA) and hidden Markov modelling. In the first approach we used SA clusters as fixed and estimated HMMs separately for each group. In the second approach we treated SA clusters as suggestive and …

complex sequence data; Hidden Markov Modelling
researchProduct

Estimation of causal effects with small data in the presence of trapdoor variables

2021

We consider the problem of estimating causal effects of interventions from observational data when well-known back-door and front-door adjustments are not applicable. We show that when an identifiable causal effect is subject to an implicit functional constraint that is not deducible from conditional independence relations, the estimator of the causal effect can exhibit bias in small samples. This bias is related to variables that we call trapdoor variables. We use simulated data to study different strategies to account for trapdoor variables and suggest how the related trapdoor bias might be minimized. The importance of trapdoor variables in causal effect estimation is illustrated with rea…

FOS: Computer and information sciences; Statistics and Probability; Economics and Econometrics; bias; causality; Computer science; Bayesian probability; Context (language use); 01 natural sciences; Statistics - Computation; Methodology (stat.ME); 010104 statistics & probability; 0504 sociology; Econometrics; 0101 mathematics; Computation (stat.CO); Statistics - Methodology; estimointi; Estimation; Small data; bayesilainen menetelmä; 05 social sciences; 050401 social sciences methods; Estimator; Bayesian estimation; identifiability; Constraint (information theory); functional constraint; Conditional independence; kausaliteetti; Observational study; Statistics Probability and Uncertainty; Social Sciences (miscellaneous)
researchProduct

Prediction and interpolation of time series by state space models

2015

Article dissertation. Contains an introduction part and four articles. A large amount of data collected today is in the form of a time series. In order to make realistic inferences based on time series forecasts, in addition to point predictions, prediction intervals or other measures of uncertainty should be presented. Multiple sources of uncertainty are often ignored due to the complexities involved in accounting for them correctly. In this dissertation, some of these problems are reviewed and some new solutions are presented. A state space approach is also advocated for an efficient and flexible framework for time series forecas…

mallintaminen; state space models; Prediction theory; aikasarjat; tila-avaruusmallit; forecasting; ennusteet; prediction; epävarmuus; Interpolation; aikasarja-analyysi; R-kieli; Time-series analysis; time series; uncertainty
researchProduct

Introducing libeemd: a program package for performing the ensemble empirical mode decomposition

2016

The ensemble empirical mode decomposition (EEMD) and its complete variant (CEEMDAN) are adaptive, noise-assisted data analysis methods that improve on the ordinary empirical mode decomposition (EMD). All these methods decompose possibly nonlinear and/or nonstationary time series data into a finite number of components separated by instantaneous frequencies. This decomposition provides a powerful method to look into the different processes behind a given time series, and provides a way to separate short time-scale events from a general trend. We present a free software implementation of EMD, EEMD and CEEMDAN and give an overview of the EMD methodology and the algorithms used in the deco…

Statistics and Probability; FOS: Computer and information sciences; 010504 meteorology & atmospheric sciences; Computer science; 0211 other engineering and technologies; 02 engineering and technology; 01 natural sciences; Extensibility; Statistics - Computation; Hilbert–Huang transform; Software implementation; Hilbert–Huang transform; Sannolikhetsteori och statistik; Time series; Probability Theory and Statistics; Computation (stat.CO); 021101 geological & geomatics engineering; 0105 earth and related environmental sciences; computer.programming_language; noise-assisted data analysis; intrinsic mode function; Python (programming language); adaptive data analysis; Computational Mathematics; Nonlinear system; time series analysis; Data analysis; Statistics Probability and Uncertainty; Algorithm; computer; detrending; Hilbert-Huang transform; Intrinsic mode function; Time series analysis; Adaptive data analysis; Noise-assisted data analysis; Detrending
researchProduct

Efficient Bayesian generalized linear models with time-varying coefficients : The walker package in R

2020

The R package walker extends standard Bayesian general linear models to the case where the effects of the explanatory variables can vary in time. This makes it possible, for example, to model the effects of interventions, such as changes in tax policy, which gradually increase their effect over time. The Markov chain Monte Carlo algorithms powering the Bayesian inference are based on Hamiltonian Monte Carlo provided by Stan software, using a state space representation of the model to marginalise over the regression coefficients for efficient low-dimensional sampling.

FOS: Computer and information sciences; aikasarjat; bayesilainen menetelmä; Bayesian inference; Markovin ketjut; R; Statistics - Computation; lineaariset mallit; R-kieli; Markov chain Monte Carlo; Monte Carlo -menetelmät; regressioanalyysi; Computation (stat.CO); time-varying regression
researchProduct

A Bayesian spatio‐temporal analysis of markets during the Finnish 1860s famine

2022

We develop a Bayesian spatio-temporal model to study pre-industrial grain market integration during the Finnish famine of the 1860s. Our model takes into account several problematic features often present when analysing multiple spatially interdependent time series. For example, compared with the error correction methodology commonly applied in econometrics, our approach allows simultaneous modelling of multiple interdependent time series avoiding cumbersome statistical testing needed to predetermine the market leader as a point of reference. Furthermore, introducing a flexible spatio-temporal structure enables analysing detailed regional and temporal dynamics of the market mechanisms. Appl…

market integration; aikasarjat; bayesilainen menetelmä; error correction model; taloushistoria; Bayesian statistics; aikasarja-analyysi; hintakehitys; viljakauppa; markkinat (taloustiede); suuret nälkävuodet; Finnish famine; ekonometriset mallit; spatio-temporal model
researchProduct

A nonlinear mixed model approach to predict energy expenditure from heart rate.

2021

Objective. Heart rate (HR) monitoring provides a convenient and inexpensive way to predict energy expenditure (EE) during physical activity. However, there is a lot of variation among individuals in the EE-HR relationship, which should be taken into account in predictions. The objective is to develop a model that allows the prediction of EE based on HR as accurately as possible and allows an improvement of the prediction using calibration measurements from the target individual. Approach. We propose a nonlinear (logistic) mixed model for EE and HR measurements and an approach to calibrate the model for a new person who does not belong to the dataset used to estimate the model. The …

Mixed model; syke; Physiology; Computer science; 0206 medical engineering; individual calibration; Biomedical Engineering; Biophysics; Physical activity; physical activity; heart rate monitoring; Model parameters; 02 engineering and technology; kalibrointi; logistinen sekamalli; sykemittaus [energiankulutus]; 03 medical and health sciences; 0302 clinical medicine; Heart Rate; Physiology (medical); energy expenditure; Calibration; Humans; logistic mixed model; tilastolliset mallit; Exercise; Monitoring Physiologic; Heterogeneous group; Prediction interval; 020601 biomedical engineering; mittausmenetelmät; Nonlinear system; Energy expenditure; Exercise Test; sykemittarit; Energy Metabolism; fyysinen aktiivisuus; Algorithm; fyysinen aktiivisuus; energiankulutus (aineenvaihdunta); 030217 neurology & neurosurgery; Physiological measurement
researchProduct

Combining Sequence Analysis and Hidden Markov Models in the Analysis of Complex Life Sequence Data

2018

Life course data often consists of multiple parallel sequences, one for each life domain of interest. Multichannel sequence analysis has been used for computing pairwise dissimilarities and finding clusters in this type of multichannel (or multidimensional) sequence data. Describing and visualizing such data is, however, often challenging. We propose an approach for compressing, interpreting, and visualizing the information within multichannel sequences by finding (1) groups of similar trajectories and (2) similar phases within trajectories belonging to the same group. For these tasks we combine multichannel sequence analysis and hidden Markov modelling. We illustrate this approach with an …

longitudinal data; sekvensointi; sequence analysis; Sequence analysis; Computer science; Markovin ketjut; Markov models; pitkittäistutkimus; elämänkaari; 01 natural sciences; 010104 statistics & probability; 03 medical and health sciences; Data sequences; population dynamics; Sannolikhetsteori och statistik; 0101 mathematics; family and work trajectories; Probability Theory and Statistics; Hidden Markov model; life course; 030505 public health; hidden Markov models; latent Markov models; business.industry; Pattern recognition; Tvärvetenskapliga studier inom samhällsvetenskap; life sequence data; Life domain; Life course approach; Pairwise comparison; Artificial intelligence; Social Sciences Interdisciplinary; 0305 other medical science; business; väestötilastot
researchProduct

Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo

2020

We consider importance sampling (IS) type weighted estimators based on Markov chain Monte Carlo (MCMC) targeting an approximate marginal of the target distribution. In the context of Bayesian latent variable models, the MCMC typically operates on the hyperparameters, and the subsequent weighting may be based on IS or sequential Monte Carlo (SMC), but allows for multilevel techniques as well. The IS approach provides a natural alternative to delayed acceptance (DA) pseudo-marginal/particle MCMC, and has many advantages over DA, including a straightforward parallelisation and additional flexibility in MCMC implementation. We detail minimal conditions which ensure strong consistency of the sug…
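
The weighting idea in this abstract can be illustrated with a self-normalised importance sampling estimator. The following Python sketch is not the authors' algorithm: it assumes independent draws from a known approximating density instead of an MCMC chain, and the densities, the function names `normal_pdf` and `snis_estimate`, and all numeric settings are hypothetical choices made only for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def snis_estimate(f, samples, target_pdf, approx_pdf):
    """Self-normalised importance sampling estimate of E_target[f(X)].

    `samples` are assumed to come from `approx_pdf`; each one is reweighted
    by the ratio target/approximation, and the weights are normalised.
    """
    weights = [target_pdf(x) / approx_pdf(x) for x in samples]
    total = sum(weights)
    return sum(w * f(x) for w, x in zip(weights, samples)) / total


# Assumed toy setting: the target is N(1, 1), the approximation is N(0, 2^2).
rng = random.Random(1)
draws = [rng.gauss(0.0, 2.0) for _ in range(50000)]
estimate = snis_estimate(
    lambda x: x,
    draws,
    lambda x: normal_pdf(x, 1.0, 1.0),
    lambda x: normal_pdf(x, 0.0, 2.0),
)
print(estimate)  # should be close to the target mean of 1
```

Because each weight is computed independently of the others, the correction step parallelises trivially, which is in line with the advantage the abstract mentions.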

Monte Carlo -menetelmät; bayesilainen menetelmä; tilastomenetelmät; Markovin ketjut; Markov chain Monte Carlo (MCMC); Bayesian analysis; otanta; Statistics::Computation; estimointi
researchProduct

bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R

2021

We present an R package bssm for Bayesian non-linear/non-Gaussian state space modelling. Unlike the existing packages, bssm allows for easy-to-use approximate inference based on Gaussian approximations such as the Laplace approximation and the extended Kalman filter. The package also accommodates discretely observed latent diffusion processes. The inference is based on fully automatic, adaptive Markov chain Monte Carlo (MCMC) on the hyperparameters, with optional importance sampling post-correction to eliminate any approximation bias. The package also implements a direct pseudo-marginal MCMC and a delayed acceptance pseudo-marginal MCMC using intermediate approximations. The package offers …

Statistics and Probability; mallintaminen; FOS: Computer and information sciences; Numerical Analysis; Monte Carlo -menetelmät; matematiikka; bayesilainen menetelmä; Markovin ketjut; tila-avaruusmallit; Statistics Probability and Uncertainty; matemaattiset mallit; Statistics - Computation; Computation (stat.CO)
researchProduct

Improved Frequentist Prediction Intervals for Autoregressive Models by Simulation

2015

It is well known that the so-called plug-in prediction intervals for autoregressive processes with Gaussian disturbances are too narrow, i.e. the coverage probabilities fall below the nominal ones. However, simulation experiments show that the formulas borrowed from the ordinary linear regression theory yield one-step prediction intervals which have coverage probabilities very close to what is claimed. From a Bayesian point of view, the resulting intervals are posterior predictive intervals when uniform priors are assumed for both the autoregressive coefficients and the logarithm of the disturbance variance. This finding opens a path to treating multi-step prediction intervals which are obtain…

Gaussian; Prediction interval; symbols.namesake; autoregressive models; Autoregressive model; Frequentist inference; prediction intervals; Statistics; Credible interval; Econometrics; symbols; simulointi; STAR model; Mathematics
researchProduct

Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo

2020

We consider importance sampling (IS) type weighted estimators based on Markov chain Monte Carlo (MCMC) targeting an approximate marginal of the target distribution. In the context of Bayesian latent variable models, the MCMC typically operates on the hyperparameters, and the subsequent weighting may be based on IS or sequential Monte Carlo (SMC), but allows for multilevel techniques as well. The IS approach provides a natural alternative to delayed acceptance (DA) pseudo-marginal/particle MCMC, and has many advantages over DA, including a straightforward parallelisation and additional flexibility in MCMC implementation. We detail minimal conditions which ensure strong consistency of the sug…

Statistics and Probability; Hyperparameter; 05 social sciences; Bayesian probability; Strong consistency; Estimator; Context (language use); Markov chain Monte Carlo; 01 natural sciences; Statistics::Computation; 010104 statistics & probability; symbols.namesake; 0502 economics and business; symbols; 0101 mathematics; Statistics Probability and Uncertainty; Particle filter; Algorithm; Importance sampling; 050205 econometrics; Mathematics; Scandinavian Journal of Statistics
researchProduct

Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R

2019

Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariate…
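
Likelihood computation in a hidden Markov model rests on the standard forward recursion, which sums over all hidden state paths without enumerating them. The Python sketch below is only an illustration of that recursion for a single-channel discrete HMM; it is not the seqHMM interface (seqHMM is an R package), and the function name `forward_likelihood` and the two-state parameter values are assumptions.

```python
def forward_likelihood(obs, init, trans, emit):
    """Likelihood of an observed symbol sequence under a discrete HMM.

    obs   : list of observed symbol indices
    init  : init[s] is the probability of starting in hidden state s
    trans : trans[r][s] is the probability of moving from state r to state s
    emit  : emit[s][k] is the probability of emitting symbol k in state s
    """
    n_states = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol model used only for illustration.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.3, 0.7]]
print(forward_likelihood([0, 1, 1], init, trans, emit))
```

For long sequences a practical implementation would rescale `alpha` at each step or work on the log scale to avoid numerical underflow; that refinement is omitted here for brevity.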

FOS: Computer and information sciences; Statistics and Probability; Multivariate statistics; sequence analysis; aikasarjat; Computer science; r; Markov model; Statistics - Computation; Statistics - Applications; 01 natural sciences; Unobservable; categorical time series; R-kieli; 010104 statistics & probability; multi-channel sequences; categorical time series; visualizing sequence data; visualizing models; latent Markov models; latent class models; R; Covariate; Applications (stat.AP); Sannolikhetsteori och statistik; Computer software; 0101 mathematics; Time series; Probability Theory and Statistics; Hidden Markov model; Cluster analysis; lcsh:Statistics; lcsh:HA1-4737; Categorical variable; Computation (stat.CO); ta112; business.industry; visualizing sequence data; R (programming languages); Pattern recognition; multi-channel sequences; visualizing models; latent class models; sekvenssianalyysi; Artificial intelligence; latent markov models; time series; Statistics Probability and Uncertainty; business; Software; Journal of Statistical Software
researchProduct

KFAS : Exponential Family State Space Models in R

2017

State space modelling is an efficient and flexible method for statistical inference of a broad class of time series and other data. This paper describes an R package KFAS for state space modelling with the observations from an exponential family, namely Gaussian, Poisson, binomial, negative binomial and gamma distributions. After introducing the basic theory behind Gaussian and non-Gaussian state space models, an illustrative example of Poisson time series forecasting is provided. Finally, a comparison to alternative R packages suitable for non-Gaussian time series modelling is presented.
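
As a concrete instance of a Gaussian state space model, the local level model can be filtered with a few lines of Kalman recursion, and missing observations are handled by skipping the update step. The Python sketch below is a minimal illustration under assumed variances; it is not the KFAS interface (KFAS is an R package), and the function name `local_level_filter`, the data, and the large initial variance standing in for a diffuse prior are hypothetical.

```python
def local_level_filter(y, sigma2_eps, sigma2_eta, a1=0.0, p1=1e7):
    """Kalman filter for the local level model

        y_t = mu_t + eps_t,       eps_t ~ N(0, sigma2_eps)
        mu_{t+1} = mu_t + eta_t,  eta_t ~ N(0, sigma2_eta)

    Missing observations are given as None. Returns a list of
    (filtered mean, filtered variance) pairs for mu_t.
    """
    a, p = a1, p1  # predicted mean and variance of the current state
    filtered = []
    for obs in y:
        if obs is not None:
            f = p + sigma2_eps       # variance of the one-step prediction error
            k = p / f                # Kalman gain
            a = a + k * (obs - a)    # measurement update of the mean
            p = p * (1.0 - k)        # measurement update of the variance
        # For a missing value the prediction is carried forward unchanged.
        filtered.append((a, p))
        p = p + sigma2_eta           # time update: the level is a random walk
    return filtered


# Hypothetical series with two missing values in the middle.
series = [1.0, 1.2, None, None, 0.9]
for mean, var in local_level_filter(series, sigma2_eps=0.5, sigma2_eta=0.1):
    print(round(mean, 3), round(var, 3))
```

The filtered variance grows over the missing stretch and shrinks again once a new observation arrives, which is how the uncertainty about unobserved time points is carried through the model.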

FOS: Computer and information sciences; Statistics and Probability; aikasarjat; Gaussian; Negative binomial distribution; forecasting; Poisson distribution; 01 natural sciences; Statistics - Computation; Methodology (stat.ME); 010104 statistics & probability; 03 medical and health sciences; symbols.namesake; 0302 clinical medicine; Exponential family; exponential family; Gamma distribution; Statistical inference; State space; Applied mathematics; Sannolikhetsteori och statistik; 030212 general & internal medicine; 0101 mathematics; Probability Theory and Statistics; lcsh:Statistics; lcsh:HA1-4737; Computation (stat.CO); Statistics - Methodology; Mathematics; R; exponential family; state space models; time series; forecasting; dynamic linear models; ta112; state space models; Series (mathematics); R; Statistics; Computer software; symbols; Statistics Probability and Uncertainty; time series; Software; dynamic linear models
researchProduct

Improved frequentist prediction intervals for ARMA models by simulation

2014

[Introduction] In a traditional approach to time series forecasting, prediction intervals are usually computed as if the chosen model were correct and the parameters of the model completely known, with no reference to the uncertainty regarding the model selection and parameter estimation. The parameter uncertainty may not be a major source of prediction errors in practical applications, but its effects can be substantial if the series is not too long. The problems of interval prediction are discussed in depth in Chatfield (1993, 1996) and Clements & Hendry (1999). [Continues; please see the article] Not peer reviewed.

ARMA models
researchProduct

dynamite: An R Package for Dynamic Multivariate Panel Models

2023

dynamite is an R package for Bayesian inference of intensive panel (time series) data comprising multiple measurements per multiple individuals measured in time. The package supports joint modeling of multiple response variables, time-varying and time-invariant effects, a wide range of discrete and continuous distributions, group-specific random effects, latent factors, and customization of prior distributions of the model parameters. Models in the package are defined via a user-friendly formula interface, and estimation of the posterior distribution of the model parameters takes advantage of state-of-the-art Markov chain Monte Carlo methods. The package enables efficient computation of …

Methodology (stat.ME); FOS: Computer and information sciences; Statistics - Methodology
researchProduct

A Bayesian Reconstruction of a Historical Population in Finland, 1647–1850

2020

This article provides a novel method for estimating historical population development. We review the previous literature on historical population time-series estimates and propose a general outline to address the well-known methodological problems. We use a Bayesian hierarchical time-series model that allows us to integrate the parish-level data set and prior population information in a coherent manner. The procedure provides us with model-based posterior intervals for the final population estimates. We demonstrate its applicability by estimating the long-term development of Finland's population from 1647 onward and simultaneously place the country among the very few to have an annual popula…

aikasarjat; Economics; 060106 history of social sciences; Population Dynamics; Bayesian probability; Population; Population development; History 18th Century; Article; History 17th Century; Population estimate; väestöhistoria; Population history; Residence Characteristics; 0502 economics and business; Econometrics; Population growth; Humans; Population growth; 0601 history and archaeology; uuden ajan alku; Nationalekonomi; 050207 economics; Early modern era; education; Finland; estimointi; Demography; Bayes estimator; education.field_of_study; bayesilainen menetelmä; 05 social sciences; väestönmuutokset; Bayes Theorem; Censuses; History 19th Century; Population history; Population growth; Early modern era; Bayesian estimation; 06 humanities and the arts; Bayesian estimation; Data set; Geography; population growth; early modern era; population history; Demography
researchProduct

From Sequences to Variables : Rethinking the Relationship between Sequences and Outcomes

2021

Sequence analysis is increasingly used in the social sciences for the holistic analysis of life-course and other longitudinal data. The usual approach is to construct sequences, calculate dissimilarities, group similar sequences with cluster analysis, and use cluster membership as a dependent or independent variable in a regression model. This approach may be problematic, as cluster memberships are assumed to be fixed known characteristics of the subjects in subsequent analyses. Furthermore, it is often more reasonable to assume that individual sequences are mixtures of multiple ideal types rather than equal members of some group. Failing to account for uncertain and mixed memberships may l…

sequence analysis; representativeness; life-course; SocArXiv|Social and Behavioral Sciences|Sociology|Children and Youth; bepress|Social and Behavioral Sciences|Sociology; SocArXiv|Social and Behavioral Sciences|Sociology; klusterit; bepress|Social and Behavioral Sciences|Sociology|Family Life Course and Society; sekvenssianalyysi; analyysi; bepress|Social and Behavioral Sciences; klusterianalyysi; SocArXiv|Social and Behavioral Sciences; typology; cluster analysis
researchProduct