AUTHOR

Sara Taskinen

Efficient estimation of generalized linear latent variable models.

Generalized linear latent variable models (GLLVM) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods. For likelihood based estimation, several closed form approximations for the marginal likelihood of GLLVMs have been proposed, but their efficient implementations have been lacking in the literature. To fill this gap, we show in this paper how to obtain computationally convenient estim…

research product

Fast and universal estimation of latent variable models using extended variational approximations

Abstract Generalized linear latent variable models (GLLVMs) are a class of methods for analyzing multi-response data which has gained considerable popularity in recent years, e.g., in the analysis of multivariate abundance data in ecology. One of the main features of GLLVMs is their capacity to handle a variety of response types, such as (overdispersed) counts, binomial and (semi-)continuous responses, and proportions data. On the other hand, the inclusion of unobserved latent variables poses a major computational challenge, as the resulting marginal likelihood function involves an intractable integral for non-normally distributed responses. This has spurred research into a number of approx…

research product

Model‐based approaches to unconstrained ordination

Summary Unconstrained ordination is commonly used in ecology to visualize multivariate data, in particular, to visualize the main trends between different sites in terms of their species composition or relative abundance. Methods of unconstrained ordination currently used, such as non-metric multidimensional scaling, are algorithm-based techniques developed and implemented without directly accommodating the statistical properties of the data at hand. Failure to account for these key data properties can lead to misleading results. A model-based approach to unconstrained ordination can address this issue, and in this study, two types of models for ordination are proposed based on finite mixtu…

research product

k-Step shape estimators based on spatial signs and ranks

In this paper, the shape matrix estimators based on spatial sign and rank vectors are considered. The estimators considered here are slight modifications of the estimators introduced in Dümbgen (1998) and Oja and Randles (2004) and further studied for example in Sirkiä et al. (2009). The shape estimators are computed using pairwise differences of the observed data, therefore there is no need to estimate the location center of the data. When the estimator is based on signs, the use of differences also implies that the estimators have the so called independence property if the estimator, that is used as an initial estimator, has it. The influence functions and limiting distributions of the es…

research product

A more efficient second order blind identification method for separation of uncorrelated stationary time series

The classical second order source separation methods use approximate joint diagonalization of autocovariance matrices with several lags to estimate the unmixing matrix. Based on recent asymptotic results, we propose a novel unmixing matrix estimator which selects the best lag set from a finite set of candidate sets specified by the user. The theory is illustrated by a simulation study.

research product

Robustifying principal component analysis with spatial sign vectors

Abstract In this paper, we apply orthogonally equivariant spatial sign covariance matrices as well as their affine equivariant counterparts in principal component analysis. The influence functions and asymptotic covariance matrices of eigenvectors based on robust covariance estimators are derived in order to compare the robustness and efficiency properties. We show in particular that the estimators that use pairwise differences of the observed data have very good efficiency properties, providing practical robust alternatives to classical sample covariance matrix based methods.
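As a rough illustration of the idea in this abstract, the sketch below computes a spatial sign covariance matrix and uses its eigenvectors as robust principal directions. It is a minimal sketch, not the authors' implementation: the function names, the componentwise median as the center, and the simulated data are all assumptions made for the example, and only the basic (orthogonally equivariant) version is shown, not the affine equivariant or pairwise-difference variants.

```python
import numpy as np

def spatial_sign_covariance(x, center):
    """Average outer product of the unit-length (spatial sign) vectors."""
    d = x - center
    norms = np.linalg.norm(d, axis=1)
    keep = norms > 0                      # the spatial sign of the zero vector is 0
    u = d[keep] / norms[keep, None]
    return u.T @ u / x.shape[0]

def sign_pca(x, center=None):
    """Eigen-decomposition of the spatial sign covariance matrix."""
    if center is None:
        # Assumption: componentwise median as a simple robust center.
        center = np.median(x, axis=0)
    sscm = spatial_sign_covariance(x, center)
    eigvals, eigvecs = np.linalg.eigh(sscm)
    order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
    return eigvals[order], eigvecs[:, order]

# Simulated data: independent normal coordinates with scales 5, 2, 1,
# plus ten gross outliers (hypothetical example data).
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3)) * np.array([5.0, 2.0, 1.0])
x[:10] *= 1000.0
eigvals, eigvecs = sign_pca(x)
```

Because every observation is projected onto the unit sphere before averaging, the outliers contribute no more than any other point. Note that the eigenvalues of the sign covariance matrix sum to one and are not the variances of the components; only the eigenvectors are directly comparable with those of the ordinary covariance matrix.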

research product

Applying fully tensorial ICA to fMRI data

There are two aspects in functional magnetic resonance imaging (fMRI) data that make them awkward to analyse with traditional multivariate methods - high order and high dimension. The first of these refers to the tensorial nature of observations as array-valued elements instead of vectors. Although this can be circumvented by vectorizing the array, doing so simultaneously loses all the structural information in the original observations. The second aspect refers to the high dimensionality along each dimension making the concept of dimension reduction a valuable tool in the processing of fMRI data. Different methods of tensor dimension reduction are currently gaining popularity in literature…

research product

Independent component analysis based on symmetrised scatter matrices

A new method for separating the mixtures of independent sources has been proposed recently in [Oja et al. (2006). Scatter matrices and independent component analysis. Austrian J. Statist., to appear]. This method is based on two scatter matrices with the so-called independence property. The corresponding method is now further examined. Simple simulation studies are used to compare the performance of so-called symmetrised scatter matrices in solving the independent component analysis problem. The results are also compared with the classical FastICA method. Finally, the theory is illustrated by some examples.

research product

Analyzing environmental‐trait interactions in ecological communities with fourth‐corner latent variable models

In ecological community studies it is often of interest to study the effect of species related trait variables on abundances or presence-absences. Specifically, the interest may lie in the interactions between environmental and trait variables. An increasingly popular approach for studying such interactions is to use the so-called fourth-corner model, which explicitly posits a regression model where the mean response of each species is a function of interactions between covariate and trait predictors (among other terms). On the other hand, many of the fourth-corner models currently applied in the literature are too simplistic to properly account for variation in environmental and trait resp…

research product

Stationary subspace analysis based on second-order statistics

In stationary subspace analysis (SSA) one assumes that the observable p-variate time series is a linear mixture of a k-variate nonstationary time series and a (p-k)-variate stationary time series. The aim is then to estimate the unmixing matrix which transforms the observed multivariate time series onto stationary and nonstationary components. In the classical approach multivariate data are projected onto stationary and nonstationary subspaces by minimizing a Kullback-Leibler divergence between Gaussian distributions, and the method only detects nonstationarities in the first two moments. In this paper we consider SSA in a more general multivariate time series setting and propose SSA method…

research product

On nonparametric tests of independence and robust canonical correlation analysis

research product

Fast equivariant JADE

Independent component analysis (ICA) is a widely used signal processing tool having applications in various fields of science. In this paper we focus on affine equivariant ICA methods. Two such well-established estimation methods, FOBI and JADE, diagonalize certain fourth order cumulant matrices to extract the independent components. FOBI uses one cumulant matrix only, and is therefore computationally very fast. However, it is not able to separate identically distributed components which is a major drawback. JADE overcomes this restriction. Unfortunately, JADE uses a huge number of cumulant matrices and is computationally very heavy in high-dimensional cases. In this paper, we hybridize the…
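To make the "one cumulant matrix" remark concrete, here is a minimal sketch of FOBI as it is usually described: whiten the data, then diagonalize a single fourth-moment matrix. The function name and the simulated sources are assumptions for the example; the hybrid method proposed in the paper is not shown.

```python
import numpy as np

def fobi(x):
    """Sketch of FOBI. x is (n, p); returns an unmixing matrix W (p x p)."""
    xc = x - x.mean(axis=0)
    n = xc.shape[0]
    cov = xc.T @ xc / n
    # Symmetric inverse square root of the covariance matrix (whitening).
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = xc @ cov_inv_sqrt
    # Fourth-moment matrix E[ ||z||^2 z z^T ] of the whitened data.
    r2 = np.sum(z ** 2, axis=1)
    cov4 = (z * r2[:, None]).T @ z / n
    _, u = np.linalg.eigh(cov4)
    return u.T @ cov_inv_sqrt

# Hypothetical example: three independent sources with different kurtosis.
rng = np.random.default_rng(3)
n = 4000
s = np.column_stack([
    rng.uniform(-1.0, 1.0, n),     # light tails
    rng.standard_normal(n),        # Gaussian
    rng.laplace(0.0, 1.0, n),      # heavy tails
])
a = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
x = s @ a.T
w = fobi(x)
s_hat = (x - x.mean(axis=0)) @ w.T   # estimated sources, up to order and sign
```

The single eigen-decomposition is what makes FOBI fast. It also shows the limitation the abstract mentions: components with equal kurtosis give equal eigenvalues of the fourth-moment matrix, so the corresponding eigenvectors are not identified.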

research product

Deflation-Based FastICA With Adaptive Choices of Nonlinearities

Deflation-based FastICA is a popular method for independent component analysis. In the standard deflation-based approach the row vectors of the unmixing matrix are extracted one after another always using the same nonlinearities. In practice the user has to choose the nonlinearities and the efficiency and robustness of the estimation procedure then strongly depend on this choice as well as on the order in which the components are extracted. In this paper we propose a novel adaptive two-stage deflation-based FastICA algorithm that (i) allows one to use different nonlinearities for different components and (ii) optimizes the order in which the components are extracted. Based on a consist…

research product

Extending Joint Models in Community Ecology : A Response to Beissinger et al.

The joint modelling of many variables in community ecology is a new and technically challenging area with many opportunities for future developments. The possibility of extending joint models to deal with imperfect detection has been highlighted by Beissinger et al. as an important problem worthy of further investigation [1]. We agree, and previously pointed to this potential extension as an outstanding question [2], alongside models that can estimate phylogenetic repulsion or attraction, nonlinearity in the response to latent variables, and spatial or temporal correlation, because further developments in all these directions are needed.

research product

Symmetrised M-estimators of multivariate scatter

Abstract In this paper we introduce a family of symmetrised M-estimators of multivariate scatter. These are defined to be M-estimators only computed on pairwise differences of the observed multivariate data. Symmetrised Huber's M-estimator and Dümbgen's estimator serve as our examples. The influence functions of the symmetrised M-functionals are derived and the limiting distributions of the estimators are discussed in the multivariate elliptical case to consider the robustness and efficiency properties of estimators. The symmetrised M-estimators have the important independence property; they can therefore be used to find the independent components in the independent component analysis (ICA).

research product

Dimension reduction for time series in a blind source separation context using R

Funding Information: The work of KN was supported by the CRoNoS COST Action IC1408 and the Austrian Science Fund P31881-N32. The work of ST was supported by the CRoNoS COST Action IC1408. The work of JV was supported by Academy of Finland (grant 321883). We would like to thank the anonymous reviewers for their comments which improved the paper and package considerably. Publisher Copyright: © 2021, American Statistical Association. All rights reserved. Multivariate time series observations are increasingly common in multiple fields of science but the complex dependencies of such data often translate into intractable models with a large number of parameters. An alternative is given by first red…

research product

Metabolic health, menopause, and physical activity : a 4-year follow-up study

Background In women, metabolic health deteriorates after menopause, and the role of physical activity (PA) in mitigating the change is not completely understood. This study investigates the changes in indicators of metabolic health around menopause and evaluates whether PA modulates these changes. Methods Longitudinal data of 298 women aged 48–55 years at baseline participating in the ERMA and EsmiRs studies was used. Mean follow-up time was 3.8 (SD 0.1) years. Studied indicators of metabolic health were total and android fat mass, waist circumference, waist-to-hip ratio (WHR), systolic (SBP) and diastolic (DBP) blood pressure, blood glucose, triglycerides, serum total cholesterol, and high…

research product

Blind Source Separation Based on Joint Diagonalization in R: The Packages JADE and BSSasymp

Blind source separation (BSS) is a well-known signal processing tool which is used to solve practical data analysis problems in various fields of science. In BSS, we assume that the observed data consists of linear mixtures of latent variables. The mixing system and the distributions of the latent variables are unknown. The aim is to find an estimate of an unmixing matrix which then transforms the observed data back to latent sources. In this paper we present the R packages JADE and BSSasymp. The package JADE offers several BSS methods which are based on joint diagonalization. Package BSSasymp contains functions for computing the asymptotic covariance matrices as well as their data-based es…

research product

smatr 3 - an R package for estimation and inference about allometric lines

Summary 1. The Standardised Major Axis Tests and Routines (SMATR) software provides tools for estimation and inference about allometric lines, currently widely used in ecology and evolution. 2. This paper describes some significant improvements to the functionality of the package, now available on R in smatr version 3. 3. New inclusions in the package include sma and ma functions that accept formula input and perform the key inference tasks; multiple comparisons; graphical methods for visualising data and checking (S)MA assumptions; robust (S)MA estimation and inference tools.

research product

gllvm: Fast analysis of multivariate abundance data with generalized linear latent variable models in R

The work of J.N. was supported by the Wihuri Foundation. The work of S.T. was supported by the CRoNoS COST Action IC1408. F.K.C.H. was also supported by an ANU cross disciplinary grant.

research product

Multivariate nonparametric tests of independence

New test statistics are proposed for testing whether two random vectors are independent. Gieser and Randles, as well as Taskinen, Kankainen, and Oja have introduced and discussed multivariate extensions of the quadrant test of Blomqvist. This article serves as a sequel to this work and presents new multivariate extensions of Kendall's tau and Spearman's rho statistics. Two different approaches are discussed. First, interdirection proportions are used to estimate the cosines of angles between centered observation vectors and between differences of observation vectors. Second, covariances between affine-equivariant multivariate signs and ranks are used. The test statistics arising from these …

research product

Tests of multinormality based on location vectors and scatter matrices

Classical univariate measures of asymmetry such as Pearson’s (mean-median)/σ or (mean-mode)/σ often measure the standardized distance between two separate location parameters and have been widely used in assessing univariate normality. Similarly, measures of univariate kurtosis are often just ratios of two scale measures. The classical standardized fourth moment and the ratio of the mean deviation to the standard deviation serve as examples. In this paper we consider tests of multinormality which are based on the Mahalanobis distance between two multivariate location vector estimates or on the (matrix) distance between two scatter matrix estimates, respectively. Asymptotic theory is develop…

research product

Signal dimension estimation in BSS models with serial dependence

Many modern multivariate time series datasets contain a large amount of noise, and the first step of the data analysis is to separate the noise channels from the signals of interest. A crucial part of this dimension reduction is determining the number of signals. In this paper we approach this problem by considering a noisy latent variable time series model which comprises many popular blind source separation models. We propose a general framework for the estimation of the signal dimension that is based on testing for sub-sphericity and give examples of different tests suitable for time series settings. In the inference we rely on bootstrap null distributions. Several simulation studies are…

research product

On Independent Component Analysis with Stochastic Volatility Models

Consider a multivariate time series where each component series is assumed to be a linear mixture of latent mutually independent stationary time series. Classical independent component analysis (ICA) tools, such as fastICA, are often used to extract latent series, but they don't utilize any information on temporal dependence. Also financial time series often have periods of low and high volatility. In such settings second order source separation methods, such as SOBI, fail. We review here some classical methods used for time series with stochastic volatility, and suggest modifications of them by proposing a family of vSOBI estimators. These estimators use different nonlinearity functions to…

research product

fICA : FastICA Algorithms and Their Improved Variants

Abstract In independent component analysis (ICA) one searches for mutually independent non gaussian latent variables when the components of the multivariate data are assumed to be linear combinations of them. Arguably, the most popular method to perform ICA is FastICA. There are two classical versions, the deflation-based FastICA where the components are found one by one, and the symmetric FastICA where the components are found simultaneously. These methods have been implemented previously in two R packages, fastICA and ica. We present the R package fICA and compare it to the other packages. Additional features in fICA include optimization of the extraction order in the deflation-based vers…

research product

Deflation-based separation of uncorrelated stationary time series

In this paper we assume that the observed p time series are linear combinations of p latent uncorrelated weakly stationary time series. The problem is then to find an estimate for an unmixing matrix that transforms the observed time series back to uncorrelated time series. The so-called SOBI (Second Order Blind Identification) estimate aims at a joint diagonalization of the covariance matrix and several autocovariance matrices with varying lags. In this paper, we propose a novel procedure that extracts the latent time series one by one. The limiting distribution of this deflation-based SOBI is found under general conditions, and we show how the results can be used for the comparison of es…

research product

Publication and Coauthorship Networks of Hannu Oja

In this paper we review Hannu Oja’s publications and form coauthor networks based on them. Applying community detection methods to the network formed by all of Hannu’s publications shows that his coauthors can be classified into 13 clusters, where two large clusters refer to his methodological research. The network concerning this methodological work is then extended to cover all statistical publications written by Hannu’s coauthors. The analysis of the extended network shows that Hannu’s coauthors do not form a closed community, but Hannu is involved in many different fields of statistics.

research product

gllvm : Fast analysis of multivariate abundance data with generalized linear latent variable models in R

1. There has been rapid development in tools for multivariate analysis based on fully specified statistical models or “joint models”. One approach attracting a lot of attention is generalized linear latent variable models (GLLVMs). However, software for fitting these models is typically slow and not practical for large datasets. 2. The R package gllvm offers relatively fast methods to fit GLLVMs via maximum likelihood, along with tools for model checking, visualization and inference. 3. The main advantage of the package over other implementations is speed, e.g. being two orders of magnitude faster, and capable of handling thousands of response variables. These advances come from using variationa…

research product

A review of second‐order blind identification methods

Second order source separation (SOS) is a data analysis tool which can be used for revealing hidden structures in multivariate time series data or as a tool for dimension reduction. Such methods are nowadays increasingly important as more and more high-dimensional multivariate time series data are measured in numerous fields of applied science. Dimension reduction is crucial, as modelling such high-dimensional data with multivariate time series models is often impractical as the number of parameters describing dependencies between the component time series is usually too high. SOS methods have their roots in the signal processing literature, where they were first used to separate source sig…

research product

Separation of Uncorrelated Stationary time series using Autocovariance Matrices

Blind source separation (BSS) is a signal processing tool, which is widely used in various fields. Examples include biomedical signal separation, brain imaging and economic time series applications. In BSS, one assumes that the observed $p$ time series are linear combinations of $p$ latent uncorrelated weakly stationary time series. The aim is then to find an estimate for an unmixing matrix, which transforms the observed time series back to uncorrelated latent time series. In SOBI (Second Order Blind Identification) joint diagonalization of the covariance matrix and autocovariance matrices with several lags is used to estimate the unmixing matrix. The rows of an unmixing matrix can be deriv…

research product

Fall incidence in frail older women after individualized visual feedback-based balance training.

Background: The knowledge concerning balance training actually lowering fall rates among frail older persons is limited. Objective: The aim of this study was to examine the effects of a 4-week individualized visual feedback-based balance training on the fall incidence during 1-year follow-up among frail older women living in residential care. Methods: Twenty-seven older women from 2 residential care homes were randomized into exercise (n = 20) and control (n = 7) groups. Balance measurements were carried out before and after a 4-week training period and falls were monitored by monthly diaries for 1 year. An interview about fear of fal…

research product

Affine-invariant rank tests for multivariate independence in independent component models

We consider the problem of testing for multivariate independence in independent component (IC) models. Under a symmetry assumption, we develop parametric and nonparametric (signed-rank) tests. Unlike in independent component analysis (ICA), we allow for the singular cases involving more than one Gaussian independent component. The proposed rank tests are based on componentwise signed ranks, à la Puri and Sen. Unlike the Puri and Sen tests, however, our tests (i) are affine-invariant and (ii) are, for adequately chosen scores, locally and asymptotically optimal (in the Le Cam sense) at prespecified densities. Asymptotic local powers and asymptotic relative efficiencies with respect to Wilks’…

research product

Tests of Independence Based on Sign and Rank Covariances

In this paper three different concepts of bivariate sign and rank, namely marginal sign and rank, spatial sign and rank and affine equivariant sign and rank, are considered. The aim is to see whether these different sign and rank covariances can be used to construct tests for the hypothesis of independence. In some cases (spatial sign, affine equivariant sign and rank) an additional assumption on the symmetry of marginal distribution is needed. Limiting distributions of test statistics under the null hypothesis as well as under interesting sequences of contiguous alternatives are derived. Asymptotic relative efficiencies with respect to the regular correlation test are calculated and compar…

research product

Influence Functions and Efficiencies of k-Step Hettmansperger–Randles Estimators for Multivariate Location and Regression

In Hettmansperger and Randles (Biometrika 89:851–860, 2002) spatial sign vectors were used to derive simultaneous estimators of multivariate location and shape. Oja (Multivariate nonparametric methods with R. Springer, New York, 2010) proposed a similar approach for the multivariate linear regression case. These estimators are highly robust and have under general assumptions a joint limiting multinormal distribution. The estimates are easy to compute using fixed-point algorithms. There are however no exact proofs for the convergence of these algorithms. The existence and uniqueness of the solutions also still remain unproven although we believe that they hold under general conditions. To ci…

research product

Tests and estimates of shape based on spatial signs and ranks

Nonparametric procedures for testing and estimation of the shape matrix in the case of multivariate elliptic distribution are considered. Testing for sphericity is an important special case. The tests and estimates are based on the spatial sign and rank covariance matrices. The estimates based on the spatial sign covariance matrix and symmetrized spatial sign covariance matrix are Tyler's [A distribution-free M-estimator of multivariate scatter, Ann. Statist. 15 (1987), pp. 234–251] shape matrix and Dümbgen's [On Tyler's M-functional of scatter in high dimension, Ann. Inst. Statist. Math. 50 (1998), pp. 471–491] shape matrix, respectively. The test based on the spatial sign covariance m…
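The two shape estimates named in this abstract can be sketched with the standard fixed-point iteration for Tyler's M-estimator; Dümbgen's version applies the same iteration to pairwise differences, so no location estimate is needed. This is a minimal sketch under assumptions made here: the function names, the determinant-one normalization, the convergence tolerance and the simulated data are illustrative choices, not taken from the paper.

```python
import numpy as np

def tyler_shape(x, max_iter=500, tol=1e-10):
    """Fixed-point iteration for Tyler's shape matrix of centered data x (n x p)."""
    n, p = x.shape
    v = np.eye(p)
    for _ in range(max_iter):
        # Squared norms x_i^T V^{-1} x_i for every observation.
        d2 = np.einsum("ij,jk,ik->i", x, np.linalg.inv(v), x)
        # Weighted scatter: (p / n) * sum_i x_i x_i^T / d2_i.
        v_new = (p / n) * (x.T / d2) @ x
        # Shape is defined only up to scale; normalize to determinant 1.
        v_new /= np.linalg.det(v_new) ** (1.0 / p)
        if np.linalg.norm(v_new - v) < tol:
            return v_new
        v = v_new
    return v

def duembgen_shape(x, **kwargs):
    """Tyler's iteration applied to all pairwise differences x_i - x_j, i < j."""
    i, j = np.triu_indices(x.shape[0], k=1)
    return tyler_shape(x[i] - x[j], **kwargs)

# Hypothetical example data with a known linear mixing.
rng = np.random.default_rng(2)
mix = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.5]])
x = rng.standard_normal((150, 3)) @ mix.T
v_tyler = tyler_shape(x - np.median(x, axis=0))   # center: componentwise median (assumption)
v_duembgen = duembgen_shape(x)
```

The pairwise-difference version costs more, since it iterates over n(n-1)/2 vectors instead of n, which is the trade-off that motivates the computational work listed elsewhere on this page.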

research product

Singular Spectrum Analysis

research product

Statistical properties of a blind source separation estimator for stationary time series

Abstract In this paper, we assume that the observed p time series are linear combinations of p latent uncorrelated weakly stationary time series. The problem is then, using the observed p -variate time series, to find an estimate for a mixing or unmixing matrix for the combinations. The estimated uncorrelated time series may then have nice interpretations and can be used in a further analysis. The popular AMUSE algorithm finds an estimate of an unmixing matrix using covariances and autocovariances of the observed time series. In this paper, we derive the limiting distribution of the AMUSE estimator under general conditions, and show how the results can be used for the comparison of estimate…
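For orientation, the AMUSE estimator discussed here is commonly described as whitening followed by an eigen-decomposition of one symmetrized lagged autocovariance matrix. The sketch below follows that description; the function name, the lag-1 default, the AR(1) sources and the mixing matrix are assumptions made for the example, and none of the asymptotic results of the paper are reproduced.

```python
import numpy as np

def amuse(x, lag=1):
    """Sketch of AMUSE. x is (n, p) with rows as time points; returns W (p x p)."""
    xc = x - x.mean(axis=0)
    n = xc.shape[0]
    cov = xc.T @ xc / n
    # Symmetric inverse square root of the covariance matrix (whitening).
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = xc @ cov_inv_sqrt
    # Symmetrized autocovariance matrix of the whitened series at the given lag.
    acov = z[:-lag].T @ z[lag:] / (n - lag)
    acov = (acov + acov.T) / 2
    _, u = np.linalg.eigh(acov)
    return u.T @ cov_inv_sqrt

# Hypothetical example: three AR(1) sources with clearly different
# lag-1 autocorrelations, mixed by a fixed nonsingular matrix.
rng = np.random.default_rng(1)
n = 5000
phis = np.array([0.9, 0.0, -0.7])
innovations = rng.standard_normal((n, 3))
s = np.zeros((n, 3))
for t in range(1, n):
    s[t] = phis * s[t - 1] + innovations[t]
a = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
x = s @ a.T
w = amuse(x)
gain = w @ a   # close to a rescaled signed permutation if separation worked
```

The sources are recovered only up to order, sign and scale, which is why the product of the unmixing estimate and the true mixing matrix is inspected rather than compared with the identity. Separation relies on the sources having distinct autocorrelations at the chosen lag.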

research product

So Many Variables: Joint Modeling in Community Ecology

Technological advances have enabled a new class of multivariate models for ecology, with the potential now to specify a statistical model for abundances jointly across many taxa, to simultaneously explore interactions across taxa and the response of abundance to environmental variables. Joint models can be used for several purposes of interest to ecologists, including estimating patterns of residual correlation across taxa, ordination, multivariate inference about environmental effects and environment-by-trait interactions, accounting for missing predictors, and improving predictions in situations where one can leverage knowledge of some species to predict others. We demonstrate this by exa…

research product

On the Computation of Symmetrized M-Estimators of Scatter

This paper focuses on the computational aspects of symmetrized M-estimators of scatter, i.e. the multivariate M-estimators of scatter computed on the pairwise differences of the data. Such estimators do not require a location estimate, and more importantly, they possess the important block and joint independence properties. These properties are needed, for example, when solving the independent component analysis problem. Classical and recently developed algorithms for computing the M-estimators and the symmetrized M-estimators are discussed. The effect of parallelization is considered as well as a new computational approach based on using only a subset of pairwise differences. Efficiencies and…

research product

On Mardia’s Tests of Multinormality

Classical multivariate analysis is based on the assumption that the data come from a multivariate normal distribution. The tests of multinormality have therefore received very much attention. Several tests for assessing multinormality, among them Mardia’s popular multivariate skewness and kurtosis statistics, are based on standardized third and fourth moments. In Mardia’s construction of the affine invariant test statistics, the data vectors are first standardized using the sample mean vector and the sample covariance matrix. In this paper we investigate whether, in the test construction, it is advantageous to replace the regular sample mean vector and sample covariance matrix by their affi…
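The classical statistics referred to here can be sketched in a few lines: standardize with the sample mean vector and sample covariance matrix, then average third and fourth powers of the resulting inner products. The function name and the simulated data are assumptions for the example, and the covariance divisor n is the convention assumed here; the modified statistics studied in the paper are not shown.

```python
import numpy as np

def mardia_statistics(x):
    """Mardia's multivariate skewness b1 and kurtosis b2 for data x (n x p)."""
    n, p = x.shape
    xc = x - x.mean(axis=0)
    s = xc.T @ xc / n                      # sample covariance, divisor n (assumption)
    # g[i, j] = (x_i - mean)^T S^{-1} (x_j - mean)
    g = xc @ np.linalg.inv(s) @ xc.T
    b1 = np.sum(g ** 3) / n ** 2           # skewness: average cubed inner product
    b2 = np.mean(np.diag(g) ** 2)          # kurtosis: average squared Mahalanobis distance squared
    return b1, b2

# Hypothetical example: multivariate normal data, n = 1000, p = 3.
rng = np.random.default_rng(4)
x = rng.standard_normal((1000, 3))
b1, b2 = mardia_statistics(x)
n, p = x.shape
# Usual large-sample reference values under multinormality:
skew_stat = n * b1 / 6                     # compare with chi-square, p(p+1)(p+2)/6 df
kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)   # compare with N(0, 1)
```

Both statistics are affine invariant because the data enter only through the standardized inner products. The full n x n matrix `g` is convenient for a sketch but uses memory quadratic in n.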

research product

Robust estimation and inference for bivariate line-fitting in allometry.

In allometry, bivariate techniques related to principal component analysis are often used in place of linear regression, and primary interest is in making inferences about the slope. We demonstrate that the current inferential methods are not robust to bivariate contamination, and consider four robust alternatives to the current methods -- a novel sandwich estimator approach, using robust covariance matrices derived via an influence function approach, Huber's M-estimator and the fast-and-robust bootstrap. Simulations demonstrate that Huber's M-estimators are highly efficient and robust against bivariate contamination, and when combined with the fast-and-robust bootstrap, we can make accurat…

research product

Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices

In this paper, the influence functions and limiting distributions of the canonical correlations and coefficients based on affine equivariant scatter matrices are developed for elliptically symmetric distributions. General formulas for limiting variances and covariances of the canonical correlations and canonical vectors based on scatter matrices are obtained. Also the use of the so-called shape matrices in canonical analysis is investigated. The scatter and shape matrices based on the affine equivariant Sign Covariance Matrix as well as the Tyler's shape matrix serve as examples. Their finite sample and limiting efficiencies are compared to those of the Minimum Covariance Determinant estima…

research product

Sign test of independence between two random vectors

A new affine invariant extension of the quadrant test statistic of Blomqvist (Ann. Math. Statist. 21 (1950) 593) based on spatial signs is proposed for testing the hypothesis of independence. In the elliptic case, the new test statistic is asymptotically equivalent to the interdirection test by Gieser and Randles (J. Amer. Statist. Assoc. 92 (1997) 561) but is easier to compute in practice. Limiting Pitman efficiencies and simulations are used to compare the test to the classical Wilks’ test.

research product

On Mardia's tests of multinormality

research product

Robustifying principal component analysis with spatial sign vectors

In this paper, we apply orthogonally equivariant spatial sign covariance matrices as well as their affine equivariant counterparts in principal component analysis. The influence functions and asymptotic covariance matrices of eigenvectors based on robust covariance estimators are derived in order to compare the robustness and efficiency properties. We show in particular that the estimators that use pairwise differences of the observed data have very good efficiency properties, providing practical robust alternatives to classical sample covariance matrix based methods.
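As a rough illustration of the idea in this abstract, the sketch below computes a spatial sign covariance matrix and uses its eigenvectors as principal directions. It is a minimal sketch and not the authors' implementation: the function names are hypothetical, and centring by the componentwise median is an assumption made for brevity (the abstract does not say which location estimate is used, and the pairwise-difference variants it mentions are not shown).

```python
import numpy as np

def spatial_sign_covariance(X, center=None):
    """Average outer product of the unit-length (spatial sign) vectors."""
    X = np.asarray(X, dtype=float)
    if center is None:
        # Assumption: componentwise median as a simple robust location.
        center = np.median(X, axis=0)
    D = X - center
    norms = np.linalg.norm(D, axis=1)
    keep = norms > 0                      # a zero vector has no direction
    S = D[keep] / norms[keep, None]       # project each point onto the unit sphere
    return S.T @ S / S.shape[0]

def sign_pca_directions(X):
    """Eigenvalues and eigenvectors of the sign covariance matrix, largest first."""
    scm = spatial_sign_covariance(X)
    eigvals, eigvecs = np.linalg.eigh(scm)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]
```

Because every observation is scaled to unit length before averaging, a single distant outlier contributes no more than any other point, which is the source of the robustness.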

research product

Rank scores tests of multivariate independence

New rank scores test statistics are proposed for testing whether two random vectors are independent. The tests are asymptotically distribution-free for elliptically symmetric marginal distributions. Recently, Gieser and Randles (1997), Taskinen, Kankainen and Oja (2003) and Taskinen, Oja and Randles (2005) introduced and discussed different multivariate extensions of the quadrant test, Kendall's tau and Spearman's rho statistics. In this paper, standardized multivariate spatial signs and the (univariate) ranks of the Mahalanobis-type distances of the observations from the origin are combined to construct rank scores tests of independence. The limiting distributions of the test statistics ar…

research product

What information should I look for again? : Attentional difficulties distracts reading of task assignments

This large-scale eye-movement study (N = 164) investigated how students read short task assignments to complete information search problems and how their cognitive resources are associated with this reading behavior. These cognitive resources include information searching subskills, prior knowledge, verbal memory, reading fluency, and attentional difficulties. In this study, the task assignments consisted of four sentences. The first and last sentences provided context, while the second or third sentence was the relevant or irrelevant sentence under investigation. The results of a linear mixed-model and latent change score analyses showed the ubiquitous influence of reading fluency on first…

research product

Model selection using limiting distributions of second-order blind source separation algorithms

Signals, recorded over time, are often observed as mixtures of multiple source signals. To extract relevant information from such measurements one needs to determine the mixing coefficients. In case of weakly stationary time series with uncorrelated source signals, this separation can be achieved by jointly diagonalizing sample autocovariances at different lags, and several algorithms address this task. Often the mixing estimates contain close-to-zero entries and one wants to decide whether the corresponding source signals have a relevant impact on the observations or not. To address this question of model selection we consider the recently published second-order blind identification proced…

research product

Fourth Moments and Independent Component Analysis

In independent component analysis it is assumed that the components of the observed random vector are linear combinations of latent independent random variables, and the aim is then to find an estimate for a transformation matrix back to these independent components. In the engineering literature, there are several traditional estimation procedures based on the use of fourth moments, such as FOBI (fourth order blind identification), JADE (joint approximate diagonalization of eigenmatrices), and FastICA, but the statistical properties of these estimates are not well known. In this paper various independent component functionals based on the fourth moments are discussed in detail, starting wi…
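For orientation, the sketch below shows the textbook form of FOBI, the simplest of the fourth-moment procedures named in this abstract: whiten the data, then diagonalize a matrix of fourth moments. It is a minimal sketch under stated assumptions rather than the paper's formulation: the function name `fobi` is hypothetical, and the method is only expected to separate sources whose kurtosis values are distinct.

```python
import numpy as np

def fobi(X):
    """Textbook FOBI sketch: returns an unmixing matrix W and estimated sources."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)

    # Step 1: whiten with the symmetric inverse square root of the covariance.
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    whitener = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Z = Xc @ whitener

    # Step 2: fourth-moment matrix B = mean(||z||^2 * z z^T) of the whitened data.
    r2 = np.sum(Z ** 2, axis=1)
    B = (Z * r2[:, None]).T @ Z / Z.shape[0]

    # Step 3: the eigenvectors of B supply the remaining rotation.
    _, U = np.linalg.eigh(B)
    W = U.T @ whitener
    return W, Xc @ W.T
```

The recovered components are determined only up to order and sign, so any comparison with known sources has to allow for a signed permutation.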

research product

The squared symmetric FastICA estimator

In this paper we study the theoretical properties of the deflation-based FastICA method, the original symmetric FastICA method, and a modified symmetric FastICA method, here called the squared symmetric FastICA. This modification is obtained by replacing the absolute values in the FastICA objective function by their squares. In the deflation-based case this replacement has no effect on the estimate since the maximization problem stays the same. However, in the symmetric case we obtain a different estimate which has been mentioned in the literature, but its theoretical properties have not been studied at all. In the paper we review the classic deflation-based and symmetric FastICA approaches…

research product

Variational Approximations for Generalized Linear Latent Variable Models

Generalized linear latent variable models (GLLVMs) are a powerful class of models for understanding the relationships among multiple, correlated responses. Estimation, however, presents a major challenge, as the marginal likelihood does not possess a closed form for nonnormal responses. We propose a variational approximation (VA) method for estimating GLLVMs. For the common cases of binary, ordinal, and overdispersed count data, we derive fully closed-form approximations to the marginal log-likelihood function in each case. Compared to other methods such as the expectation-maximization algorithm, estimation using VA is fast and straightforward to implement. Predictions of the latent variabl…

research product

ICA and stochastic volatility models

We consider multivariate time series where each component series is an unknown linear combination of latent mutually independent stationary time series. Multivariate financial time series often have periods of low volatility followed by periods of high volatility. Time series of this kind typically have non-Gaussian stationary distributions, and therefore standard independent component analysis (ICA) tools such as fastICA can be used to extract independent component series even though they do not utilize any information on temporal dependence. In this paper we review some ICA methods used in the context of stochastic volatility models. We also suggest their modifications which use nonlinear…

research product

Testate amoebae community analysis as a tool to assess biological impacts of peatland use

Like most ecosystems, peatlands have been heavily exploited for different human purposes. For example, in Finland the majority is under forestry, agriculture or peat mining use. Peatlands play an important role in carbon storage and the water cycle, and are a unique habitat for rare organisms. Such properties highlight their environmental importance and the need for their restoration. To monitor the success of peatland restoration, sensitive indicators are needed. Here we test whether testate amoebae can be used as a reliable bioindicator for assessing peatland condition. To qualify as reliable indicators, responses in testate amoebae community structure to ecological changes must be stronger than ra…

research product

Extracting Conditionally Heteroskedastic Components using Independent Component Analysis

In the independent component model, the multivariate data are assumed to be a mixture of mutually independent latent components. Independent component analysis (ICA) then aims at estimating these latent components. In this article, we study an ICA method which combines the use of linear and quadratic autocorrelations to enable efficient estimation of various kinds of stationary time series. Statistical properties of the estimator are studied by finding its limiting distribution under general conditions, and the asymptotic variances are derived in the case of an ARMA-GARCH model. We use the asymptotic results and a finite sample simulation study to compare different choices of a weight coef…

research product

The effects of grazing history, soil properties and stand structure on the communities of saprotrophic fungi in wood-pastures

Wood-pastures are threatened anthropogenic biotopes that provide habitat for an extensive group of species. Here we studied the effect of management, grazing intensity, time since abandonment, historical land-use intensity, soil properties and stand conditions on communities of saprotrophic fungi in wood-pastures in Central Finland. We found that the proportion of broadleaved trees and soil pH are the major drivers in the communities of saprotrophic fungi in these boreal wood-pastures. In addition, tree species richness, soil moisture, historical land-use intensity and time since abandonment affected the communities of saprotrophic fungi. Current management or grazing intensity did not have…

research product
