Dimension Estimation in Two-Dimensional PCA
We propose an automated way of determining the optimal number of low-rank components in dimension reduction of image data. The method is based on the combination of two-dimensional principal component analysis and an augmentation estimator proposed recently in the literature. Intuitively, the main idea is to combine a scree plot with information extracted from the eigenvectors of a variation matrix. Simulation studies show that the method provides accurate estimates and a demonstration with a finger data set showcases its performance in practice.
A more efficient second order blind identification method for separation of uncorrelated stationary time series
The classical second order source separation methods use approximate joint diagonalization of autocovariance matrices with several lags to estimate the unmixing matrix. Based on recent asymptotic results, we propose a novel unmixing matrix estimator which selects the best lag set from a finite set of candidate sets specified by the user. The theory is illustrated by a simulation study.
Applying fully tensorial ICA to fMRI data
There are two aspects in functional magnetic resonance imaging (fMRI) data that make them awkward to analyse with traditional multivariate methods: high order and high dimension. The first of these refers to the tensorial nature of observations as array-valued elements instead of vectors. Although this can be circumvented by vectorizing the array, doing so simultaneously loses all the structural information in the original observations. The second aspect refers to the high dimensionality along each dimension, making the concept of dimension reduction a valuable tool in the processing of fMRI data. Different methods of tensor dimension reduction are currently gaining popularity in the literature…
Large-sample properties of unsupervised estimation of the linear discriminant using projection pursuit
We study the estimation of the linear discriminant with projection pursuit, a method that is unsupervised in the sense that it does not use the class labels in the estimation. Our viewpoint is asymptotic and, as our main contribution, we derive central limit theorems for estimators based on three different projection indices, skewness, kurtosis, and their convex combination. The results show that in each case the limiting covariance matrix is proportional to that of linear discriminant analysis (LDA), a supervised estimator of the discriminant. An extensive comparative study between the asymptotic variances reveals that projection pursuit gets arbitrarily close in efficiency to LDA when the…
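The kurtosis-based projection index described above can be sketched for bivariate data: whiten, then scan unit directions for maximal absolute excess kurtosis; with unequal group sizes this direction estimates the discriminant without using labels. The grid search and function name below are our own illustrative choices, not the estimators studied in the paper.

```python
import numpy as np

def pp_kurtosis_2d(X, n_angles=720):
    """Unsupervised discriminant sketch for bivariate data X (2 x n):
    whiten, then scan unit directions for maximal absolute excess
    kurtosis (an illustration of the kurtosis projection index)."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    W_white = E @ np.diag(d ** -0.5) @ E.T
    Y = W_white @ X
    best_val, best_u = -np.inf, None
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        u = np.array([np.cos(theta), np.sin(theta)])
        z = u @ Y                        # unit variance after whitening
        val = abs(np.mean(z ** 4) - 3)   # absolute excess kurtosis
        if val > best_val:
            best_val, best_u = val, u
    return best_u @ W_white              # direction in original coordinates
```

For a two-group location mixture with mixing proportion away from 1/2, the projection onto the group-difference direction is the only non-Gaussian one, so the scan recovers the discriminant direction up to sign.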
TBSSvis: Visual Analytics for Temporal Blind Source Separation
Temporal Blind Source Separation (TBSS) is used to obtain the true underlying processes from noisy temporal multivariate data, such as electrocardiograms. TBSS has similarities to Principal Component Analysis (PCA) as it separates the input data into univariate components and is applicable to suitable datasets from various domains, such as medicine, finance, or civil engineering. Despite TBSS’s broad applicability, the involved tasks are not well supported in current tools, which offer only text-based interactions and single static images. Analysts are limited in analyzing and comparing obtained results, which consist of diverse data such as matrices and sets of time series. Additionally, p…
Stationary subspace analysis based on second-order statistics
In stationary subspace analysis (SSA) one assumes that the observable p-variate time series is a linear mixture of a k-variate nonstationary time series and a (p-k)-variate stationary time series. The aim is then to estimate the unmixing matrix which transforms the observed multivariate time series onto stationary and nonstationary components. In the classical approach multivariate data are projected onto stationary and nonstationary subspaces by minimizing a Kullback-Leibler divergence between Gaussian distributions, and the method only detects nonstationarities in the first two moments. In this paper we consider SSA in a more general multivariate time series setting and propose SSA method…
Fast equivariant JADE
Independent component analysis (ICA) is a widely used signal processing tool having applications in various fields of science. In this paper we focus on affine equivariant ICA methods. Two such well-established estimation methods, FOBI and JADE, diagonalize certain fourth order cumulant matrices to extract the independent components. FOBI uses one cumulant matrix only, and is therefore computationally very fast. However, it is not able to separate identically distributed components, which is a major drawback. JADE overcomes this restriction. Unfortunately, JADE uses a huge number of cumulant matrices and is computationally very heavy in high-dimensional cases. In this paper, we hybridize the…
Deflation-Based FastICA With Adaptive Choices of Nonlinearities
Deflation-based FastICA is a popular method for independent component analysis. In the standard deflation-based approach the row vectors of the unmixing matrix are extracted one after another always using the same nonlinearities. In practice the user has to choose the nonlinearities, and the efficiency and robustness of the estimation procedure then strongly depends on this choice as well as on the order in which the components are extracted. In this paper we propose a novel adaptive two-stage deflation-based FastICA algorithm that (i) allows one to use different nonlinearities for different components and (ii) optimizes the order in which the components are extracted. Based on a consist…
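The plain (non-adaptive) deflation-based scheme that the paper builds on can be sketched as follows. This uses the classical pow3 nonlinearity for every component and a fixed extraction order; it is not the adaptive two-stage algorithm proposed in the paper.

```python
import numpy as np

def fastica_deflation(X, n_iter=100, seed=0):
    """Deflation-based FastICA sketch with the pow3 nonlinearity,
    applied to data X (p x n). Illustrative only."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    # Whitening step
    d, E = np.linalg.eigh(X @ X.T / n)
    W_white = E @ np.diag(d ** -0.5) @ E.T
    Y = W_white @ X
    W = np.zeros((p, p))
    for k in range(p):
        w = rng.standard_normal(p)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            # Fixed-point update for g(u) = u**3:
            # w <- E[Y (w'Y)^3] - 3 E[(w'Y)^2] w, with E[(w'Y)^2] = 1
            w_new = (Y * (w @ Y) ** 3).mean(axis=1) - 3 * w
            # Deflation: orthogonalize against rows found earlier
            w_new -= W[:k].T @ (W[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1) < 1e-10
            w = w_new
            if done:
                break
        W[k] = w
    return W @ W_white      # unmixing matrix for the original data
```

The adaptive algorithm of the paper would, roughly, replace the fixed pow3 update with a per-component choice of nonlinearity and reorder the extractions.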
Asymptotic and bootstrap tests for subspace dimension
Most linear dimension reduction methods proposed in the literature can be formulated using an appropriate pair of scatter matrices, see e.g. Ye and Weiss (2003), Tyler et al. (2009), Bura and Yang (2011), Liski et al. (2014) and Luo and Li (2016). The eigen-decomposition of one scatter matrix with respect to another is then often used to determine the dimension of the signal subspace and to separate signal and noise parts of the data. Three popular dimension reduction methods, namely principal component analysis (PCA), fourth order blind identification (FOBI) and sliced inverse regression (SIR) are considered in detail and the first two moments of subsets of the eigenvalues are used to test…
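The sub-sphericity idea behind such eigenvalue-based dimension tests can be illustrated with a small sketch. The profile function below is a heuristic of ours, not the asymptotic or bootstrap test statistics developed in the paper: for each candidate signal dimension k it reports the normalized spread of the p−k smallest covariance eigenvalues, which is near zero once the tail is pure isotropic noise.

```python
import numpy as np

def subsphericity_profile(X):
    """For data X (p x n) and each candidate signal dimension k,
    return var/mean^2 of the p-k smallest covariance eigenvalues;
    near-zero values indicate an isotropic noise subspace."""
    p, n = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    evals = np.sort(np.linalg.eigvalsh(X @ X.T / n))[::-1]
    profile = []
    for k in range(p):
        tail = evals[k:]
        profile.append(np.var(tail) / np.mean(tail) ** 2)
    return np.array(profile)
```

In practice one needs a calibrated cut-off for "near zero", which is exactly what the asymptotic and bootstrap null distributions in the paper provide.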
Dimension reduction for time series in a blind source separation context using r
Funding Information: The work of KN was supported by the CRoNoS COST Action IC1408 and the Austrian Science Fund P31881-N32. The work of ST was supported by the CRoNoS COST Action IC1408. The work of JV was supported by the Academy of Finland (grant 321883). We would like to thank the anonymous reviewers for their comments which improved the paper and package considerably. Publisher Copyright: © 2021, American Statistical Association. All rights reserved.
Multivariate time series observations are increasingly common in multiple fields of science, but the complex dependencies of such data often translate into intractable models with a large number of parameters. An alternative is given by first red…
Blind Source Separation Based on Joint Diagonalization in R: The Packages JADE and BSSasymp
Blind source separation (BSS) is a well-known signal processing tool which is used to solve practical data analysis problems in various fields of science. In BSS, we assume that the observed data consists of linear mixtures of latent variables. The mixing system and the distributions of the latent variables are unknown. The aim is to find an estimate of an unmixing matrix which then transforms the observed data back to latent sources. In this paper we present the R packages JADE and BSSasymp. The package JADE offers several BSS methods which are based on joint diagonalization. Package BSSasymp contains functions for computing the asymptotic covariance matrices as well as their data-based es…
Signal dimension estimation in BSS models with serial dependence
Many modern multivariate time series datasets contain a large amount of noise, and the first step of the data analysis is to separate the noise channels from the signals of interest. A crucial part of this dimension reduction is determining the number of signals. In this paper we approach this problem by considering a noisy latent variable time series model which comprises many popular blind source separation models. We propose a general framework for the estimation of the signal dimension that is based on testing for sub-sphericity and give examples of different tests suitable for time series settings. In the inference we rely on bootstrap null distributions. Several simulation studies are…
On Independent Component Analysis with Stochastic Volatility Models
Consider a multivariate time series where each component series is assumed to be a linear mixture of latent mutually independent stationary time series. Classical independent component analysis (ICA) tools, such as fastICA, are often used to extract latent series, but they do not utilize any information on temporal dependence. Moreover, financial time series often have periods of low and high volatility. In such settings second-order source separation methods, such as SOBI, fail. We review here some classical methods used for time series with stochastic volatility, and suggest modifications of them by proposing a family of vSOBI estimators. These estimators use different nonlinearity functions to…
fICA : FastICA Algorithms and Their Improved Variants
In independent component analysis (ICA) one searches for mutually independent non-Gaussian latent variables when the components of the multivariate data are assumed to be linear combinations of them. Arguably, the most popular method to perform ICA is FastICA. There are two classical versions: the deflation-based FastICA, where the components are found one by one, and the symmetric FastICA, where the components are found simultaneously. These methods have been implemented previously in two R packages, fastICA and ica. We present the R package fICA and compare it to the other packages. Additional features in fICA include optimization of the extraction order in the deflation-based vers…
Blind source separation for non-stationary random fields
Regional data analysis is concerned with the analysis and modeling of measurements that are spatially separated, specifically accounting for typical features of such data. Namely, measurements in close proximity tend to be more similar than ones further apart. This might also hold true for cross-dependencies when multivariate spatial data are considered. Often, scientists are interested in linear transformations of such data which are easy to interpret and can be used for dimension reduction. Recently, for that purpose spatial blind source separation (SBSS) was introduced, which assumes that the observed data are formed by a linear mixture of uncorrelated, weakly stationary random …
Deflation-based separation of uncorrelated stationary time series
In this paper we assume that the observed p time series are linear combinations of p latent uncorrelated weakly stationary time series. The problem is then to find an estimate for an unmixing matrix that transforms the observed time series back to uncorrelated time series. The so-called SOBI (Second Order Blind Identification) estimate aims at a joint diagonalization of the covariance matrix and several autocovariance matrices with varying lags. In this paper, we propose a novel procedure that extracts the latent time series one by one. The limiting distribution of this deflation-based SOBI is found under general conditions, and we show how the results can be used for the comparison of es…
Publication and Coauthorship Networks of Hannu Oja
In this paper we review Hannu Oja’s publications and form coauthor networks based on them. Applying community detection methods to the network formed by all of Hannu’s publications shows that his coauthors can be classified into 13 clusters, where two large clusters refer to his methodological research. The network concerning this methodological work is then extended to cover all statistical publications written by Hannu’s coauthors. The analysis of the extended network shows that Hannu’s coauthors do not form a closed community, but Hannu is involved in many different fields of statistics.
A review of second‐order blind identification methods
Second order source separation (SOS) is a data analysis tool which can be used for revealing hidden structures in multivariate time series data or as a tool for dimension reduction. Such methods are nowadays increasingly important as more and more high-dimensional multivariate time series data are measured in numerous fields of applied science. Dimension reduction is crucial, as modelling such high-dimensional data with multivariate time series models is often impractical as the number of parameters describing dependencies between the component time series is usually too high. SOS methods have their roots in the signal processing literature, where they were first used to separate source sig…
Separation of Uncorrelated Stationary time series using Autocovariance Matrices
Blind source separation (BSS) is a signal processing tool, which is widely used in various fields. Examples include biomedical signal separation, brain imaging and economic time series applications. In BSS, one assumes that the observed $p$ time series are linear combinations of $p$ latent uncorrelated weakly stationary time series. The aim is then to find an estimate for an unmixing matrix, which transforms the observed time series back to uncorrelated latent time series. In SOBI (Second Order Blind Identification) joint diagonalization of the covariance matrix and autocovariance matrices with several lags is used to estimate the unmixing matrix. The rows of an unmixing matrix can be deriv…
The “Seili-index” For The Prediction of Chlorophyll-α Levels In The Archipelago Sea of The Northern Baltic Sea, Southwest Finland
To build a forecasting tool for the state of eutrophication in the Archipelago Sea, we fitted a Generalized Additive Mixed Model (GAMM) to marine environmental monitoring data, which were collected over the years 2011–2019 by an automated profiling buoy at the Seili ODAS-station. The resulting “Seili-index” can be used to predict the chlorophyll-α (chl-a) concentration in the seawater a number of days ahead by using the temperature forecast as a covariate. An array of test predictions with two separate models on the 2019 data set showed that the index is adept at predicting the amount of chl-a especially in the upper water layer. The visualization with 10 days of chl-a level predict…
Singular Spectrum Analysis
Statistical properties of a blind source separation estimator for stationary time series
In this paper, we assume that the observed p time series are linear combinations of p latent uncorrelated weakly stationary time series. The problem is then, using the observed p-variate time series, to find an estimate for a mixing or unmixing matrix for the combinations. The estimated uncorrelated time series may then have nice interpretations and can be used in a further analysis. The popular AMUSE algorithm finds an estimate of an unmixing matrix using covariances and autocovariances of the observed time series. In this paper, we derive the limiting distribution of the AMUSE estimator under general conditions, and show how the results can be used for the comparison of estimate…
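The AMUSE procedure described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under simplifying assumptions (a single user-chosen lag and a symmetrized lag-autocovariance), not the authors' code.

```python
import numpy as np

def amuse(X, lag=1):
    """AMUSE sketch for data X (p x n): whiten, then eigendecompose a
    symmetrized lag-autocovariance of the whitened series."""
    X = X - X.mean(axis=1, keepdims=True)
    n = X.shape[1]
    # Whitening via the covariance matrix
    cov = X @ X.T / n
    d, E = np.linalg.eigh(cov)
    W_white = E @ np.diag(d ** -0.5) @ E.T
    Y = W_white @ X
    # Symmetrized autocovariance at the chosen lag
    R = Y[:, :-lag] @ Y[:, lag:].T / (n - lag)
    R = (R + R.T) / 2
    _, V = np.linalg.eigh(R)
    W = V.T @ W_white          # unmixing matrix estimate
    return W, W @ X
```

Separation succeeds when the latent series have distinct autocovariances at the chosen lag; the paper's asymptotic results quantify how the lag choice affects the estimator's precision.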
On the Computation of Symmetrized M-Estimators of Scatter
This paper focuses on the computational aspects of symmetrized M-estimators of scatter, i.e. the multivariate M-estimators of scatter computed on the pairwise differences of the data. Such estimators do not require a location estimate, and more importantly, they possess the important block and joint independence properties. These properties are needed, for example, when solving the independent component analysis problem. Classical and recently developed algorithms for computing the M-estimators and the symmetrized M-estimators are discussed. The effect of parallelization is considered, as well as a new computational approach based on using only a subset of pairwise differences. Efficiencies and…
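As a concrete instance, Tyler's M-estimator computed on pairwise differences gives a simple symmetrized scatter. The fixed-point iteration below is an illustrative sketch using the full set of pairwise differences, without the parallelization or subsetting strategies discussed in the paper.

```python
import numpy as np

def symmetrized_tyler(X, n_iter=100, tol=1e-8):
    """Tyler's M-estimator of scatter computed on the pairwise
    differences of the data X (n x p); no location estimate needed.
    Returns the shape matrix normalized to trace p."""
    n, p = X.shape
    # All pairwise differences x_i - x_j, i < j
    iu = np.triu_indices(n, k=1)
    D = X[iu[0]] - X[iu[1]]
    V = np.eye(p)
    for _ in range(n_iter):
        Vi = np.linalg.inv(V)
        # Mahalanobis-type norms of the differences under current V
        q = np.einsum('ij,jk,ik->i', D, Vi, D)
        V_new = p * (D.T * (1.0 / q)) @ D / D.shape[0]
        V_new *= p / np.trace(V_new)     # fix the scale: trace = p
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V
```

Forming all n(n−1)/2 differences is the computational bottleneck that motivates the subset-based approach studied in the paper.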
On the usage of joint diagonalization in multivariate statistics
Scatter matrices generalize the covariance matrix and are useful in many multivariate data analysis methods, including well-known principal component analysis (PCA), which is based on the diagonalization of the covariance matrix. The simultaneous diagonalization of two or more scatter matrices goes beyond PCA and is used more and more often. In this paper, we offer an overview of many methods that are based on a joint diagonalization. These methods range from the unsupervised context with invariant coordinate selection and blind source separation, which includes independent component analysis, to the supervised context with discriminant analysis and sliced inverse regression. They also enco…
Test of the Latent Dimension of a Spatial Blind Source Separation Model
We assume a spatial blind source separation model in which the observed multivariate spatial data is a linear mixture of latent spatially uncorrelated random fields containing a number of pure white noise components. We propose a test on the number of white noise components and obtain the asymptotic distribution of its statistic for a general domain. We also demonstrate how computations can be facilitated in the case of gridded observation locations. Based on this test, we obtain a consistent estimator of the true dimension. Simulation studies and an environmental application in the Supplemental Material demonstrate that our test is at least comparable to and often outperforms bootstrap-bas…
Model selection using limiting distributions of second-order blind source separation algorithms
Signals, recorded over time, are often observed as mixtures of multiple source signals. To extract relevant information from such measurements one needs to determine the mixing coefficients. In case of weakly stationary time series with uncorrelated source signals, this separation can be achieved by jointly diagonalizing sample autocovariances at different lags, and several algorithms address this task. Often the mixing estimates contain close-to-zero entries and one wants to decide whether the corresponding source signals have a relevant impact on the observations or not. To address this question of model selection we consider the recently published second-order blind identification proced…
Blind recovery of sources for multivariate space-time random fields
With advances in modern technology, huge datasets that show dependencies in space as well as in time occur frequently in practice. As an example, several monitoring stations at different geographical locations track hourly concentration measurements of a number of air pollutants for several years. Such a dataset contains thousands of multivariate observations; thus, proper statistical analysis needs to account for dependencies in space and time between and among the different monitored variables. To simplify the consequent multivariate spatio-temporal statistical analysis it might be of interest to detect linear transformations of the original observations that result in strai…
Fourth Moments and Independent Component Analysis
In independent component analysis it is assumed that the components of the observed random vector are linear combinations of latent independent random variables, and the aim is then to find an estimate for a transformation matrix back to these independent components. In the engineering literature, there are several traditional estimation procedures based on the use of fourth moments, such as FOBI (fourth order blind identification), JADE (joint approximate diagonalization of eigenmatrices), and FastICA, but the statistical properties of these estimates are not well known. In this paper various independent component functionals based on the fourth moments are discussed in detail, starting wi…
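Of the fourth-moment procedures mentioned above, FOBI admits a particularly compact sketch: whiten, then eigendecompose a fourth-moment scatter matrix of the whitened data. The implementation below is illustrative and assumes independent components with pairwise distinct kurtoses, which is FOBI's separation condition.

```python
import numpy as np

def fobi(X):
    """FOBI sketch for data X (p x n): whiten, then eigendecompose
    the fourth-moment scatter of the whitened data."""
    p, n = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(X @ X.T / n)
    W_white = E @ np.diag(d ** -0.5) @ E.T
    Y = W_white @ X
    r2 = np.sum(Y ** 2, axis=0)        # squared norms of whitened points
    S2 = (Y * r2) @ Y.T / n            # fourth-moment scatter matrix
    _, V = np.linalg.eigh(S2)
    return V.T @ W_white               # unmixing matrix estimate
```

Because the eigenvalues of the fourth-moment scatter are functions of the component kurtoses, identically distributed components give tied eigenvalues and cannot be separated; JADE avoids this at a higher computational cost.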
The squared symmetric FastICA estimator
In this paper we study the theoretical properties of the deflation-based FastICA method, the original symmetric FastICA method, and a modified symmetric FastICA method, here called the squared symmetric FastICA. This modification is obtained by replacing the absolute values in the FastICA objective function by their squares. In the deflation-based case this replacement has no effect on the estimate since the maximization problem stays the same. However, in the symmetric case we obtain a different estimate which has been mentioned in the literature, but its theoretical properties have not been studied at all. In the paper we review the classic deflation-based and symmetric FastICA approaches…
Large-Sample Properties of Blind Estimation of the Linear Discriminant Using Projection Pursuit
We study the estimation of the linear discriminant with projection pursuit, a method that is blind in the sense that it does not use the class labels in the estimation. Our viewpoint is asymptotic and, as our main contribution, we derive central limit theorems for estimators based on three different projection indices, skewness, kurtosis and their convex combination. The results show that in each case the limiting covariance matrix is proportional to that of linear discriminant analysis (LDA), an unblind estimator of the discriminant. An extensive comparative study between the asymptotic variances reveals that projection pursuit is able to achieve efficiency equal to LDA when the groups are…
Extracting Conditionally Heteroskedastic Components using Independent Component Analysis
In the independent component model, the multivariate data are assumed to be a mixture of mutually independent latent components. The independent component analysis (ICA) then aims at estimating these latent components. In this article, we study an ICA method which combines the use of linear and quadratic autocorrelations to enable efficient estimation of various kinds of stationary time series. Statistical properties of the estimator are studied by finding its limiting distribution under general conditions, and the asymptotic variances are derived in the case of ARMA-GARCH model. We use the asymptotic results and a finite sample simulation study to compare different choices of a weight coef…
KernelICA : Kernel Independent Component Analysis
Implements the kernel independent component analysis (kernel ICA) method introduced by Bach and Jordan (2003). The incomplete Cholesky decomposition used in kernel ICA is provided as a separate function.