AUTHOR
Hannu Oja
Positive experiences and the relationship between stress and asthma in children
Ninety children aged 6 to 13 y and suffering from chronic asthma were included in a prospective follow-up study lasting 18 mo in order to assess whether life events involving substantial positive effects on the child can protect against the increased risk associated with stressful life events. The main outcome measures included positive life events, positive long-term experiences, severely negative life events, chronic psychosocial stress and new asthma exacerbation. The results showed that, provided they occurred in close proximity to severely negative life events, positive life events, generally related to the child's own achievements, afforded protection against the increased risk of a n…
Sign and Rank Covariance Matrices: Statistical Properties and Application to Principal Components Analysis
In this paper, the estimation of covariance matrices based on multivariate sign and rank vectors is discussed. Equivariance and robustness properties of the sign and rank covariance matrices are described. We show their use for the principal components analysis (PCA) problem. Limiting efficiencies of the estimation procedures for PCA are compared.
Multivariate nonparametric tests in a randomized complete block design
In this paper multivariate extensions of the Friedman and Page tests for the comparison of several treatments are introduced. Related unadjusted and adjusted treatment effect estimates for the multivariate response variable are also found and their properties discussed. The test statistics and estimates are analogous to the traditional univariate methods. In test constructions, the univariate ranks are replaced by multivariate spatial ranks (J. Nonparam. Statist. 5 (1995) 201). Asymptotic theory is developed to provide approximations for the limiting distributions of the test statistics and estimates. Limiting efficiencies of the tests and treatment effect estimates are found in the…
Affine Invariant Multivariate Sign and Rank Tests and Corresponding Estimates: a Review
The paper reviews recent contributions to the statistical inference methods, tests and estimates, based on the generalized median of Oja. Multivariate analogues of sign and rank concepts, affine invariant one-sample and two-sample sign tests and rank tests, affine equivariant median and Hodges–Lehmann-type estimates are reviewed and discussed. Some comparisons are made to other generalizations. The theory is illustrated by two examples.
The role of acute and chronic stress in asthma attacks in children.
Background: High levels of stress have been shown to predict the onset of asthma in children genetically at risk, and to correlate with higher asthma morbidity. Our study set out to examine whether stressful experiences actually provoke new exacerbations in children who already have asthma. Methods: A group of child patients with verified chronic asthma were prospectively followed up for 18 months. We used continuous monitoring of asthma by the use of diaries and daily peak-flow values, accompanied by repeated interview assessments of life events and long-term psychosocial experiences. The key measures included asthma exacerbations, severely negative life events, and chronic stressors. Findin…
k-Step shape estimators based on spatial signs and ranks
In this paper, the shape matrix estimators based on spatial sign and rank vectors are considered. The estimators considered here are slight modifications of the estimators introduced in Dümbgen (1998) and Oja and Randles (2004) and further studied for example in Sirkiä et al. (2009). The shape estimators are computed using pairwise differences of the observed data, so there is no need to estimate the location center of the data. When the estimator is based on signs, the use of differences also implies that the estimators have the so-called independence property if the initial estimator has it. The influence functions and limiting distributions of the es…
Nonparametric statistics for DOA estimation in the presence of multipath
This paper is concerned with array signal processing in non-Gaussian noise and in the presence of multipath. Robust and fully nonparametric high resolution algorithms for direction of arrival (DOA) estimation are presented. The algorithms are based on multivariate spatial sign and rank concepts. Spatial smoothing of the multivariate rank and sign based covariance matrices is employed as a preprocessing step in order to deal with coherent sources. The performance of the algorithms is studied using simulations. The results show that almost optimal performance is obtained in a wide variety of noise conditions.
Robustifying principal component analysis with spatial sign vectors
In this paper, we apply orthogonally equivariant spatial sign covariance matrices as well as their affine equivariant counterparts in principal component analysis. The influence functions and asymptotic covariance matrices of eigenvectors based on robust covariance estimators are derived in order to compare the robustness and efficiency properties. We show in particular that the estimators that use pairwise differences of the observed data have very good efficiency properties, providing practical robust alternatives to classical sample covariance matrix based methods.
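The idea of principal component analysis based on spatial signs can be illustrated with a minimal Python sketch. It assumes NumPy, uses the componentwise median as a simple location estimate (an assumption made here for brevity, not a choice stated in the abstract), and covers only the orthogonally equivariant spatial sign covariance matrix, not the affine equivariant or pairwise-difference variants. The function names are hypothetical.

```python
import numpy as np

def spatial_sign_covariance(X, center=None):
    """Average of u u^T, where u = (x - center) / ||x - center|| is the spatial sign."""
    X = np.asarray(X, dtype=float)
    if center is None:
        # Assumption: componentwise median as a simple robust location estimate.
        center = np.median(X, axis=0)
    Z = X - center
    norms = np.linalg.norm(Z, axis=1)
    keep = norms > 0  # observations equal to the centre have no direction
    U = Z[keep] / norms[keep, None]
    return U.T @ U / U.shape[0]

def sign_pca_directions(X):
    """Eigenvalues and eigenvectors of the spatial sign covariance matrix, largest first."""
    eigvals, eigvecs = np.linalg.eigh(spatial_sign_covariance(X))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]
```

Because every spatial sign has unit length, the matrix has trace one, so its eigenvalues describe only the relative spread along the estimated principal directions.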
Multivariate Nonparametric Tests
Multivariate nonparametric statistical tests of hypotheses are described for the one-sample location problem, the several-sample location problem and the problem of testing independence between pairs of vectors. These methods are based on affine-invariant spatial sign and spatial rank vectors. They provide affine-invariant multivariate generalizations of the univariate sign test, signed-rank test, Wilcoxon rank sum test, Kruskal–Wallis test, and the Kendall and Spearman correlation tests. While the emphasis is on tests of hypotheses, certain references to associated affine-equivariant estimators are included. Pitman asymptotic efficiencies demonstrate the excellent performance of these meth…
Independent component analysis based on symmetrised scatter matrices
A new method for separating the mixtures of independent sources has been proposed recently in [Oja et al. (2006). Scatter matrices and independent component analysis. Austrian J. Statist., to appear]. This method is based on two scatter matrices with the so-called independence property. The corresponding method is now further examined. Simple simulation studies are used to compare the performance of so-called symmetrised scatter matrices in solving the independent component analysis problem. The results are also compared with the classical FastICA method. Finally, the theory is illustrated by some examples.
Fast equivariant JADE
Independent component analysis (ICA) is a widely used signal processing tool having applications in various fields of science. In this paper we focus on affine equivariant ICA methods. Two such well-established estimation methods, FOBI and JADE, diagonalize certain fourth order cumulant matrices to extract the independent components. FOBI uses one cumulant matrix only, and is therefore computationally very fast. However, it is not able to separate identically distributed components which is a major drawback. JADE overcomes this restriction. Unfortunately, JADE uses a huge number of cumulant matrices and is computationally very heavy in high-dimensional cases. In this paper, we hybridize the…
Estimates of Regression Coefficients Based on the Sign Covariance Matrix
A new estimator of the regression parameters is introduced in a multivariate multiple-regression model in which both the vector of explanatory variables and the vector of response variables are assumed to be random. The affine equivariant estimate matrix is constructed using the sign covariance matrix (SCM) where the sign concept is based on Oja's criterion function. The influence function and asymptotic theory are developed to consider robustness and limiting efficiencies of the SCM regression estimate. The estimate is shown to be consistent with a limiting multinormal distribution. The influence function, as a function of the length of the contamination vector, is shown to be linea…
Deflation-Based FastICA With Adaptive Choices of Nonlinearities
Deflation-based FastICA is a popular method for independent component analysis. In the standard deflation-based approach the row vectors of the unmixing matrix are extracted one after another, always using the same nonlinearities. In practice the user has to choose the nonlinearities, and the efficiency and robustness of the estimation procedure then strongly depend on this choice as well as on the order in which the components are extracted. In this paper we propose a novel adaptive two-stage deflation-based FastICA algorithm that (i) allows one to use different nonlinearities for different components and (ii) optimizes the order in which the components are extracted. Based on a consist…
Inference based on the affine invariant multivariate Mann–Whitney–Wilcoxon statistic
A new affine invariant multivariate analogue of the two-sample Mann–Whitney–Wilcoxon test based on the Oja criterion function is introduced. The associated affine equivariant estimate of shift, the multivariate Hodges-Lehmann estimate, is also considered. Asymptotic theory is developed to provide approximations for null distribution as well as for a sequence of contiguous alternatives to consider limiting efficiencies of the test and estimate. The theory is illustrated by an example. Hettmansperger et al. [9] considered alternative slightly different affine invariant extensions also based on the Oja criterion. The methods proposed in this paper are computationally more intensive, but surpri…
Robust subspace DOA estimation for wireless communications
This paper is concerned with array signal processing in non-Gaussian noise typical in urban and indoor radio channels. Robust and fully nonparametric high resolution algorithms for direction of arrival (DOA) estimation are presented. The algorithms are based on multivariate spatial sign and rank concepts. The performance of the algorithms is studied using simulations. The results show that almost optimal performance is obtained in a wide variety of noise conditions.
Symmetrised M-estimators of multivariate scatter
In this paper we introduce a family of symmetrised M-estimators of multivariate scatter. These are defined to be M-estimators only computed on pairwise differences of the observed multivariate data. Symmetrised Huber's M-estimator and Dümbgen's estimator serve as our examples. The influence functions of the symmetrised M-functionals are derived and the limiting distributions of the estimators are discussed in the multivariate elliptical case to consider the robustness and efficiency properties of estimators. The symmetrised M-estimators have the important independence property; they can therefore be used to find the independent components in the independent component analysis (ICA).
Asymptotic and bootstrap tests for subspace dimension
Most linear dimension reduction methods proposed in the literature can be formulated using an appropriate pair of scatter matrices, see e.g. Ye and Weiss (2003), Tyler et al. (2009), Bura and Yang (2011), Liski et al. (2014) and Luo and Li (2016). The eigen-decomposition of one scatter matrix with respect to another is then often used to determine the dimension of the signal subspace and to separate signal and noise parts of the data. Three popular dimension reduction methods, namely principal component analysis (PCA), fourth order blind identification (FOBI) and sliced inverse regression (SIR) are considered in detail and the first two moments of subsets of the eigenvalues are used to test…
The asymptotic covariance matrix of the Oja median
The Oja median, based on a sample of multivariate data, is an affine equivariant estimate of the centre of the distribution. It reduces to the sample median in one dimension and has several nice robustness and efficiency properties. We develop different representations of its asymptotic variance and discuss ways to estimate this quantity. We consider symmetric multivariate models and also the more narrow elliptical models. A small simulation study is included to compare finite sample results to the asymptotic formulas.
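As a rough illustration of the criterion behind the Oja median, the following Python sketch evaluates the bivariate objective (the sum of the areas of the triangles formed by a candidate point and every pair of observations) and minimises it with a crude grid search. The grid search and the function names are assumptions made for illustration only; the sketch does not address the asymptotic variance studied in the paper.

```python
from itertools import combinations

import numpy as np

def oja_objective_2d(theta, X):
    """Sum of areas of the triangles spanned by theta and each pair of data points."""
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float)
    total = 0.0
    for i, j in combinations(range(len(X)), 2):
        a = X[i] - theta
        b = X[j] - theta
        # Triangle area is half the absolute value of the 2x2 determinant.
        total += 0.5 * abs(a[0] * b[1] - a[1] * b[0])
    return total

def oja_median_2d_grid(X, grid_size=41):
    """Crude grid-search minimiser over the bounding box of the data (illustrative only)."""
    X = np.asarray(X, dtype=float)
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), grid_size)
    ys = np.linspace(X[:, 1].min(), X[:, 1].max(), grid_size)
    candidates = [(x, y) for x in xs for y in ys]
    values = [oja_objective_2d(c, X) for c in candidates]
    return np.array(candidates[int(np.argmin(values))])
```

In one dimension the analogous objective is the sum of absolute deviations, which is minimised by the sample median; the cost of the pairwise loop above grows quadratically with the sample size.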
Multivariate nonparametric tests of independence
New test statistics are proposed for testing whether two random vectors are independent. Gieser and Randles, as well as Taskinen, Kankainen, and Oja have introduced and discussed multivariate extensions of the quadrant test of Blomqvist. This article serves as a sequel to this work and presents new multivariate extensions of Kendall's tau and Spearman's rho statistics. Two different approaches are discussed. First, interdirection proportions are used to estimate the cosines of angles between centered observation vectors and between differences of observation vectors. Second, covariances between affine-equivariant multivariate signs and ranks are used. The test statistics arising from these …
Tests of multinormality based on location vectors and scatter matrices
Classical univariate measures of asymmetry such as Pearson’s (mean-median)/σ or (mean-mode)/σ often measure the standardized distance between two separate location parameters and have been widely used in assessing univariate normality. Similarly, measures of univariate kurtosis are often just ratios of two scale measures. The classical standardized fourth moment and the ratio of the mean deviation to the standard deviation serve as examples. In this paper we consider tests of multinormality which are based on the Mahalanobis distance between two multivariate location vector estimates or on the (matrix) distance between two scatter matrix estimates, respectively. Asymptotic theory is develop…
Early developmental milestones in adult schizophrenia and other psychoses. A 31-year follow-up of the Northern Finland 1966 Birth Cohort
Delayed childhood development may precede adult psychoses. We tested this hypothesis in a large, general population birth cohort (n=12 058) followed to age 31 years. The ages at which individuals learned to stand, walk, speak, and became potty-trained (bowel control) and dry (bladder control), were recorded at a 1-year examination. Psychiatric outcome was ascertained through linkage to a national hospital discharge register. Cumulative incidence of DSM-III-R schizophrenia, other psychoses and non-psychotic disorders were stratified according to the timing of milestones and compared within the cohort using internal standardization. 100 cases of DSM-III-R schizophrenia, 55 other psyc…
On Independent Component Analysis with Stochastic Volatility Models
Consider a multivariate time series where each component series is assumed to be a linear mixture of latent mutually independent stationary time series. Classical independent component analysis (ICA) tools, such as fastICA, are often used to extract latent series, but they don't utilize any information on temporal dependence. Also financial time series often have periods of low and high volatility. In such settings second order source separation methods, such as SOBI, fail. We review here some classical methods used for time series with stochastic volatility, and suggest modifications of them by proposing a family of vSOBI estimators. These estimators use different nonlinearity functions to…
Deflation-based separation of uncorrelated stationary time series
In this paper we assume that the observed p time series are linear combinations of p latent uncorrelated weakly stationary time series. The problem is then to find an estimate for an unmixing matrix that transforms the observed time series back to uncorrelated time series. The so-called SOBI (Second Order Blind Identification) estimate aims at a joint diagonalization of the covariance matrix and several autocovariance matrices with varying lags. In this paper, we propose a novel procedure that extracts the latent time series one by one. The limiting distribution of this deflation-based SOBI is found under general conditions, and we show how the results can be used for the comparison of es…
Optimal signed-rank tests based on hyperplanes
For analysing k-variate data sets, Randles (J. Amer. Statist. Assoc. 84 (1989) 1045) considered hyperplanes going through k - 1 data points and the origin. He then introduced an empirical angular distance between two k-variate data vectors based on the number of hyperplanes (the so-called interdirections) that separate these two points, and proposed a multivariate sign test based on those interdirections. In this paper, we present an analogous concept (namely, lift-interdirections) to measure the regular distances between data points. The empirical distance between two k-variate data vectors is again determined by the number of hyperplanes that separate these two points; in th…
Separation of Uncorrelated Stationary time series using Autocovariance Matrices
Blind source separation (BSS) is a signal processing tool, which is widely used in various fields. Examples include biomedical signal separation, brain imaging and economic time series applications. In BSS, one assumes that the observed $p$ time series are linear combinations of $p$ latent uncorrelated weakly stationary time series. The aim is then to find an estimate for an unmixing matrix, which transforms the observed time series back to uncorrelated latent time series. In SOBI (Second Order Blind Identification) joint diagonalization of the covariance matrix and autocovariance matrices with several lags is used to estimate the unmixing matrix. The rows of an unmixing matrix can be deriv…
Affine-invariant rank tests for multivariate independence in independent component models
We consider the problem of testing for multivariate independence in independent component (IC) models. Under a symmetry assumption, we develop parametric and nonparametric (signed-rank) tests. Unlike in independent component analysis (ICA), we allow for the singular cases involving more than one Gaussian independent component. The proposed rank tests are based on componentwise signed ranks, à la Puri and Sen. Unlike the Puri and Sen tests, however, our tests (i) are affine-invariant and (ii) are, for adequately chosen scores, locally and asymptotically optimal (in the Le Cam sense) at prespecified densities. Asymptotic local powers and asymptotic relative efficiencies with respect to Wilks’…
Tests of Independence Based on Sign and Rank Covariances
In this paper three different concepts of bivariate sign and rank, namely marginal sign and rank, spatial sign and rank and affine equivariant sign and rank, are considered. The aim is to see whether these different sign and rank covariances can be used to construct tests for the hypothesis of independence. In some cases (spatial sign, affine equivariant sign and rank) an additional assumption on the symmetry of marginal distribution is needed. Limiting distributions of test statistics under the null hypothesis as well as under interesting sequences of contiguous alternatives are derived. Asymptotic relative efficiencies with respect to the regular correlation test are calculated and compar…
Influence Functions and Efficiencies of k-Step Hettmansperger–Randles Estimators for Multivariate Location and Regression
In Hettmansperger and Randles (Biometrika 89:851–860, 2002) spatial sign vectors were used to derive simultaneous estimators of multivariate location and shape. Oja (Multivariate nonparametric methods with R. Springer, New York, 2010) proposed a similar approach for the multivariate linear regression case. These estimators are highly robust and have under general assumptions a joint limiting multinormal distribution. The estimates are easy to compute using fixed-point algorithms. There are however no exact proofs for the convergence of these algorithms. The existence and uniqueness of the solutions also still remain unproven although we believe that they hold under general conditions. To ci…
Computation of the Multivariate Oja Median
The multivariate Oja median (Oja, 1983) is an affine equivariant multivariate location estimate with high efficiency. This estimate has a bounded influence function but zero breakdown. The computation of the estimate appears to be highly intensive. We consider different, exact and stochastic, algorithms for the calculation of the value of the estimate. In the stochastic algorithms, the gradient of the objective function, the rank function, is estimated by sampling observation hyperplanes. The estimated rank function with its estimated accuracy then yields a confidence region for the true sample Oja median, and the confidence region shrinks to the sample median with the increasing number of…
Tests and estimates of shape based on spatial signs and ranks
Nonparametric procedures for testing and estimation of the shape matrix in the case of multivariate elliptic distribution are considered. Testing for sphericity is an important special case. The tests and estimates are based on the spatial sign and rank covariance matrices. The estimates based on the spatial sign covariance matrix and symmetrized spatial sign covariance matrix are Tyler's [A distribution-free M-estimator of multivariate scatter, Ann. Statist. 15 (1987), pp. 234–251] shape matrix and Dümbgen's [On Tyler's M-functional of scatter in high dimension, Ann. Inst. Statist. Math. 50 (1998), pp. 471–491] shape matrix, respectively. The test based on the spatial sign covariance m…
The affine equivariant sign covariance matrix: asymptotic behavior and efficiencies
We consider the affine equivariant sign covariance matrix (SCM) introduced by Visuri et al. (J. Statist. Plann. Inference 91 (2000) 557). The population SCM is shown to be proportional to the inverse of the regular covariance matrix. The eigenvectors and standardized eigenvalues of the covariance matrix can thus be derived from the SCM. We also construct an estimate of the covariance and correlation matrix based on the SCM. The influence functions and limiting distributions of the SCM and its eigenvectors and eigenvalues are found. Limiting efficiencies are given in multivariate normal and t-distribution cases. The estimates are highly efficient in the multivariate normal case and perform …
Statistical properties of a blind source separation estimator for stationary time series
In this paper, we assume that the observed p time series are linear combinations of p latent uncorrelated weakly stationary time series. The problem is then, using the observed p-variate time series, to find an estimate for a mixing or unmixing matrix for the combinations. The estimated uncorrelated time series may then have nice interpretations and can be used in a further analysis. The popular AMUSE algorithm finds an estimate of an unmixing matrix using covariances and autocovariances of the observed time series. In this paper, we derive the limiting distribution of the AMUSE estimator under general conditions, and show how the results can be used for the comparison of estimate…
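A minimal Python sketch of how an unmixing matrix can be estimated from the covariance matrix and a single lagged autocovariance matrix, in the spirit of AMUSE. It assumes NumPy, a positive definite sample covariance and one positive lag; the function name `amuse` and the parameter `lag` are illustrative choices, not taken from the paper, and the sketch says nothing about the limiting distribution derived there.

```python
import numpy as np

def amuse(X, lag=1):
    """AMUSE-style unmixing estimate. X has shape (T, p); lag must be >= 1."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    T = Xc.shape[0]
    cov = Xc.T @ Xc / T
    # Symmetric whitening matrix cov^{-1/2} from the eigendecomposition of cov.
    d, E = np.linalg.eigh(cov)
    whitener = E @ np.diag(d ** -0.5) @ E.T
    Z = Xc @ whitener
    # Symmetrised lagged autocovariance of the whitened series.
    A = Z[:-lag].T @ Z[lag:] / (T - lag)
    A_sym = (A + A.T) / 2
    # Its eigenvectors give the rotation that completes the unmixing.
    _, U = np.linalg.eigh(A_sym)
    return U.T @ whitener
```

With `W = amuse(X)`, the rows of `W` are applied to the centred observations as `(X - X.mean(axis=0)) @ W.T`. By construction the resulting series have identity sample covariance and a diagonal symmetrised autocovariance at the chosen lag; the sources are recovered only up to sign and order.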
On Mardia’s Tests of Multinormality
Classical multivariate analysis is based on the assumption that the data come from a multivariate normal distribution. The tests of multinormality have therefore received very much attention. Several tests for assessing multinormality, among them Mardia’s popular multivariate skewness and kurtosis statistics, are based on standardized third and fourth moments. In Mardia’s construction of the affine invariant test statistics, the data vectors are first standardized using the sample mean vector and the sample covariance matrix. In this paper we investigate whether, in the test construction, it is advantageous to replace the regular sample mean vector and sample covariance matrix by their affi…
Influence functions and efficiencies of the canonical correlation and vector estimates based on scatter and shape matrices
In this paper, the influence functions and limiting distributions of the canonical correlations and coefficients based on affine equivariant scatter matrices are developed for elliptically symmetric distributions. General formulas for limiting variances and covariances of the canonical correlations and canonical vectors based on scatter matrices are obtained. Also the use of the so-called shape matrices in canonical analysis is investigated. The scatter and shape matrices based on the affine equivariant Sign Covariance Matrix as well as the Tyler's shape matrix serve as examples. Their finite sample and limiting efficiencies are compared to those of the Minimum Covariance Determinant estima…
Sign test of independence between two random vectors
A new affine invariant extension of the quadrant test statistic Blomqvist (Ann. Math. Statist. 21 (1950) 593) based on spatial signs is proposed for testing the hypothesis of independence. In the elliptic case, the new test statistic is asymptotically equivalent to the interdirection test by Gieser and Randles (J. Amer. Statist. Assoc. 92 (1997) 561) but is easier to compute in practice. Limiting Pitman efficiencies and simulations are used to compare the test to the classical Wilks’ test.
Robust nonparametric statistical methods. Thomas P. Hettmansperger and Joseph McKean, Arnold/Wiley, London/New York, 1998. No. of pages: xi+467. Price £45. ISBN 0-340-54937-8 (Arnold) and 0-471-19479-4 (Wiley)
Rank scores tests of multivariate independence
New rank scores test statistics are proposed for testing whether two random vectors are independent. The tests are asymptotically distribution-free for elliptically symmetric marginal distributions. Recently, Gieser and Randles (1997), Taskinen, Kankainen and Oja (2003) and Taskinen, Oja and Randles (2005) introduced and discussed different multivariate extensions of the quadrant test, Kendall's tau and Spearman's rho statistics. In this paper, standardized multivariate spatial signs and the (univariate) ranks of the Mahalanobis-type distances of the observations from the origin are combined to construct rank scores tests of independence. The limiting distributions of the test statistics ar…
On the Efficiency of Affine Invariant Multivariate Rank Tests
In this paper the asymptotic Pitman efficiencies of the affine invariant multivariate analogues of the rank tests based on the generalized median of Oja are considered. Formulae for asymptotic relative efficiencies are found and, under multivariate normal and multivariate t distributions, relative efficiencies with respect to Hotelling's T² test are calculated.
Model selection using limiting distributions of second-order blind source separation algorithms
Signals, recorded over time, are often observed as mixtures of multiple source signals. To extract relevant information from such measurements one needs to determine the mixing coefficients. In case of weakly stationary time series with uncorrelated source signals, this separation can be achieved by jointly diagonalizing sample autocovariances at different lags, and several algorithms address this task. Often the mixing estimates contain close-to-zero entries and one wants to decide whether the corresponding source signals have a relevant impact on the observations or not. To address this question of model selection we consider the recently published second-order blind identification proced…
Affine equivariant multivariate rank methods
The classical multivariate statistical methods (MANOVA, principal component analysis, multivariate multiple regression, canonical correlation, factor analysis, etc.) assume that the data come from a multivariate normal distribution and the derivations are based on the sample covariance matrix. The conventional sample covariance matrix and consequently the standard multivariate techniques based on it are, however, highly sensitive to outlying observations. In the paper a new, more robust and highly efficient, approach based on an affine equivariant rank covariance matrix is proposed and outlined. Affine equivariant multivariate rank concept is based on the multivariate Oja (Statist. Probab. …
Fourth Moments and Independent Component Analysis
In independent component analysis it is assumed that the components of the observed random vector are linear combinations of latent independent random variables, and the aim is then to find an estimate for a transformation matrix back to these independent components. In the engineering literature, there are several traditional estimation procedures based on the use of fourth moments, such as FOBI (fourth order blind identification), JADE (joint approximate diagonalization of eigenmatrices), and FastICA, but the statistical properties of these estimates are not well known. In this paper various independent component functionals based on the fourth moments are discussed in detail, starting wi…
The squared symmetric FastICA estimator
In this paper we study the theoretical properties of the deflation-based FastICA method, the original symmetric FastICA method, and a modified symmetric FastICA method, here called the squared symmetric FastICA. This modification is obtained by replacing the absolute values in the FastICA objective function by their squares. In the deflation-based case this replacement has no effect on the estimate since the maximization problem stays the same. However, in the symmetric case we obtain a different estimate which has been mentioned in the literature, but its theoretical properties have not been studied at all. In the paper we review the classic deflation-based and symmetric FastICA approaches…
Sign and rank covariance matrices
The robust estimation of multivariate location and shape is one of the most challenging problems in statistics and crucial in many application areas. The objective is to find highly efficient, robust, computable and affine equivariant location and covariance matrix estimates. In this paper, three different concepts of multivariate sign and rank are considered and their ability to carry information about the geometry of the underlying distribution (or data cloud) is discussed. New techniques for robust covariance matrix estimation based on different sign and rank concepts are proposed and algorithms for computing them outlined. In addition, new tools for evaluating the qualitative and quant…