Search results for "feature"
showing 10 items of 4091 documents
High-frequency trading and networked markets
2021
Financial markets have undergone a deep reorganization during the last 20 years. A mixture of technological innovation and regulatory constraints has promoted the diffusion of market fragmentation and high-frequency trading. The new stock market has changed the traditional ecology of market participants and market professionals, and financial markets have evolved into complex sociotechnical institutions characterized by a great heterogeneity in the time scales of market members’ interactions that cover more than eight orders of magnitude. We analyze three different datasets for two highly studied market venues recorded in 2004 to 2006, 2010 to 2011, and 2018. Using methods of complex network th…
Spatio‐temporal classification in point patterns under the presence of clutter
2019
We consider the problem of detection of features in the presence of clutter for spatio-temporal point patterns. In previous studies, related to the spatial context, Kth nearest-neighbor distances were used to classify points between clutter and features. In particular, a mixture of distributions was employed, whose parameters were estimated using an expectation-maximization algorithm. This paper extends this methodology to the spatio-temporal context by considering the properties of the spatio-temporal Kth nearest-neighbor distances. For this purpose, we make use of a couple of spatio-temporal distances, which are based on the Euclidean and the maximum norms. We show closed forms for the probability distributions o…
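As a rough illustration of the Kth nearest-neighbor distances mentioned in the abstract above, the following Python sketch computes them for points given as (x, y, t) triples. The two distance definitions are assumptions made for illustration only (a Euclidean norm over the spatial and temporal separations, and the maximum of the two); the paper's exact definitions are not given in this listing, and the function name `kth_nn_distances` is hypothetical.

```python
import math


def kth_nn_distances(points, k, norm="euclidean"):
    """Return, for each (x, y, t) point, the distance to its k-th nearest neighbor.

    Assumed distances (illustrative, not taken from the paper):
      "euclidean": sqrt(spatial_distance**2 + time_gap**2)
      "max":       max(spatial_distance, time_gap)
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")

    def dist(p, q):
        spatial = math.hypot(p[0] - q[0], p[1] - q[1])
        temporal = abs(p[2] - q[2])
        if norm == "euclidean":
            return math.hypot(spatial, temporal)
        if norm == "max":
            return max(spatial, temporal)
        raise ValueError("norm must be 'euclidean' or 'max'")

    result = []
    for i, p in enumerate(points):
        # Brute-force: sort the distances to all other points, take the k-th.
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return result


# Toy data: three nearby events and one isolated event.
pts = [(0, 0, 0), (3, 4, 0), (0, 0, 2), (10, 10, 10)]
print(kth_nn_distances(pts, k=1))
print(kth_nn_distances(pts, k=2, norm="max"))
```

Points with unusually large Kth nearest-neighbor distances (here the isolated fourth point) are the natural clutter candidates; the mixture model and EM step described in the abstract are not sketched here.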
Modeling Forest Tree Data Using Sequential Spatial Point Processes
2021
The spatial structure of a forest stand is typically modeled by spatial point process models. Motivated by aerial forest inventories and forest dynamics in general, we propose a sequential spatial approach for modeling forest data. Such an approach is better justified than a static point process model in describing the long-term dependence among the spatial location of trees in a forest and the locations of detected trees in aerial forest inventories. Tree size can be used as a surrogate for the unknown tree age when determining the order in which trees have emerged or are observed on an aerial image. Sequential spatial point processes differ from spatial point processes in that the…
Breaking the curse of dimensionality in quadratic discriminant analysis models with a novel variant of a Bayes classifier enhances automated taxa ide…
2013
Macroinvertebrate samples are commonly used in biomonitoring to study changes in aquatic ecosystems. Traditionally, specimens are identified manually to taxa by human experts, which is time-consuming and cost-intensive. Using the image data of 35 taxa and 64 features, we propose a novel variant of the quadratic discriminant analysis for breaking the curse of dimensionality in quadratic discriminant analysis models. Our variant, called a random Bayes array (RBA), uses bagging and random feature selection similar to random forest. We explore several variations of RBA. We consider three classification (i.e., taxa identification) decisions: majority vote, averaged posterior probabilities, and a novel…
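The first two decision rules named in the abstract above, majority vote and averaged posterior probabilities, can be sketched generically for any ensemble of classifiers. This is a minimal illustration, not the RBA method itself: the function names and the toy posterior values are hypothetical, and the third decision rule is cut off in this listing, so it is not sketched.

```python
from collections import Counter


def majority_vote(posteriors):
    """Each ensemble member votes for its most probable class; return the top class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
    return Counter(votes).most_common(1)[0][0]


def averaged_posterior(posteriors):
    """Average the posterior vectors over members; return the class with the highest mean."""
    n_classes = len(posteriors[0])
    means = [sum(p[c] for p in posteriors) / len(posteriors) for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)


# Toy posteriors from three ensemble members over two classes (made-up numbers).
member_posteriors = [[0.6, 0.4], [0.55, 0.45], [0.1, 0.9]]
print(majority_vote(member_posteriors))       # two of three members prefer class 0
print(averaged_posterior(member_posteriors))  # one confident member shifts the mean to class 1
```

The toy data are chosen so that the two rules disagree: voting ignores how confident each member is, while averaging lets a single confident member outweigh two marginal ones.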
A model-based approach to Spotify data analysis: a Beta GLMM
2020
Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of songs' audio features from a statistical point of view. In particular, it explores the data-capturing mechanisms enabled by the Spotify Web API and suggests statistical tools for the analysis of these data. Special attention is devoted to song popularity, and a Beta model with random effects is proposed to give a first answer to questions such as: what are the determinants of popularity? The identification of a model able to describe this relationship, the determination within the set of char…
Automatic variable selection for exposure-driven propensity score matching with unmeasured confounders
2020
Multivariable model building for propensity score modeling approaches is challenging. A common propensity score approach is exposure-driven propensity score matching, where the best model selection strategy is still unclear. In particular, the situation may require variable selection, while it is still unclear if variables included in the propensity score should be associated with the exposure and the outcome, with either the exposure or the outcome, with at least the exposure or with at least the outcome. Unmeasured confounders, complex correlation structures, and non-normal covariate distributions further complicate matters. We consider the performance of different modeling strategies in …
Cluster-Localized Sparse Logistic Regression for SNP Data
2012
The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, th…
Sample size planning for survival prediction with focus on high-dimensional data
2011
Sample size planning should reflect the primary objective of a trial. If the primary objective is prediction, the sample size determination should focus on prediction accuracy instead of power. We present formulas for the determination of training set sample size for survival prediction. Sample size is chosen to control the difference between optimal and expected prediction error. Prediction is carried out by Cox proportional hazards models. The general approach considers censoring as well as low-dimensional and high-dimensional explanatory variables. For dimension reduction in the high-dimensional setting, a variable selection step is inserted. If not all informative variables are included…
Correlated randomness and switching phenomena
2010
One challenge of biology, medicine, and economics is that the systems treated by these serious scientific disciplines have no perfect metronome in time and no perfect spatial architecture—crystalline or otherwise. Nonetheless, as if by magic, out of nothing but randomness one finds remarkably fine-tuned processes in time and remarkably fine-tuned structures in space. Further, many of these processes and structures have the remarkable feature of “switching” from one behavior to another as if by magic. The past century has, philosophically, been concerned with placing aside the human tendency to see the universe as a fine-tuned machine. Here we will address the challenge of uncovering how, th…
Binary distributions of concentric rings
2014
We introduce families of jointly symmetric, binary distributions that are generated over directed star graphs whose nodes represent variables and whose edges indicate positive dependences. The families are parametrized in terms of a single parameter. It is an outstanding feature of these distributions that joint probabilities relate to evenly spaced concentric rings. Kronecker product characterizations make them computationally attractive for a large number of variables. We study the behavior of different measures of dependence and derive maximum likelihood estimates when all nodes are observed and when the inner node is hidden.