Search results for "feature"

showing 10 items of 4091 documents

High-frequency trading and networked markets

2021

Financial markets have undergone a deep reorganization over the last 20 years. A mixture of technological innovation and regulatory constraints has promoted the diffusion of market fragmentation and high-frequency trading. The new stock market has changed the traditional ecology of market participants and market professionals, and financial markets have evolved into complex sociotechnical institutions characterized by a great heterogeneity in the time scales of market members’ interactions that cover more than eight orders of magnitude. We analyze three different datasets for two highly studied market venues recorded in 2004 to 2006, 2010 to 2011, and 2018. Using methods of complex network th…

Statistically validated networks; 050208 finance; Multidisciplinary; Sociotechnical system; Financial markets; 05 social sciences; Financial market; Evolutionary Models of Financial Markets Special Feature; Complex networks; Monetary economics; Complex network; Settore FIS/07 - Fisica Applicata(Beni Culturali Ambientali Biol.e Medicin); Market liquidity; 0502 economics and business; Portfolio; Stock market; Business; 050207 economics; High-frequency trading; Stock (geology); Proceedings of the National Academy of Sciences
researchProduct

Spatio‐temporal classification in point patterns under the presence of clutter

2019

We consider the problem of detecting features in the presence of clutter in spatio-temporal point patterns. Previous studies in the spatial context used Kth nearest-neighbor distances to classify points as clutter or features. In particular, a mixture of distributions was used, with parameters estimated by an expectation-maximization algorithm. This paper extends this methodology to the spatio-temporal context by considering the properties of the spatio-temporal Kth nearest-neighbor distances. For this purpose, we make use of two spatio-temporal distances, which are based on the Euclidean and the maximum norms. We show closed forms for the probability distributions o…

Statistics and Probability; 010504 meteorology & atmospheric sciences; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Context (language use); 01 natural sciences; 010104 statistics & probability; Spatio-temporal; point patterns; Clutter; Expectation–maximization algorithm; Euclidean geometry; Earthquakes; Point (geometry); clutter earthquakes EM algorithm features mixtures nearest‐neighbor distances spatio‐temporal point patterns; 0101 mathematics; EM algorithm; Features; 0105 earth and related environmental sciences; spatio-temporal point pattern; Spatial contextual awareness; Ecological Modeling; mixture; nearest-neighbor distance; ComputingMethodologies_PATTERNRECOGNITION; earthquake; Mixtures; Probability distribution; feature; Settore SECS-S/01 - Statistica; clutter; Nearest-neighbor distances; Algorithm; Environmetrics
researchProduct

Modeling Forest Tree Data Using Sequential Spatial Point Processes

2021

The spatial structure of a forest stand is typically modeled by spatial point process models. Motivated by aerial forest inventories and forest dynamics in general, we propose a sequential spatial approach for modeling forest data. Such an approach is better justified than a static point process model in describing the long-term dependence among the spatial location of trees in a forest and the locations of detected trees in aerial forest inventories. Tree size can be used as a surrogate for the unknown tree age when determining the order in which trees have emerged or are observed on an aerial image. Sequential spatial point processes differ from spatial point processes in that the…

Statistics and Probability; 010504 meteorology & atmospheric sciences; history-dependent model; paikkatietoanalyysi; 01 natural sciences; Point process; 010104 statistics & probability; ilmakuvakartoitus; functional summary statistics; Feature (machine learning); spatial point processes; 0101 mathematics; maximum likelihood; tilastolliset mallit; Aerial image; 0105 earth and related environmental sciences; General Environmental Science; Forest dynamics; Spatial structure; Applied Mathematics; 15. Life on land; Agricultural and Biological Sciences (miscellaneous); Tree (graph theory); metsänarviointi; Data set; Environmental science; kaukokartoitus; Statistics, Probability and Uncertainty; General Agricultural and Biological Sciences; Point process models; Cartography; ordered sequence
researchProduct

Breaking the curse of dimensionality in quadratic discriminant analysis models with a novel variant of a Bayes classifier enhances automated taxa ide…

2013

Macroinvertebrate samples are commonly used in biomonitoring to study changes in aquatic ecosystems. Traditionally, specimens are identified manually to taxa by human experts, which is time-consuming and cost-intensive. Using the image data of 35 taxa and 64 features, we propose a novel variant of quadratic discriminant analysis that breaks the curse of dimensionality in such models. Our variant, called a random Bayes array (RBA), uses bagging and random feature selection similar to a random forest. We explore several variations of RBA. We consider three classification (i.e., taxa identification) decisions: majority vote, averaged posterior probabilities, and a novel…

Statistics and Probability; Bayes' theorem; Ecological Modeling; Bayesian probability; Statistics; Posterior probability; Feature selection; Context (language use); Bayes classifier; Quadratic classifier; Mathematics; Random forest; Environmetrics
researchProduct

A model-based approach to Spotify data analysis: a Beta GLMM

2020

Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of songs' audio features from a statistical point of view. In particular, it explores the data-capture mechanisms enabled by the Spotify Web API and suggests statistical tools for the analysis of these data. Special attention is devoted to song popularity, and a Beta model including random effects is proposed to give a first answer to questions such as: what are the determinants of popularity? The identification of a model able to describe this relationship, the determination within the set of char…

Statistics and Probability; Beta GLMM; Distribution (number theory); Computer science; Application Notes; 0211 other engineering and technologies; 02 engineering and technology; computer.software_genre; Web API; 01 natural sciences; Set (abstract data type); 010104 statistics & probability; Spotify Web API audio features Popularity Index Beta GLMM; sort; Spotify Web API; 0101 mathematics; Digital audio; 021103 operations research; Point (typography); Random effects model; Data science; Popularity; Identification (information); Popularity Index; Data mining; Statistics, Probability and Uncertainty; computer; audio feature
researchProduct

Automatic variable selection for exposure-driven propensity score matching with unmeasured confounders.

2020

Multivariable model building for propensity score modeling approaches is challenging. A common propensity score approach is exposure-driven propensity score matching, where the best model selection strategy is still unclear. In particular, the situation may require variable selection, while it is still unclear if variables included in the propensity score should be associated with the exposure and the outcome, with either the exposure or the outcome, with at least the exposure or with at least the outcome. Unmeasured confounders, complex correlation structures, and non-normal covariate distributions further complicate matters. We consider the performance of different modeling strategies in …

Statistics and Probability; Biometry; Models, Statistical; Computer science; Model selection; Feature selection; General Medicine; Variance (accounting); 01 natural sciences; Outcome (game theory); Correlation; 010104 statistics & probability; 03 medical and health sciences; Automation; 0302 clinical medicine; Covariate; Propensity score matching; Statistics; Multivariate Analysis; 030212 general & internal medicine; 0101 mathematics; Statistics, Probability and Uncertainty; Propensity Score; Counterexample; Biometrical journal. Biometrische Zeitschrift
researchProduct

Cluster-Localized Sparse Logistic Regression for SNP Data

2012

The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, th…

Statistics and Probability; Boosting (machine learning); Computer science; Multivariable calculus; Computational Biology; High-Throughput Nucleotide Sequencing; Feature selection; Regression analysis; Models, Theoretical; Logistic regression; computer.software_genre; Polymorphism, Single Nucleotide; Regression; Computational Mathematics; Logistic Models; Data Interpretation, Statistical; Genetics; Cluster Analysis; Humans; Data mining; Cluster analysis; Molecular Biology; Unit-weighted regression; computer; Genome-Wide Association Study; Statistical Applications in Genetics and Molecular Biology
researchProduct

Sample size planning for survival prediction with focus on high-dimensional data

2011

Sample size planning should reflect the primary objective of a trial. If the primary objective is prediction, the sample size determination should focus on prediction accuracy instead of power. We present formulas for the determination of training set sample size for survival prediction. Sample size is chosen to control the difference between optimal and expected prediction error. Prediction is carried out by Cox proportional hazards models. The general approach considers censoring as well as low-dimensional and high-dimensional explanatory variables. For dimension reduction in the high-dimensional setting, a variable selection step is inserted. If not all informative variables are included…

Statistics and Probability; Clustering high-dimensional data; Clinical Trials as Topic; Lung Neoplasms; Models, Statistical; Kaplan-Meier Estimate; Epidemiology; Proportional hazards model; Dimensionality reduction; Gene Expression; Feature selection; Biostatistics; Prognosis; Brier score; Sample size determination; Carcinoma, Non-Small-Cell Lung; Sample Size; Censoring (clinical trials); Statistics; Humans; Proportional Hazards Models; Mathematics; Statistics in Medicine
researchProduct

Correlated randomness and switching phenomena

2010

One challenge of biology, medicine, and economics is that the systems treated by these serious scientific disciplines have no perfect metronome in time and no perfect spatial architecture—crystalline or otherwise. Nonetheless, as if by magic, out of nothing but randomness one finds remarkably fine-tuned processes in time and remarkably fine-tuned structures in space. Further, many of these processes and structures have the remarkable feature of “switching” from one behavior to another as if by magic. The past century has, philosophically, been concerned with placing aside the human tendency to see the universe as a fine-tuned machine. Here we will address the challenge of uncovering how, th…

Statistics and Probability; Cognitive science; Theoretical physics; Aside; Nothing; Phenomenon; Feature (machine learning); Magic (programming); Space (commercial competition); Condensed Matter Physics; Tipping point (sociology); Randomness; Mathematics; Physica A: Statistical Mechanics and its Applications
researchProduct

Binary distributions of concentric rings

2014

We introduce families of jointly symmetric, binary distributions that are generated over directed star graphs whose nodes represent variables and whose edges indicate positive dependences. The families are parametrized in terms of a single parameter. It is an outstanding feature of these distributions that joint probabilities relate to evenly spaced concentric rings. Kronecker product characterizations make them computationally attractive for a large number of variables. We study the behavior of different measures of dependence and derive maximum likelihood estimates when all nodes are observed and when the inner node is hidden.

Statistics and Probability; Contingency table; Kronecker product; Discrete mathematics; Numerical Analysis; Binary number; Star (graph theory); Combinatorics; symbols.namesake; Conditional independence; Joint probability distribution; symbols; Feature (machine learning); Node (circuits); Statistics, Probability and Uncertainty; Mathematics; Journal of Multivariate Analysis
researchProduct