Search results for "Mining"

Showing 10 of 1,730 documents

A model-based approach to Spotify data analysis: a Beta GLMM

2020

Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of songs' audio features from a statistical point of view. In particular, it explores the data-collection mechanisms enabled by the Spotify Web API and suggests statistical tools for the analysis of these data. Special attention is devoted to song popularity, and a Beta model including random effects is proposed to provide a first answer to questions such as: what are the determinants of popularity? The identification of a model able to describe this relationship, the determination within the set of char…

Subjects: Statistics and Probability; Beta GLMM; Distribution (number theory); Computer science; Application Notes; 0211 other engineering and technologies; 02 engineering and technology; Web API; 01 natural sciences; Set (abstract data type); 010104 statistics & probability; Spotify Web API, audio features, Popularity Index, Beta GLMM; sort; Spotify Web API; 0101 mathematics; Digital audio; 021103 operations research; Point (typography); Random effects model; Data science; Popularity; Identification (information); Popularity Index; Data mining; Statistics, Probability and Uncertainty; audio feature
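
The excerpt does not give the model equations. One common way to write a Beta GLMM, offered only as an assumption about the general model class and not as the authors' exact specification, uses the mean-precision parametrization of the Beta distribution with a logit link and a random intercept b_i for a grouping factor i (for example, artist; the grouping and the covariates x_ij are hypothetical), with the popularity index first rescaled into the open interval (0, 1):

```latex
\begin{aligned}
y_{ij} \mid b_i &\sim \operatorname{Beta}\big(\mu_{ij}\,\phi,\ (1-\mu_{ij})\,\phi\big), \\
\operatorname{logit}(\mu_{ij}) &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2).
\end{aligned}
```

Under this parametrization E(y_ij | b_i) = μ_ij and Var(y_ij | b_i) = μ_ij(1 − μ_ij)/(1 + φ), so φ acts as a precision parameter and the fixed effects β are interpreted on the log-odds scale of mean popularity.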

Cluster-Localized Sparse Logistic Regression for SNP Data

2012

The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, th…

Subjects: Statistics and Probability; Boosting (machine learning); Computer science; Multivariable calculus; Computational Biology; High-Throughput Nucleotide Sequencing; Feature selection; Regression analysis; Models, Theoretical; Logistic regression; Polymorphism, Single Nucleotide; Regression; Computational Mathematics; Logistic Models; Data Interpretation, Statistical; Genetics; Cluster Analysis; Humans; Data mining; Cluster analysis; Molecular Biology; Unit-weighted regression; Genome-Wide Association Study; Statistical Applications in Genetics and Molecular Biology
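
CLR incorporates observation weights into componentwise boosting of logistic regression models. The sketch below illustrates only that core mechanism, weighted componentwise boosting, and swaps in squared-error (L2) loss for the logistic model to keep it short; it fits no intercept. The function name, the step length `nu` and the number of steps are illustrative choices, and because the excerpt does not say how the weights are derived from group membership, here they are simply an input.

```python
def weighted_componentwise_l2_boost(X, y, w, steps=100, nu=0.1):
    """Componentwise L2 boosting with observation weights (hypothetical helper).

    X: list of rows (n x p); y: responses; w: non-negative observation weights.
    Each step fits every single covariate to the current residuals by weighted
    least squares (no intercept) and updates only the best-fitting one.
    NOTE: squared-error loss stands in for the logistic model used by CLR.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(steps):
        best_j, best_gain, best_b = None, -1.0, 0.0
        for j in range(p):
            sxx = sum(w[i] * X[i][j] ** 2 for i in range(n))
            if sxx == 0:
                continue  # covariate carries no weighted information
            sxr = sum(w[i] * X[i][j] * resid[i] for i in range(n))
            gain = sxr ** 2 / sxx  # reduction in weighted RSS from covariate j
            if gain > best_gain:
                best_j, best_gain, best_b = j, gain, sxr / sxx
        if best_j is None:
            break
        # Shrunken update of the selected coefficient and of the residuals.
        beta[best_j] += nu * best_b
        for i in range(n):
            resid[i] -= nu * best_b * X[i][best_j]
    return beta
```

Because every step updates a single coefficient, covariates that are never selected stay exactly zero (a sparse fit), and giving an observation weight 0 removes it from the fit, which is one way a weighted fit can be localized to a group of individuals.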

Multiple testing in candidate gene situations: a comparison of classical, discrete, and resampling-based procedures.

2011

In candidate gene association studies, several elementary hypotheses are usually tested simultaneously on a single data set. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran–Armitage test for trend. To account for the multiplicity of the test situation, different types of multiple testing procedures have been proposed. The question arises whether procedures that take the discreteness of the situation into account show a benefit, especially in the case of correlated data. We empirically evaluate several different multiple testing procedures via simulation studies using simulated correlated SN…

Subjects: Statistics and Probability; Candidate gene; Contrast (statistics); Polymorphism, Single Nucleotide; Set (abstract data type); Computational Mathematics; Sample size determination; Resampling; Data Interpretation, Statistical; Sample Size; Statistics; Multiple comparisons problem; Genetics; Cochran–Armitage test for trend; Range (statistics); Humans; Computer Simulation; Disease; Data mining; Molecular Biology; Genetic Association Studies; Mathematics; Statistical applications in genetics and molecular biology
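
For a single SNP, the Cochran–Armitage trend statistic compares case and control genotype counts through per-genotype scores. The sketch below assumes additive scores (0, 1, 2), a 2 × 3 genotype table and the usual normal approximation; it produces one per-SNP p-value and does not implement any of the multiple testing procedures compared in the paper. The function name is hypothetical.

```python
import math


def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Two-sided Cochran-Armitage trend test for a 2 x k genotype table.

    cases[i], controls[i]: counts in genotype category i (e.g. 0, 1, 2 copies
    of the minor allele); scores: the trend weights t_i (additive model here).
    Returns (z, p_value) using the normal approximation.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    # Trend statistic T = sum_i t_i * (r_i * S - s_i * R)
    T = sum(t * (r * S - s * R) for t, r, s in zip(scores, cases, controls))
    k = len(scores)
    bracket = sum(scores[i] ** 2 * n[i] * (N - n[i]) for i in range(k))
    bracket -= 2 * sum(scores[i] * scores[j] * n[i] * n[j]
                       for i in range(k) for j in range(i + 1, k))
    var_T = R * S / N * bracket
    if var_T <= 0:
        raise ValueError("degenerate table: variance of T is zero")
    z = T / math.sqrt(var_T)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p
```

For example, `cochran_armitage([10, 20, 30], [30, 20, 10])` gives z = √20 ≈ 4.47, while identical case and control counts give z = 0 and p = 1.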

Opportunities and challenges of combined effect measures based on prioritized outcomes

2013

Many approaches to combining multiple endpoints into a univariate outcome measure have been proposed in the literature. For binary or time-to-event variables, composite endpoints, which combine several event types within a single event or time-to-first-event analysis, are often used to assess the overall treatment effect. A main drawback of this approach is that the composite effect can be difficult to interpret, since a negative effect in one component can be masked by a positive effect in another. Recently, some authors have proposed more general approaches based on a priority ranking of outcomes, which also allow outcome variables of different scale levels to be combined. …

Subjects: Statistics and Probability; Clinical Trials as Topic; Epidemiology; Univariate; Outcome (game theory); Treatment Outcome; Ranking; Scale (social sciences); Component (UML); Outcome Assessment, Health Care; Multiple comparisons problem; Humans; Computer Simulation; Data mining; Proportional Hazards Models; Mathematics; Statistical hypothesis testing; Event (probability theory); Statistics in Medicine
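
One widely used measure built on a priority ranking of outcomes is the win ratio: every treated patient is compared with every control patient on the highest-priority outcome first, and the next outcome is consulted only when that comparison is tied. Whether the paper examines exactly this measure is not stated in the excerpt. The sketch below assumes that higher values are better on every outcome, uses optional per-outcome tie margins (hypothetical parameter), and ignores censoring, which a real time-to-event comparison must handle.

```python
def win_ratio(treated, control, margins=None):
    """Win ratio for prioritized outcomes, assuming higher values are better.

    treated, control: lists of patients; each patient is a tuple of outcome
    values ordered from highest to lowest clinical priority.
    margins: optional per-outcome differences within which a pair is tied.
    Censoring is NOT handled in this sketch.
    """
    k = len(treated[0])
    if margins is None:
        margins = [0.0] * k
    wins = losses = ties = 0
    for t in treated:
        for c in control:
            # Walk down the priority list; the first decisive outcome counts.
            for j in range(k):
                diff = t[j] - c[j]
                if diff > margins[j]:
                    wins += 1
                    break
                if diff < -margins[j]:
                    losses += 1
                    break
            else:
                ties += 1  # tied on every outcome
    ratio = wins / losses if losses else float("inf")
    return ratio, wins, losses, ties
```

With treated patients `[(1, 5.0), (0, 3.0)]` and controls `[(0, 4.0), (0, 3.0)]` (say, survival status first, then a score), the sketch returns `(2.0, 2, 1, 1)`: a loss on the lower-priority outcome is counted only when the higher-priority outcome is tied, so it cannot offset a win on a more important one.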

An interest rates cluster analysis

2004

An empirical analysis of interest rates in money and capital markets is performed. We investigate a set of 34 different weekly interest rate time series over a 16-year period from 1982 to 1997. Our study focuses on the collective behavior of the stochastic fluctuations of these time series, which is investigated using a linkage clustering procedure. Without any a priori assumption, we identify a meaningful separation into 6 main clusters organized in a hierarchical structure.

Subjects: Statistics and Probability; Collective behavior; FOS: Physical sciences; Linkage (mechanical); FOS: Economics and business; Econometrics; Cluster (physics); Cluster analysis; Condensed Matter - Statistical Mechanics; Statistical Finance (q-fin.ST); Statistical Mechanics (cond-mat.stat-mech); Econophysics; Series (mathematics); Quantitative Finance - Statistical Finance; Condensed Matter Physics; Interest rate; Condensed Matter - Other Condensed Matter; Data mining; Capital market; Other Condensed Matter (cond-mat.other)
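
The excerpt does not say which linkage rule or distance was used. A common choice in econophysics, assumed here rather than taken from the paper, is single linkage on the correlation-based distance d_ij = sqrt(2(1 − ρ_ij)), computed from the period-to-period changes of each series (the input format and function names are hypothetical). The sketch below uses a naive agglomeration, which is adequate for a few dozen series.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def single_linkage(series):
    """Single-linkage clustering on d = sqrt(2 * (1 - rho)).

    series: dict name -> list of rate changes (all the same length).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(series)
    dist = {}
    for a, b in combinations(names, 2):
        rho = pearson(series[a], series[b])
        dist[frozenset((a, b))] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        # Single linkage: distance between clusters = closest pair of members.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(dist[frozenset((a, b))]
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

The merge history encodes the hierarchical tree; cutting it at a chosen distance yields a flat partition, and the choice of threshold is a modeling decision.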

An overview of robust Bayesian analysis

1994

Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs. This paper seeks to provide an overview of the subject, one that is accessible to statisticians outside the field. Recent developments in the area are also reviewed, though with very uneven emphasis. © 1994 SEIO.

Subjects: Statistics and Probability; Computer science; Bayesian probability; Data science; Field (computer science); Bayesian robustness; Robust Bayesian analysis; Prior probability; Data mining; Sensitivity (control systems); Statistics, Probability and Uncertainty
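
A standard way to quantify the sensitivity described above is to let the prior range over a class and report the resulting range of a posterior quantity. The sketch below does this for a binomial success probability with a hypothetical class of conjugate priors Beta(s·m, s·(1 − m)), where the prior mean m and prior "sample size" s each vary over an interval; the class and the function name are illustrative assumptions, not taken from the overview.

```python
from itertools import product


def posterior_mean_range(x, n, m_range, s_range):
    """Range of the posterior mean of a binomial success probability over the
    prior class {Beta(s*m, s*(1 - m)) : m in m_range, s in s_range}.

    x successes in n trials give posterior Beta(s*m + x, s*(1 - m) + n - x),
    with mean (s*m + x) / (s + n). That mean increases in m for every s and is
    monotone in s for every m, so its extremes over the rectangle of
    (m, s) values are attained at the four corners.
    """
    means = [(s * m + x) / (s + n) for m, s in product(m_range, s_range)]
    return min(means), max(means)
```

For 7 successes in 10 trials, with m between 0.2 and 0.8 and s between 1 and 20, `posterior_mean_range(7, 10, (0.2, 0.8), (1, 20))` returns (11/30, 23/30) ≈ (0.367, 0.767); a range this wide signals that the answer depends heavily on the prior.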

A new mathematical approach for the estimation of the AUC and its variability under different experimental designs in preclinical studies

2011

The aim of the present work was to develop a new mathematical method for estimating the area under the curve (AUC) and its variability that could be applied in different preclinical experimental designs and implemented in standard calculation worksheets. To assess the usefulness of the new approach, different experimental scenarios were studied and the results were compared with those obtained with commonly used software: WinNonlin® and Phoenix WinNonlin®. The results show no statistically significant differences between the AUC values obtained by the two procedures, but the new method appears to be a better estimator of the AUC standard error, measured as the coverage of 95% confi…

Subjects: Statistics and Probability; Computer science; Drug Evaluation, Preclinical; Administration, Oral; Software; Ciprofloxacin; Area under curve; Variance estimation; Animals; Pharmacology (medical); Rats, Wistar; Pharmacology; Models, Statistical; Design of experiments; Estimator; Models, Theoretical; Confidence interval; Rats; Standard error; Research Design; Area Under Curve; Data mining; Pharmaceutical Statistics
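
The new method itself is not described in the excerpt. As a point of reference only, the sketch below shows a classical baseline for destructive-sampling preclinical designs (different animals at each time point, as in Bailer-type estimators): the linear trapezoidal AUC of the mean concentrations is written as a weighted sum of those means, and the standard error follows by propagating the per-time-point variances, assuming independence across time points. Function names are hypothetical, and each time point needs at least two animals.

```python
import math


def trapezoid_weights(times):
    """Weights c_i such that the linear trapezoidal AUC equals sum(c_i * y_i)."""
    c = [0.0] * len(times)
    for i in range(len(times) - 1):
        half = (times[i + 1] - times[i]) / 2.0
        c[i] += half
        c[i + 1] += half
    return c


def auc_destructive(times, samples):
    """AUC from times[0] to times[-1] and its standard error when each time
    point has its own independent animals (samples[i] = concentrations at
    times[i], at least two per time point)."""
    c = trapezoid_weights(times)
    means = [sum(s) / len(s) for s in samples]
    variances = [sum((v - mu) ** 2 for v in s) / (len(s) - 1)
                 for s, mu in zip(samples, means)]
    auc = sum(ci * mu for ci, mu in zip(c, means))
    # Var(AUC) = sum_i c_i^2 * s_i^2 / n_i under independence across times.
    se = math.sqrt(sum(ci ** 2 * var / len(s)
                       for ci, var, s in zip(c, variances, samples)))
    return auc, se
```

For sampling times 0, 1, 2 and 4 h with concentrations `[[0, 0], [10, 12], [8, 8], [2, 4]]`, the sketch returns an AUC of 26 and a standard error of √2 ≈ 1.41.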

Pathway analysis of high-throughput biological data within a Bayesian network framework

2011

Motivation: Most current approaches to high-throughput biological data (HTBD) analysis perform either individual gene/protein analysis or gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to causal inference. Here, we describe for the first time an algorithm that models biological pathways as BNs and identifies the pathways that best explain given HTBD by scoring the fitness of each network. Results: The proposed method takes into account the connectivity and relatedness between nodes of the p…

Subjects: Statistics and Probability; Computer science; High-throughput screening; Gene regulatory network; Models, Biological; Biochemistry; Synthetic data; Biological pathway; Bayes' theorem; Humans; Gene Regulatory Networks; Carcinoma, Renal Cell; Molecular Biology; Gene; Biological data; Microarray analysis techniques; Gene Expression Profiling; Bayesian network; Robustness (evolution); Bayes Theorem; Pathway analysis; Kidney Neoplasms; High-Throughput Screening Assays; Computer Science Applications; Gene expression profiling; Computational Mathematics; Computational Theory and Mathematics; Causal inference; Data mining; Algorithms; Software; Bioinformatics

Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods

2014

Motivation: Protein–protein interaction (PPI) networks are powerful models for representing the pairwise protein interactions of organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that together perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as the functions and properties of as yet uncharacterized proteins. Results: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and …

Subjects: Statistics and Probability; Computer science; Population; Population based; Machine learning; Biochemistry; Protein protein interaction network; genetic algorithms; Protein–protein interaction; Bioinformatics Clustering Biological Networks; PPI networks; complex detection; Protein Interaction Mapping; Animals; Cluster Analysis; Humans; Cluster analysis; Molecular Biology; Topology (chemistry); Class (computer programming); food and beverages; Proteins; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Artificial intelligence; Data mining; Focus (optics); Algorithms

Anthropometry: An R Package for Analysis of Anthropometric Data

2017

The development of powerful new 3D scanning techniques has enabled the generation of large, up-to-date anthropometric databases, which provide highly valued data for improving the ergonomic design of products adapted to the user population. As a consequence, ergonomics and anthropometry have become increasingly quantitative fields, so advanced statistical methodologies and modern software tools are required to get the maximum benefit from anthropometric data. This paper presents a new R package, called Anthropometry, which is available on the Comprehensive R Archive Network. It brings together some statistical methodologies concerning clustering, statistical shape analysis, statistical archetypal an…

Subjects: Statistics and Probability; Computer science; Population; statistical shape analysis; 02 engineering and technology; 01 natural sciences; 010104 statistics & probability; Software; 0202 electrical engineering, electronic engineering, information engineering; R, anthropometric data, clustering, statistical shape analysis, archetypal analysis, data depth; 0101 mathematics; archetypal analysis; Cluster analysis; lcsh:Statistics; lcsh:HA1-4737; Anthropometric data; Statistical shape analysis; R; Human factors and ergonomics; Anthropometry; anthropometric data; Vignette; 020201 artificial intelligence & image processing; Data mining; Statistics, Probability and Uncertainty; data depth; clustering; Journal of Statistical Software

researchProduct