Search results for "Probability"
showing 10 items of 2176 documents
Reducing sample size in experiments with animals: historical controls and related strategies
2015
Reducing the number of animal subjects used in biomedical experiments is desirable for ethical and practical reasons. Previous reviews of the benefits of reducing sample sizes have focused on improving experimental designs and methods of statistical analysis, but reducing the size of control groups has rarely been considered. We discuss how the number of current control animals can be reduced, without loss of statistical power, by incorporating information from historical controls, i.e. subjects used as controls in similar previous experiments. Using example data from published reports, we describe how to incorporate information from historical controls under a range of assumptions that mig…
Retract p < 0.005 and propose using JASP, instead
2018
Seeking to address the lack of research reproducibility in science, including psychology and the life sciences, a pragmatic solution has been raised recently: to use a stricter p < 0.005 standard for statistical significance when claiming evidence of new discoveries. Notwithstanding its potential impact, the proposal has motivated a large mass of authors to dispute it from different philosophical and methodological angles. This article reflects on the original argument and the consequent counterarguments, and concludes with a simpler and better-suited alternative that the authors of the proposal knew about and, perhaps, should have made from their Jeffreysian perspective: to use a Bayes …
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
2018
Motivation: Information-theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are also becoming very valuable for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands resorting to parallel and distributed computing. Indeed, those types of algorithms have been developed to collect k-mer statistics in…
FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
2017
Summary: MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…
2018
Genome-wide association studies have become a powerful method to link point mutations (e.g. single nucleotide polymorphisms (SNPs)) to a certain phenotype or a disease. However, their power to detect SNPs associated with polygenic diseases such as Alzheimer's Disease (AD) is limited, since they can only infer the pairwise relation of single SNPs to the phenotype and ignore possible effects of various SNP combinations. The common method to probe these possible complex genetic patterns is to compute a measure called linkage disequilibrium (LD). Despite the fact that several predictive patterns found with LD could successfully be applied to medical diagnosis, this measure still holds several dra…
A Dirichlet Autoregressive Model for the Analysis of Microbiota Time-Series Data
2021
Growing interest in understanding microbiota dynamics has motivated the development of different strategies to model microbiota time series data. However, all of them must tackle the fact that the available data are high-dimensional, posing strong statistical and computational challenges. In order to address this challenge, we propose a Dirichlet autoregressive model with time-varying parameters, which can be directly adapted to explain the effect of groups of taxa, thus reducing the number of parameters estimated by maximum likelihood. A strategy has been implemented which speeds up this estimation. The usefulness of the proposed model is illustrated by application to a case study.
Genome-scale analysis of evolutionary rate and selection in a fast-expanding Spanish cluster of HIV-1 subtype F1.
2018
This work is aimed at assessing the presence of positive selection and/or shifts of the evolutionary rate in a fast-expanding HIV-1 subtype F1 transmission cluster affecting men who have sex with men in Spain. We applied Bayesian coalescent phylogenetics and selection analyses to 23 full-coding region sequences from patients belonging to that cluster, along with 19 other epidemiologically unrelated F1 sequences. A shift in the overall evolutionary rate of the virus, explained by positively selected sites in the cluster, was detected. We also found one substitution in Nef (H89F) that was specific to the cluster and experienced positive selection. These results suggest that fast tran…
Toward a direct and scalable identification of reduced models for categorical processes.
2017
The applicability of many computational approaches depends on the identification of reduced models defined on a small set of collective variables (colvars). A methodology for scalable probability-preserving identification of reduced models and colvars directly from the data is derived, not relying on the availability of the full relation matrices at any stage of the resulting algorithm, allowing for a robust quantification of reduced model uncertainty and allowing us to impose a priori available physical information. We show two applications of the methodology: (i) to obtain a reduced dynamical model for polypeptide dynamics in water and (ii) to identify diagnostic rules from a standar…
Identifying Prognostic SNPs in Clinical Cohorts: Complementing Univariate Analyses by Resampling and Multivariable Modeling
2016
Clinical cohorts with time-to-event endpoints are increasingly characterized by measurements of a number of single nucleotide polymorphisms that is an order of magnitude larger than the number of measurements typically considered at the gene level. At the same time, the size of clinical cohorts often is still limited, calling for novel analysis strategies for identifying potentially prognostic SNPs that can help to better characterize disease processes. We propose such a strategy, drawing on univariate testing ideas from epidemiological case-control studies on the one hand, and multivariable regression techniques as developed for gene expression data on the other hand. In particular, we focus on …
Melanoma-Nevus Discrimination Based on Image Statistics in Few Spectral Channels
2016
The purpose of this paper is to offer a method for discriminating cutaneous melanoma from benign nevus, founded on analysis of the skin lesion image. At the core of the method is the calculation of the mean and standard deviation of pixel optical density values for a few narrow spectral bands. The calculated values are compared with discriminating thresholds derived from a set of images of benign nevi and melanomas with known diagnosis. Classification is done by applying a weighted majority rule to the thresholding results. Verification against the available multispectral images of 32 melanomas and 94 benign nevi has shown that the method using three spectral bands provided zero false negatives and four false posi…