Search results for "Datasets"
showing 10 items of 45 documents
Towards A Twitter Observatory: A Multi-Paradigm Framework For Collecting, Storing And Analysing Tweets
2016
In this article we show how a multi-paradigm framework can fulfil the requirements of tweets analysis and reduce the waiting time for researchers that use computational resources and storage systems to support large-scale data analysis. The originality of our approach is to combine concerns about data harvesting, data storage, data analysis and data visualisation into a framework that supports inductive reasoning in multidisciplinary scientific research. Our main contribution is a polyglot storage system with a generic data model to support logical data independence and a set of tools that can provide a suitable solution for mixing different types of algorithms in or…
Perspectives on the Impact of Sampling Design and Intensity on Soil Microbial Diversity Estimates
2019
Soil bacterial communities have long been recognized as important ecosystem components, and have been the focus of many local and regional studies. However, there is a lack of data on the biodiversity of soil microorganisms at large spatial scales; national or more extensive studies to date have typically consisted of low replication of haphazardly collected samples. This has led to large spatial gaps in soil microbial biodiversity data. Using a pre-existing dataset of bacterial community composition across a 16-km regular sampling grid in France, we show that the number of detected OTUs changes little under different sampling designs (grid, random, or representative), but increases with t…
Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis
2012
The advent of high throughput technologies, in particular microarrays, for biological research has revived interest in clustering, resulting in a plethora of new clustering algorithms. However, model selection, i.e., the identification of the correct number of clusters in a dataset, has received relatively little attention. Indeed, although central for statistics, its difficulty is also well known. Fortunately, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of pre…
Assessment of the 4-factor score: Retrospective analysis of 586 CLL patients receiving ibrutinib. A campus CLL study
2021
Not Available
Fast Estimation of Diffusion Tensors under Rician noise by the EM algorithm
2016
Diffusion tensor imaging (DTI) is widely used to characterize, in vivo, the white matter of the central nervous system (CNS). This biological tissue contains much anatomic, structural and orientational information about fibers in the human brain. Spectral data from the displacement distribution of water molecules located in the brain tissue are collected by a magnetic resonance scanner and acquired in the Fourier domain. After the Fourier inversion, the noise distribution is Gaussian in both real and imaginary parts and, as a consequence, the recorded magnitude data are corrupted by Rician noise. Statistical estimation of diffusion leads to a non-linear regression problem. In this paper, we present a f…
Compendium of TCDD-mediated transcriptomic response datasets in mammalian model systems.
2017
Background: 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) is the most potent congener of the dioxin class of environmental contaminants. Exposure to TCDD causes a wide range of toxic outcomes, ranging from chloracne to acute lethality. The severity of toxicity is highly dependent on the aryl hydrocarbon receptor (AHR). Binding of TCDD to the AHR leads to changes in transcription of numerous genes. Studies evaluating the transcriptional changes brought on by TCDD may provide valuable insight into the role of the AHR in human health and disease. We therefore compiled a collection of transcriptomic datasets that can be used to aid the scientific community in better understanding the transcriptiona…
Genome-wide associations for birth weight and correlations with adult disease
2016
Birth weight (BW) has been shown to be influenced by both fetal and maternal factors and in observational studies is reproducibly associated with future risk of adult metabolic diseases including type 2 diabetes (T2D) and cardiovascular disease. These life-course associations have often been attributed to the impact of an adverse early life environment. Here, we performed a multi-ancestry genome-wide association study (GWAS) meta-analysis of BW in 153,781 individuals, identifying 60 loci where fetal genotype was associated with BW (P < 5 × 10⁻⁸). Overall, approximately 15% of variance in BW was captured by assays of fetal genetic variation. Using genet…
Pathological significance and prognostic value of surfactant protein D in cancer
2018
Surfactant protein D (SP-D) is a pattern recognition molecule belonging to the Collectin (collagen-containing C-type lectin) family that has pulmonary as well as extra-pulmonary existence. In the lungs, it is a well-established opsonin that can agglutinate a range of microbes, and enhance their clearance via phagocytosis and super-oxidative burst. It can interfere with allergen–IgE interaction and suppress basophil and mast cell activation. However, it is now becoming evident that SP-D is likely to be an innate immune surveillance molecule against tumor development. SP-D has been shown to induce apoptosis in sensitized eosinophils derived from allergic patients and a leukemic cell line via …
Ventricular Fibrillation and Tachycardia detection from surface ECG using time-frequency representation images as input dataset for machine learning
2017
Highlights: Parameter-less ventricular fibrillation detection with time-frequency representation. Time-frequency representations are treated as images for a classifier. A comparison of four classifiers demonstrates the validity of the proposed method. The proposed technique could be applied to any signal and research field. This is a novel approach to signal analysis. Background and objective: To safely select the proper therapy for Ventricular Fibrillation (VF), it is essential to distinguish it correctly from Ventricular Tachycardia (VT) and other rhythms. Because the required therapy would not be the same, an erroneous detection might lead to serious injuries to the patient or even cause Ventricular Fibr…
Big Data in Medical Science–a Biostatistical View
2015
“Big data” is a universal buzzword in business and science, referring to the retrieval and handling of ever-growing amounts of information. It can be assumed, for example, that a typical hospital generates hundreds of terabytes (1 TB = 10¹² bytes) of data annually in the course of patient care (1). For instance, exome sequencing, which results in 5 gigabytes (1 GB = 10⁹ bytes) of data per patient, is on the way to becoming routine (2). The analysis of such enormous volumes of information, i.e., organization and description of the data and the drawing of (scientifically valid) conclusions, can already hardly be accomplished with the traditional tools of computer science and statistics. For ex…