Search results for " data"
showing 10 items of 7516 documents
The colored longest common prefix array computed via sequential scans
2018
Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows several problems to be tackled efficiently with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…
Alignment-free sequence comparison using absent words
2018
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…
Biophysics of high density nanometer regions extracted from super-resolution single particle trajectories: application to voltage-gated calcium chann…
2019
The cellular membrane is very heterogeneous and enriched with high-density regions forming microdomains, as revealed by single particle tracking experiments. However, the organization of these regions remains unexplained. We determine here the biophysical properties of these regions, when described as a basin of attraction. We develop two methods to recover the dynamics and local potential wells (field of force and boundary). The first method is based on the local density of points distribution of trajectories, which differs inside and outside the wells. The second method focuses on recovering the drift field that is convergent inside wells and uses the transient field to determine the…
Use of deep learning methods to translate drug-induced gene expression changes from rat to human primary hepatocytes
2020
In clinical trials, animal and cell line models are often used to evaluate the potential toxic effects of a novel compound or candidate drug before progressing to human trials. However, relating the results of animal and in vitro model exposures to relevant clinical outcomes in the human in vivo system still proves challenging, relying on often putative orthologs. In recent years, multiple studies have demonstrated that the repeated dose rodent bioassay, the current gold standard in the field, lacks sufficient sensitivity and specificity in predicting toxic effects of pharmaceuticals in humans. In this study, we evaluate the potential of deep learning techniques to translate the pattern of …
MiasDB: A Database of Molecular Interactions Associated with Alternative Splicing of Human Pre-mRNAs.
2016
Alternative splicing (AS) is pervasive in human multi-exon genes and is a major contributor to expansion of the transcriptome and proteome diversity. The accurate recognition of alternative splice sites is regulated by information contained in networks of protein-protein and protein-RNA interactions. However, the mechanisms leading to splice site selection are not fully understood. Although numerous databases have been built to describe AS, molecular interaction databases associated with AS have only recently emerged. In this study, we present a new database, MiasDB, that provides a description of molecular interactions associated with human AS events. This database covers 938 interactions …
Diagnostic odyssey in severe neurodevelopmental disorders: toward clinical whole-exome sequencing as a first-line diagnostic test
2016
The current standard of care for diagnosis of severe intellectual disability (ID) and epileptic encephalopathy (EE) results in a diagnostic yield of ∼50%. Affected individuals nonetheless undergo multiple clinical evaluations and low-yield laboratory tests often referred to as a 'diagnostic odyssey'. This study was aimed at assessing the utility of clinical whole-exome sequencing (WES) in individuals with undiagnosed and severe forms of ID and EE, and the feasibility of its implementation in routine practice by a small regional genetic center. We performed WES in a cohort of 43 unrelated individuals with undiagnosed ID and/or EE. All individuals had undergone multiple clinical evaluations a…
Lost Strings in Genomes: What Sense Do They Make?
2017
We studied the sets of avoided strings observed over a family of genomes. The length of the minimal avoided string was found to rarely exceed 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of avoided strings observed over sets of (related) genomes were analyzed. A very low correlation between the phylogeny and the set of those strings was found.
Exploiting Helminth–Host Interactomes through Big Data
2017
Helminths facilitate their parasitic existence through the production and secretion of different molecules, including proteins. Some helminth proteins can manipulate the host's immune system, a phenomenon that is now being exploited with a view to developing therapeutics for inflammatory diseases. In recent years, hundreds of helminth genomes have been sequenced, but as a community we are still taking baby steps when it comes to identifying proteins that govern host-helminth interactions. The information generated from genomic, immunomic, and proteomic studies, as well as from cutting-edge approaches such as proteogenomics, is leading to a substantial volume of big data that can be utilised…
Phylogenomics of Lophotrochozoa with Consideration of Systematic Error.
2015
Phylogenomic studies have improved understanding of deep metazoan phylogeny and show promise for resolving incongruences among analyses based on limited numbers of loci. One region of the animal tree that has been especially difficult to resolve, even with phylogenomic approaches, is relationships within Lophotrochozoa (the animal clade that includes molluscs, annelids, and flatworms among others). Lack of resolution in phylogenomic analyses could be due to insufficient phylogenetic signal, limitations in taxon and/or gene sampling, or systematic error. Here, we investigated why lophotrochozoan phylogeny has been such a difficult question to answer by identifying and reducing sources of sys…
Exome-Wide Association Study on Alanine Aminotransferase Identifies Sequence Variants in the GPAM and APOE Associated With Fatty Liver Disease.
2021
BACKGROUND & AIMS: Fatty liver disease (FLD) is a growing epidemic that is expected to be the leading cause of end-stage liver disease within the next decade. Both environmental and genetic factors contribute to susceptibility to FLD. Several genetic variants contributing to FLD have been identified in exome-wide association studies. However, there is still missing heritability, indicating that other genetic variants are yet to be discovered. METHODS: To find genes involved in FLD, we first examined the association of missense and nonsense variants with alanine aminotransferase at an exome-wide level in 425,671 participants from the UK Biobank. We then validated genetic variants wit…