Search results for "soft"
showing 10 items of 9809 documents
Applications of Chemoinformatics in Predictive Toxicology for Regulatory Purposes, Especially in the Context of the EU REACH Legislation
2018
Chemoinformatics methodologies such as QSAR/QSPR have been used for decades in drug discovery projects, especially for the finding of new compounds with therapeutic properties and the optimization of ADME properties on chemical series. The application of computational techniques in predictive toxicology is much more recent, and they are experiencing an increasingly interest because of the new legal requirements imposed by national and international regulations. In the pharmaceutical field, the US Food and Drug Administration (FDA) support the use of predictive models for regulatory decision-making when assessing the genotoxic and carcinogenic potential of drug impurities. In Europe, the REA…
Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM
2019
Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. We have tested STREAM on several synthetic and real datasets generated with different single-cell techno…
Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms
2018
Abstract Motivation Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands to resort to parallel and distributed computing. Indeed, those type of algorithms have been developed to collect k-mer statistics in…
Stopping injection attacks with code and structured data
2018
Injection attacks top the lists of the most harmful software vulnerabilities. Injection vulnerabilities are both commonplace and easy to exploit, which makes development of injection protection schemes important. In this article, we show how injection attacks can be practically eliminated through the use of structured data paired with cryptographic verification codes upon transmission. peerReviewed
FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
2017
Abstract Summary MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need for managing efficiently sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the Literature, it offers…
Detecting mutations by eBWT
2018
In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP array based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNPs discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…
Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus
2016
Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal …
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions.
2020
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations including structural variations (SVs) is key to our understanding. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linkage of short-reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts especially in repetitive regions and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, w…
Feasibility of sample size calculation for RNA-seq studies
2017
Sample size calculation is a crucial step in study design but is not yet fully established for RNA sequencing (RNA-seq) analyses. To evaluate feasibility and provide guidance, we evaluated RNA-seq sample size tools identified from a systematic search. The focus was on whether real pilot data would be needed for reliable results and on identifying tools that would perform well in scenarios with different levels of biological heterogeneity and fold changes (FCs) between conditions. We used simulations based on real data for tool evaluation. In all settings, the six evaluated tools provided widely different answers, which were strongly affected by FC. Although all tools failed for small FCs, s…
MiasDB: A Database of Molecular Interactions Associated with Alternative Splicing of Human Pre-mRNAs.
2016
Alternative splicing (AS) is pervasive in human multi-exon genes and is a major contributor to expansion of the transcriptome and proteome diversity. The accurate recognition of alternative splice sites is regulated by information contained in networks of protein-protein and protein-RNA interactions. However, the mechanisms leading to splice site selection are not fully understood. Although numerous databases have been built to describe AS, molecular interaction databases associated with AS have only recently emerged. In this study, we present a new database, MiasDB, that provides a description of molecular interactions associated with human AS events. This database covers 938 interactions …