Search results for "Computer Science"
showing 10 items of 22367 documents
FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications
2017
MapReduce Hadoop bioinformatics applications require special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide built-in support for the most popular sequence file formats, such as FASTA or BAM. Moreover, developing these routines is not easy, both because of the diversity of these formats and because sequence datasets, which may contain up to billions of characters, must be managed efficiently. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…
Detecting mutations by eBWT
2018
In this paper, we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP-array-based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of problems. In this paper, we consider the case of alignment-free and reference-free SNP discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…
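A toy illustration of the clustering effect, offered only as a simplified sketch: the hypothetical function `multi_string_bwt` below builds the BWT of a small read collection by giving each read its own end marker, which sorts before every nucleotide, with ties between identical suffixes broken by read index. This terminator-based variant is a common practical stand-in for the eBWT, not necessarily the exact construction used in the paper. Suffixes that share a context end up adjacent, so the symbols preceding that context are grouped together in the output.

```python
def multi_string_bwt(reads):
    """BWT of a read collection in which read i ends with its own terminator $_i.

    Terminators sort before A, C, G, T; equal suffixes from different reads
    are ordered by read index. Sorting every suffix explicitly is quadratic,
    so this is only meant for tiny examples.
    """
    suffixes = []
    for i, read in enumerate(reads):
        # j == len(read) is the suffix consisting of the terminator alone.
        for j in range(len(read) + 1):
            suffixes.append((read[j:], i, j))
    # Python orders a proper prefix before any longer string, which matches
    # a terminator that is smaller than every nucleotide; ties fall to i.
    suffixes.sort()
    # Emit the symbol preceding each suffix; position 0 of a read is
    # (cyclically) preceded by that read's terminator, written as '$'.
    return "".join(reads[i][j - 1] if j > 0 else "$" for _, i, j in suffixes)


print(multi_string_bwt(["ACGT", "ACGA"]))
```

For the reads `ACGT` and `ACGA`, this prints `TAG$$AACCG`: the shared contexts starting with `CG` and `G` place their preceding `A`s and `C`s side by side, a miniature version of the clusters described above.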
The colored longest common prefix array computed via sequential scans
2018
Due to the increased availability of large datasets of biological sequences, tools for sequence comparison now rely to a greater extent on efficient alignment-free approaches. Most alignment-free approaches require computing statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows several problems to be tackled efficiently with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…
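As background only: the sketch below computes the ordinary longest common prefix (LCP) array of a single string, in internal memory, using Kasai et al.'s linear-time algorithm over a naively built suffix array. It is not the cLCP, nor the semi-external sequential-scan construction described in the paper; the function names `suffix_array` and `lcp_array` are illustrative.

```python
def suffix_array(text):
    """Starting positions of the suffixes of text in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai et al.: lcp[k] is the LCP length of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    # Visit suffixes in text order; the matched length drops by at most 1 per step.
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # lexicographically preceding suffix
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
print(sa, lcp_array("banana", sa))
```

For `banana` this yields the suffix array `[5, 3, 1, 0, 4, 2]` and the LCP array `[0, 1, 3, 0, 0, 2]`.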
Alignment-free sequence comparison using absent words
2018
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…
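To make the two ideas concrete, here is a small sketch. `qgram_distance` implements the classic q-gram distance (the sum, over all q-grams, of the absolute difference of their occurrence counts in the two sequences). `absent_words` lists the length-q words over a given alphabet that do not occur in a sequence by brute force, which is exponential in q and only suitable for tiny q. The `absent_word_difference` function is a hypothetical toy comparison, not the measure proposed in the paper; the DNA alphabet default is also an assumption.

```python
from collections import Counter
from itertools import product


def qgram_profile(s, q):
    """Occurrence counts of every length-q substring of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x, y, q):
    """Sum over all q-grams of |count in x - count in y| (missing keys count as 0)."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(px[w] - py[w]) for w in px.keys() | py.keys())


def absent_words(s, q, alphabet="ACGT"):
    """All length-q words over alphabet that never occur in s (brute force)."""
    present = set(qgram_profile(s, q))
    return {"".join(w) for w in product(alphabet, repeat=q)} - present


def absent_word_difference(x, y, q, alphabet="ACGT"):
    """Hypothetical toy measure: words absent from exactly one of the two sequences."""
    return len(absent_words(x, q, alphabet) ^ absent_words(y, q, alphabet))


print(qgram_distance("ACGT", "ACGA", 2), absent_word_difference("ACGT", "ACGA", 2))
```

For `ACGT` and `ACGA` with q = 2, both values are 2: `GT` occurs only in the first sequence and `GA` only in the second.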
Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions
2020
Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations, including structural variations (SVs), is key to our understanding. Conventional short-read whole-genome sequencing (cWGS) can identify SVs to base-pair resolution, but uses only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) uses long-range information by linking short reads that originate from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, w…
Gating Harmonization Guidelines for Intracellular Cytokine Staining Validated in Second International Multiconsortia Proficiency Panel Conducted by C…
2020
Results from the first gating proficiency panel of intracellular cytokine staining (ICS) highlighted the value of using a consensus gating approach to reduce the variability across laboratories in reported %CD8+ or %CD4+ cytokine-positive cells. Based on the data analysis from the first proficiency panel, harmonization guidelines for a consensus gating protocol were proposed. To validate the recommendations from the first panel and to examine factors that were not included in the first panel, a second ICS gating proficiency panel was organized. All participants analyzed the same set of Flow Cytometry Standard (FCS) files using their own gating protocol. An optional learning module was provi…
Enhancement in Phospholipase D Activity as a New Proposed Molecular Mechanism of Haloperidol-Induced Neurotoxicity
2020
Membrane phospholipase D (PLD) is associated with numerous neuronal functions, such as axonal growth, synaptogenesis, formation of secretory vesicles, neurodegeneration, and apoptosis. PLD acts mainly on phosphatidylcholine, from which phosphatidic acid (PA) and choline are formed. In turn, PA is a key element of the PLD-dependent secondary messenger system. Changes in PLD activity are associated with the mechanism of action of olanzapine, an atypical antipsychotic. The aim of the present study was to assess the effect of short-term administration of the first-generation antipsychotic drugs haloperidol, chlorpromazine, and fluphenazine on membrane PLD activity in the rat brain. Animals were…
Feasibility of sample size calculation for RNA-seq studies
2017
Sample size calculation is a crucial step in study design but is not yet fully established for RNA sequencing (RNA-seq) analyses. To evaluate feasibility and provide guidance, we assessed RNA-seq sample size tools identified from a systematic search. The focus was on whether real pilot data would be needed for reliable results and on identifying tools that would perform well in scenarios with different levels of biological heterogeneity and fold changes (FCs) between conditions. We used simulations based on real data for tool evaluation. In all settings, the six evaluated tools provided widely different answers, which were strongly affected by FC. Although all tools failed for small FCs, s…
Old meets new: Comparative examination of conventional and innovative RNA-based methods for body fluid identification of laundered seminal fluid stai…
2018
Knowing the type of body fluid/tissue that contributed to a trace can provide contextual insight into crime scene reconstruction and connect a suspect or a victim to a crime scene. Especially in sexual assault cases, it is important to verify the presence of spermatozoa. Victims often clean their underwear/bedding after a sexual assault. If they later decide to report the crime to the police, in our experience, investigators usually do not send laundered items for DNA examination, since they believe that analysis after washing is no longer promising. As not only the individualization of traces on laundered items could be important in court, but also the type…
Deciphering the functional role of spatial and temporal muscle synergies in whole-body movements
2018
Voluntary movement is hypothesized to rely on a limited number of muscle synergies, the recruitment of which translates task goals into effective muscle activity. In this study, we investigated how to analytically characterize the functional role of different types of muscle synergies in task performance. To this end, we recorded a comprehensive dataset of muscle activity during a variety of whole-body pointing movements. We decomposed the electromyographic (EMG) signals using a space-by-time modularity model that encompasses the main types of synergies. We then used a task decoding and information theoretic analysis to probe the role of each synergy by mapping it to specific task …