Search results for "sequencing data"
showing 6 items of 16 documents
No evidence of EMAST in whole genome sequencing data from 248 colorectal cancers.
2021
Microsatellite instability (MSI) is caused by defective DNA mismatch repair (MMR), and manifests as accumulation of small insertions and deletions (indels) in short tandem repeats of the genome. Another form of repeat instability, elevated microsatellite alterations at selected tetranucleotide repeats (EMAST), has been suggested to occur in 50% to 60% of colorectal cancers (CRCs), of which approximately one quarter are accounted for by MSI. Unlike for MSI, there is no consensus on the criteria for defining EMAST. EMAST CRCs have been suggested to form a distinct subset of CRCs that has been linked to a higher tumor stage, chronic inflammation, and poor prognosis. EMAST CRCs not exhibiting MSI have b…
Acceleration of short and long DNA read mapping without loss of accuracy using suffix array
2014
HPG Aligner applies suffix arrays to DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% across a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with the longer reads and growing throughputs produced by forthcoming sequencing technologies.
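As a rough illustration of the general idea behind suffix-array read mapping (not HPG Aligner's actual implementation), the sketch below builds a suffix array for a short reference and locates exact occurrences of a read by binary search. The function names `build_suffix_array` and `find_occurrences` and the toy reference are hypothetical, and the naive construction is only suitable for tiny inputs.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    # Naive construction: sort suffix start positions by the suffix text.
    # Illustrative only; real mappers use linear-time or compressed builds.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    # All suffixes starting with `pattern` form one contiguous block of the
    # suffix array, so two binary searches delimit that block.
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Report match positions in reference order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    reference = "ACGTACGTGACG"  # toy reference, assumed for illustration
    sa = build_suffix_array(reference)
    print(find_occurrences(reference, sa, "ACG"))
```

Each lookup needs O(log n) string comparisons of at most the read length, which is one reason suffix-array indexes remain practical as reads get longer. Handling mismatches and indels, which a real mapper must do, is outside this sketch.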
piRNAclusterDB 2.0: update and expansion of the piRNA cluster database.
2021
PIWI-interacting RNAs (piRNAs) and their partnering PIWI proteins defend the animal germline against transposable elements and play a crucial role in fertility. Numerous studies in the past have uncovered many additional functions of the piRNA pathway, including gene regulation, anti-viral defense, and somatic transposon repression. Further, comparative analyses across phylogenetic groups showed that the PIWI/piRNA system evolves rapidly and exhibits great evolutionary plasticity. However, the presence of so-called piRNA clusters as the major source of piRNAs is common to nearly all metazoan species. These genomic piRNA-producing loci are highly divergent across taxa and critically…
Lightweight LCP construction for next-generation sequencing datasets
2012
The advent of "next-generation" DNA sequencing (NGS) technologies has meant that collections of hundreds of millions of DNA sequences are now commonplace in bioinformatics. Knowing the longest common prefix array (LCP) of such a collection would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been described in the literature, but require the presence in RAM of large data structures. This prevents such methods from being feasible for NGS datasets. In this paper we propose the first lightweight method that simultaneously computes, via sequential scans, the LCP and B…
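To make the LCP array concrete, here is a minimal in-memory sketch for a single string using Kasai's algorithm. It is an example of the RAM-resident approach the abstract contrasts itself with, not the paper's lightweight sequential-scan method, and it does not handle a collection of reads. The function names are hypothetical.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    # Naive construction by sorting suffixes; for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    # Kasai's algorithm: lcp[k] is the length of the longest common prefix
    # of the suffixes at sa[k-1] and sa[k]; lcp[0] is defined as 0.
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k

    lcp = [0] * n
    h = 0  # current match length, carried over between iterations
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)
    print(lcp_array(text, sa))
```

Both `rank` and `lcp` hold one integer per input character, alongside the suffix array itself, which illustrates why such structures become a memory bottleneck for collections of hundreds of millions of reads.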
Exploiting Glomus intraradices sequencing data to dissect molecular mechanisms of plant genome control over fungal gene expression in mycorrhiza
2006
SNPs detection by eBWT positional clustering
2019
Sequencing technologies keep getting cheaper and faster, putting growing pressure on data structures designed to store raw data efficiently and, possibly, to perform analysis on it. In this view, there is growing interest in alignment-free and reference-free variant-calling methods that only make use of (suitably indexed) raw read data. We develop the positional clustering theory that (i) describes how the extended Burrows–Wheeler Transform (eBWT) of a collection of reads tends to cluster together bases that cover the same genome position, (ii) predicts the size of such clusters, and (iii) exhibits an elegant and precise LCP-array-based procedure to locate such clusters in the e…