Search results for "reference genome"

showing 10 items of 27 documents

Holistic Optimization of Bioinformatic Analysis Pipeline for Detection and Quantification of 2′-O-Methylations in RNA by RiboMethSeq

2020

A major trend in the epitranscriptomics field over the last 5 years has been the high-throughput analysis of RNA modifications by a combination of specific chemical treatment(s), followed by library preparation and deep sequencing. Multiple protocols have been described for several important RNA modifications, such as 5-methylcytosine (m5C), pseudouridine (ψ), 1-methyladenosine (m1A), and 2'-O-methylation (Nm). One commonly used method is the alkaline cleavage-based RiboMethSeq protocol, where positions of reads' 5'-ends are used to distinguish nucleotides protected by ribose methylation. This method was successfully applied to detect and quantify Nm residues in vari…

0301 basic medicine; bioinformatic pipeline; lcsh:QH426-470; Computer science; Computational biology; Deep sequencing; Pseudouridine; 03 medical and health sciences; chemistry.chemical_compound; 0302 clinical medicine; [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; ribose methylation; Epitranscriptomics; Genetics; Genetics (clinical); receiver operating characteristic; 2'-O-methylation; 2′-O-methylation; high-throughput sequencing; RNA; [SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biology; Brief Research Report; lcsh:Genetics; 030104 developmental biology; chemistry; 030220 oncology & carcinogenesis; Transfer RNA; RNA; Molecular Medicine; Small nuclear RNA; Reference genome; Frontiers in Genetics

researchProduct

Non-Redundant tRNA Reference Sequences for Deep Sequencing Analysis of tRNA Abundance and Epitranscriptomic RNA Modifications

2021

Analysis of RNA by deep-sequencing approaches has found widespread application in modern biology. In addition to measurements of RNA abundance under various physiological conditions, such techniques are now widely used for mapping and quantification of RNA modifications. Transfer RNA (tRNA) molecules are among the frequent targets of such investigation, since they contain multiple modified residues. However, the major challenge in tRNA examination is related to a large number of duplicated and point-mutated genes encoding those RNA molecules. Moreover, the existence of multiple isoacceptors/isodecoders complicates both the analysis and read mapping. Existing databases for tRNA sequencing pr…

0301 basic medicine; lcsh:QH426-470; ved/biology.organism_classification_rank.species; Computational biology; Biology; 01 natural sciences; Article; Deep sequencing; deep sequencing; 03 medical and health sciences; RNA modifications; RNA Transfer; epitranscriptome; [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Escherichia coli; Genetics; Model organism; tRNA; Gene; ComputingMilieux_MISCELLANEOUS; Genetics (clinical); Sequence Analysis RNA; 010405 organic chemistry; ved/biology; reference sequence; High-Throughput Nucleotide Sequencing; RNA; [SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biology; quantification; 0104 chemical sciences; lcsh:Genetics; RNA Bacterial; 030104 developmental biology; Transfer RNA; Databases Nucleic Acid; tRNA pool; Bacillus subtilis; Reference genome; Genes

researchProduct

A hybrid short read mapping accelerator

2013

Background: The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed. However, designs based on hardware are relatively rare. Field programmable gate arrays (FPGAs) have been successfully used in a number of specific application areas, such as the DSP and communications domains due to their outstanding parallel data processing capabilities, making them a compet…

:Engineering::Computer science and engineering [DRNTU]; Genome; Computer science; business.industry; Applied Mathematics; Methodology Article; Chromosome Mapping; Sequence Analysis DNA; Biochemistry; Computer Science Applications; Software; Computer engineering; Structural Biology; Sensitivity (control systems); DNA microarray; business; Field-programmable gate array; Algorithm; Molecular Biology; Sequence Alignment; Digital signal processing; Algorithms; Software; Reference genome; BMC Bioinformatics

researchProduct

CUSHAW Suite: Parallel and Efficient Algorithms for NGS Read Alignment

2017

Next generation sequencing (NGS) technologies have enabled cheap, large-scale, and high-throughput production of short DNA sequence reads and thereby have promoted the explosive growth of data volume. Unfortunately, the produced reads are short and prone to contain errors that are incurred during sequencing cycles. Both large data volume and sequencing errors have complicated the mapping of NGS reads onto the reference genome and have motivated the development of various aligners for very short reads, typically less than 100 base pairs (bps) in length. As read length continues to increase, propelled by advances in NGS technologies, these longer reads tend to have higher sequencing error rat…

CUDA; Software suite; Computer science; Suite; Volume (computing); Human genome; Parallel computing; Bioinformatics; Genome; DNA sequencing; Reference genome

researchProduct

Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type …

2020

Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate the different MTBC lineages (L), especially L5 and L6 (traditionally termed Mycobacterium africanum), from each other. However, genome variability and gene content especially of L5 and L6 strains have not been fully explored and may be potentially important for pathobiology and current approaches for genomic analysis of MTBC isolates, including transmission studies. We compared the genomes of 358 L5 clinical isolates (including 3 completed genomes and 355 Illumina WGS (whole genome seque…

Genetics; 0303 health sciences; Lineage (genetic); 030306 microbiology; Sequence assembly; Single-nucleotide polymorphism; Biology; biology.organism_classification; Genome; 3. Good health; 03 medical and health sciences; Mycobacterium tuberculosis complex; Gene; Mycobacterium africanum; 030304 developmental biology; Reference genome

researchProduct

Progress in Arabidopsis genome sequencing and functional genomics

2000

Arabidopsis thaliana has a relatively small genome of approximately 130 Mb containing about 10% repetitive DNA. Genome sequencing studies reveal a gene-rich genome, predicted to contain approximately 25 000 genes spaced on average every 4.5 kb. Between 10 and 20% of the predicted genes occur as clusters of related genes, indicating that local sequence duplication and subsequent divergence generates a significant proportion of gene families. In addition to gene families, repetitive sequences comprise individual and small clusters of two to three retroelements and other classes of smaller repeats. The clustering of highly repetitive elements is a striking feature of the A. thaliana genome emer…

Genetics; Genome evolution; DNA Plant; Arabidopsis thaliana; Arabidopsis; Agriculture; Bioengineering; Genomics; Sequence Analysis DNA; General Medicine; Genome project; Biology; Genome sequencing; Applied Microbiology and Biotechnology; Genome; Genes; Cot analysis; Plant Research International; Gene density; Genome size; Genome Plant; Biotechnology; Reference genome; Journal of Biotechnology

researchProduct

MetaCache-GPU: Ultra-Fast Metagenomic Classification

2021

The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their $k$-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growi…

Genomics (q-bio.GN); FOS: Computer and information sciences; Source code; Computer science; media_common.quotation_subject; Hash function; Context (language use); MinHash; computer.software_genre; Data structure; Hash table; Computer Science - Distributed Parallel and Cluster Computing; FOS: Biological sciences; Preprocessor; Quantitative Biology - Genomics; Distributed Parallel and Cluster Computing (cs.DC); Data mining; computer; media_common; Reference genome; 50th International Conference on Parallel Processing

researchProduct

Comparing DNA sequence collections by direct comparison of compressed text indexes

2012

Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have h…

Genomics (q-bio.GN); Sequence; Computer science; business.industry; Search engine indexing; Sequence alignment; Pattern recognition; Construct (python library); Data structure; Burrows-Wheeler Transform; Splice junctions; External memory; External memory; FOS: Biological sciences; Code (cryptography); Quantitative Biology - Genomics; Burrows-Wheeler Transform; Artificial intelligence; business; Splice junctions; Auxiliary memory; Reference genome

researchProduct

Inferring heterozygosity from ancient and low coverage genomes

2016

While genetic diversity can be quantified accurately from high coverage sequencing data, it is often desirable to obtain such estimates from data with low coverage, either to save costs or because of low DNA quality, as is observed for ancient samples. Here, we introduce a method to accurately infer heterozygosity probabilistically from sequences with average coverage <1× of a single individual. The method relaxes the infinite sites assumption of previous methods, does not require a reference sequence, except for the initial alignment of the sequencing data, and takes into account both variable sequencing errors and potential postmortem damage. It is thus also applicable to …

Male; 0301 basic medicine; Heterozygote; Population; Genomics; Investigations; Biology; Genome; 03 medical and health sciences; 0302 clinical medicine; Genetics; heterozygosity; Humans; low coverage; DNA Ancient; education; Population and Evolutionary Genetics; ancient DNA; 030304 developmental biology; Genetics; Whole genome sequencing; 0303 health sciences; education.field_of_study; Genetic diversity; Base Sequence; Genome Human; Genetic Carrier Screening; Chromosome Mapping; Genetic Variation; Contrast (statistics); Coverage data; Sequence Analysis DNA; postmortem damage; Variable (computer science); Genetics Population; 030104 developmental biology; Ancient DNA; Evolutionary biology; base recalibration; Software; 030217 neurology & neurosurgery; Reference genome

researchProduct

Insights into archaeological human sample microbiome using 16S rRNA gene sequencing

2017

The human body is inhabited by a vast number of microorganisms, collectively known as the human microbiome, and there is tremendous interest in evolutionary changes in human microbial ecology, diversity and function. The field of paleomicrobiology, the study of the ancient human microbiome, is powered by modern Next Generation Sequencing (NGS) techniques, which allow microbial genomic data to be extracted directly from the archaeological sample of interest. One of the major techniques is 16S rRNA gene sequencing, in which certain 16S rRNA gene hypervariable regions are amplified and sequenced. However, some limitations of this method exist including taxonomic precision and efficacy of different region…

Metagenomics; mothur; Genomics; Ion semiconductor sequencing; Microbiome; Biology; Archaeology; DNA sequencing; Hypervariable region; Reference genome; 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

researchProduct