Search results for "Information"

Showing 10 items of 14916 documents

FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications

2017

Summary: MapReduce Hadoop bioinformatics applications require special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats, such as FASTA or BAM. Moreover, developing these routines is not easy, both because of the diversity of these formats and because of the need to efficiently manage sequence datasets that may contain up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…

0301 basic medicine | FASTQ format | Statistics and Probability | Computer science | Sequence analysis | media_common.quotation_subject | Information Storage and Retrieval | Bioinformatics | computer.software_genre | Genome | Biochemistry | Domain (software engineering) | 03 medical and health sciences | Computational Theory and Mathematic | Humans | Genomic library | Quality (business) | DNA sequencing | FASTQ; NGS; FASTQ; DNA sequencing | Molecular Biology | media_common | Gene Library | Sequence | Database | Settore INF/01 - Informatica | Genome Human | Computer Science Applications1707 Computer Vision and Pattern Recognition | Genomics | Sequence Analysis DNA | FASTQ | File format | Computer Science Applications | Statistics and Probability; Biochemistry; Molecular Biology; Computer Science Applications1707 Computer Vision and Pattern Recognition; Computational Theory and Mathematics; Computational Mathematics | Computational Mathematics | 030104 developmental biology | Computational Theory and Mathematics | NGS | Database Management Systems | computer
researchProduct

Detecting mutations by eBWT

2018

In this paper we develop a theory describing how the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP-array-based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of different problems. In this paper, we consider the case of alignment-free and reference-free SNP discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…

0301 basic medicine | FOS: Computer and information sciences | 000 Computer science knowledge general works | BWT LCP Array SNPs Reference-free Assembly-free | LCP Array | Settore INF/01 - Informatica | [SDV]Life Sciences [q-bio] | Reference-free | Assembly-free | SNP | 03 medical and health sciences | 030104 developmental biology | BWT | BWT; LCP Array; SNPs; Reference-free; Assembly-free | Computer Science | Computer Science - Data Structures and Algorithms | Data Structures and Algorithms (cs.DS) | [INFO]Computer Science [cs] | Software | SNPs
researchProduct

The colored longest common prefix array computed via sequential scans

2018

Due to the increased availability of large datasets of biological sequences, tools for sequence comparison now rely to a greater extent on efficient alignment-free approaches. Most alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that makes it possible to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…

0301 basic medicine | FOS: Computer and information sciences | Alignment-free methods | Burrows–Wheeler transform | Computer science | Computation | Average common substring | 0206 medical engineering | Matching statistics | Scale (descriptive set theory) | 02 engineering and technology | Theoretical Computer Science | 03 medical and health sciences | Computer Science - Data Structures and Algorithms | Data Structures and Algorithms (cs.DS) | Burrows-wheeler transform | String (computer science) | Computer Science (all) | LCP array | Matching statistic | Data structure | Substring | 030104 developmental biology | Alignment-free methods; Average common substring; Burrows-wheeler transform; Longest common prefix; Matching statistics; Theoretical Computer Science; Computer Science (all) | Pairwise comparison | Longest common prefix | Algorithm | 020602 bioinformatics | Alignment-free method
researchProduct

Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus

2016

Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal …

0301 basic medicine | FOS: Computer and information sciences | Duplication rates | Chromatin Immunoprecipitation | Bioinformatics | Pipeline (computing) | 610 | Biology | computer.software_genre | 600 Technik Medizin angewandte Wissenschaften::610 Medizin und Gesundheit | 03 medical and health sciences | Software | ChIP-nexus | Genetics | Preprocessor | Nucleotide Motifs | Library complexity | ChIP-exo | Genetics | Protocol (science) | Binding Sites | business.industry | fungi | Computational Biology | High-Throughput Nucleotide Sequencing | Reproducibility of Results | Chip | Chromatin immunoprecipitation | Data mapping | DNA-Binding Proteins | Algorithm | 030104 developmental biology | ChIP-exo | Data mining | business | Peak calling | computer | Algorithms | Software | Protein Binding | Transcription Factors | Research Article | Biotechnology | BMC Genomics
researchProduct

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…

0301 basic medicine | FOS: Computer and information sciences | Formal Languages and Automata Theory (cs.FL) | Computer Science - Formal Languages and Automata Theory | Sequence alignment | Information System | 0102 computer and information sciences | Circular word | Absent words | 01 natural sciences | Upper and lower bounds | Sequence comparison | Theoretical Computer Science | Combinatorics | 03 medical and health sciences | Computer Science - Data Structures and Algorithms | Data Structures and Algorithms (cs.DS) | Absent word | Circular words | Mathematics | Sequence | Settore INF/01 - Informatica | Process (computing) | q-gram | Computer Science Applications1707 Computer Vision and Pattern Recognition | q-grams | Composition (combinatorics) | Computer Science Applications | 030104 developmental biology | Computational Theory and Mathematics | Forbidden words | 010201 computation theory & mathematics | Focus (optics) | Forbidden word | Word (computer architecture) | Information Systems | Integer (computer science)
researchProduct

Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions.

2020

Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations including structural variations (SVs) is key to our understanding. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linkage of short-reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts especially in repetitive regions and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, w…

0301 basic medicine | False discovery rate | Computer science | Artificial Gene Amplification and Extension | Polymerase Chain Reaction | Database and Informatics Methods | Sequencing techniques | 0302 clinical medicine | Breast Tumors | Basic Cancer Research | Medicine and Health Sciences | DNA sequencing | Biology (General) | Ecology | High-Throughput Nucleotide Sequencing | Genomics | DNA Neoplasm | 3. Good health | Identification (information) | Oncology | Computational Theory and Mathematics | Modeling and Simulation | MCF-7 Cells | Female | Sequence Analysis | Research Article | Bioinformatics | QH301-705.5 | Breast Neoplasms | Genomics | Computational biology | Research and Analysis Methods | Human Genomics | 03 medical and health sciences | Cellular and Molecular Neuroscience | Cancer Genomics | Genomic Medicine | Breast Cancer | Genetics | DNA Barcoding Taxonomic | Humans | Molecular Biology Techniques | Molecular Biology | Ecology Evolution Behavior and Systematics | Whole genome sequencing | Linkage (software) | Whole Genome Sequencing | Genome Human | Dideoxy DNA sequencing | Genetic Diseases Inborn | Cancers and Neoplasms | Biology and Life Sciences | Computational Biology | Statistical model | Sequence Analysis DNA | Repetitive Regions | Logistic Models | 030104 developmental biology | Genomic Structural Variation | Human genome | Sequence Alignment | 030217 neurology & neurosurgery | PLoS Computational Biology
researchProduct

Feasibility of sample size calculation for RNA-seq studies

2017

Sample size calculation is a crucial step in study design but is not yet fully established for RNA sequencing (RNA-seq) analyses. To evaluate feasibility and provide guidance, we evaluated RNA-seq sample size tools identified from a systematic search. The focus was on whether real pilot data would be needed for reliable results and on identifying tools that would perform well in scenarios with different levels of biological heterogeneity and fold changes (FCs) between conditions. We used simulations based on real data for tool evaluation. In all settings, the six evaluated tools provided widely different answers, which were strongly affected by FC. Although all tools failed for small FCs, s…

0301 basic medicine | Fold (higher-order function) | Sequence Analysis RNA | Computer science | High-Throughput Nucleotide Sequencing | RNA-Seq | computer.software_genre | 03 medical and health sciences | 030104 developmental biology | 0302 clinical medicine | Research Design | Sample size determination | Sample Size | Feasibility Studies | Humans | Data mining | Molecular Biology | computer | Software | 030217 neurology & neurosurgery | Information Systems | Systematic search | Briefings in Bioinformatics
researchProduct

Old meets new: Comparative examination of conventional and innovative RNA-based methods for body fluid identification of laundered seminal fluid stai…

2018

Abstract: Knowledge of the type of body fluid/tissue that contributed to a trace can provide contextual insight into crime scene reconstruction and connect a suspect or a victim to a crime scene. Especially in sexual assault cases, it is important to verify the presence of spermatozoa. Victims often clean their underwear/bedding after a sexual assault. If they later decide to report the crime to the police, in our experience, investigators usually do not send laundered items for DNA examination, since they believe that analysis after washing is no longer promising. As not only the individualization of traces on laundered items could be important in court, but also the type…

0301 basic medicine | Forensic Genetics | Male | Computer science | Semen | Stain | Polymerase Chain Reaction | Fluorescence | Pathology and Forensic Medicine | 03 medical and health sciences | chemistry.chemical_compound | 0302 clinical medicine | Semen | Biological property | Genetics | Crime scene | Humans | 030216 legal & forensic medicine | RNA Messenger | Fluorescent Dyes | Laundering | Body fluid | business.industry | Textiles | RNA | Pattern recognition | DNA | DNA Fingerprinting | Spermatozoa | Identification (information) | MicroRNAs | 030104 developmental biology | chemistry | Artificial intelligence | business | DNA | Microsatellite Repeats | Forensic science international. Genetics
researchProduct

Deciphering the functional role of spatial and temporal muscle synergies in whole-body movements

2018

Abstract: Voluntary movement is hypothesized to rely on a limited number of muscle synergies, the recruitment of which translates task goals into effective muscle activity. In this study, we investigated how to analytically characterize the functional role of different types of muscle synergies in task performance. To this end, we recorded a comprehensive dataset of muscle activity during a variety of whole-body pointing movements. We decomposed the electromyographic (EMG) signals using a space-by-time modularity model which encompasses the main types of synergies. We then used a task decoding and information theoretic analysis to probe the role of each synergy by mapping it to specific task …

0301 basic medicine | Functional role | Adult | Male | spinal-cord | Computer science | Movement | equilibrium-point hypothesis | lcsh:Medicine | emg patterns | arm movements | Temporal muscle | Article | interindividual variability | primitives | 03 medical and health sciences | 0302 clinical medicine | Spatio-Temporal Analysis | medicine | motor control | Humans | Muscle activity | Muscle Skeletal | activation patterns | lcsh:Science | Multidisciplinary | business.industry | Electromyography | lcsh:R | Motor control | Pattern recognition | Spinal cord | 030104 developmental biology | medicine.anatomical_structure | Female | [SDV.NEU]Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC] | lcsh:Q | Artificial intelligence | Whole body | business | sensorimotor control | 030217 neurology & neurosurgery | information measures | Scientific Reports
researchProduct

Use of deep learning methods to translate drug-induced gene expression changes from rat to human primary hepatocytes

2020

In clinical trials, animal and cell line models are often used to evaluate the potential toxic effects of a novel compound or candidate drug before progressing to human trials. However, relating the results of animal and in vitro model exposures to relevant clinical outcomes in the human in vivo system still proves challenging, relying on often putative orthologs. In recent years, multiple studies have demonstrated that the repeated dose rodent bioassay, the current gold standard in the field, lacks sufficient sensitivity and specificity in predicting toxic effects of pharmaceuticals in humans. In this study, we evaluate the potential of deep learning techniques to translate the pattern of …

0301 basic medicine | Gene Expression | Gene Expression Regulation/drug effects | Pathology and Laboratory Medicine | Convolutional neural network | TOXICITY | Machine Learning | Voeding Metabolisme en Genomica | Time Measurement | 0302 clinical medicine | Gene expression | Medicine and Health Sciences | Measurement | Clinical Trials as Topic | Multidisciplinary | Artificial neural network | Pharmaceutics | Q | R | Metabolism and Genomics | TOXICOGENOMICS | 030220 oncology & carcinogenesis | Metabolisme en Genomica | Medicine | Engineering and Technology | Nutrition Metabolism and Genomics | Hepatocytes/drug effects | Algorithms | Research Article | Computer and Information Sciences | Clinical Trials as Topic/statistics & numerical data | Neural Networks | Genetic Toxicology | TOXICOLOGY | Science | Predictive Toxicology | Computational biology | Biology | Computer | 03 medical and health sciences | Dose Prediction Methods | Deep Learning | Voeding | Artificial Intelligence | In vivo | Genetics | Life Science | Animals | Humans | Gene | Nutrition | business.industry | Deep learning | Biology and Life Sciences | Gold standard (test) | REPRESENTATIONS | Rats | 030104 developmental biology | Gene Expression Regulation | Hepatocytes | Artificial intelligence | Neural Networks Computer | Toxicogenomics | business | Neuroscience
researchProduct