Search results for "Computer Science"

Showing 10 of 22,367 documents

FASTdoop: A versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications

2017

Summary: MapReduce Hadoop bioinformatics applications require special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework provides no built-in support for the most popular sequence file formats, such as FASTA or BAM. Moreover, developing these routines is not easy, both because of the diversity of these formats and because sequence datasets that may contain up to billions of characters must be managed efficiently. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files. We show that, with respect to analogous input management routines that have appeared in the literature, it offers…

Keywords: FASTQ; FASTQ format; NGS; DNA sequencing; Sequence analysis; Sequence Analysis, DNA; File format; Sequence; Database; Database Management Systems; Information Storage and Retrieval; Bioinformatics; Genome; Genome, Human; Genomics; Genomic library; Gene Library; Biochemistry; Molecular Biology; Humans; Statistics and Probability; Computer science; Computer Science Applications; 1707 Computer Vision and Pattern Recognition; Computational Theory and Mathematics; Computational Mathematics; Domain (software engineering); Quality (business); 0301 basic medicine; 03 medical and health sciences; 030104 developmental biology; Settore INF/01 - Informatica
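FASTdoop plugs into Hadoop's input layer, and the Python sketch below is not its API. It illustrates, under the assumption of unwrapped 4-line FASTQ records, two chores that such input routines face. The first is parsing records. The second is finding the first complete record when a chunk of the file (for example a Hadoop input split, assumed here for illustration) starts at an arbitrary byte offset. The function names and the header-detection heuristic are hypothetical.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Parse unwrapped 4-line FASTQ records (header, sequence, '+', quality)."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("truncated input: line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield FastqRecord(header[1:], seq, qual)


def first_record_start(text: str, offset: int) -> int:
    """Return the index of the first record header at or after `offset`, or -1.

    Heuristic (an assumption of this sketch): quality strings may themselves
    begin with '@', so a line is accepted as a header only if the line two
    below starts with '+' and the sequence and quality lines have equal length.
    """
    # If the offset falls inside a line, skip to the start of the next line.
    if offset > 0 and text[offset - 1] != "\n":
        newline = text.find("\n", offset)
        if newline == -1:
            return -1
        offset = newline + 1
    lines = text[offset:].split("\n")
    pos = offset
    for i in range(len(lines) - 3):
        header, seq, plus, qual = lines[i:i + 4]
        if header.startswith("@") and plus.startswith("+") and len(seq) == len(qual):
            return pos
        pos += len(lines[i]) + 1  # +1 for the newline removed by split()
    return -1


# Toy input: the second record's quality line starts with '@'.
SAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\n@III\n@r3\nTTAA\n+\nJJJJ\n"
```

Consider a chunk that begins at offset 27 of `SAMPLE`, which is the start of the line `@III`. A naive reader would take that line for a header. `first_record_start` rejects it because the line two below it is not a `+` line, and returns 32, the start of `@r3`. A real reader would also have to read past the chunk's end to finish its last record. That detail is not shown here.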


Detecting mutations by eBWT

2018

In this paper, we develop a theory describing how the extended Burrows–Wheeler Transform (eBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. Our theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and how an elegant and precise LCP-array-based procedure can locate these clusters in the eBWT. Our findings are very general and can be applied to a wide range of problems. In this paper, we consider the case of alignment-free and reference-free SNP discovery in multiple collections of reads. We note that, in accordance with our theoretical results, SNPs are clustered in th…

Keywords: BWT; LCP Array; SNPs; Reference-free; Assembly-free; Software; Computer Science; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); [INFO] Computer Science [cs]; [SDV] Life Sciences [q-bio]; FOS: Computer and information sciences; 000 Computer science, knowledge, general works; 0301 basic medicine; 03 medical and health sciences; 030104 developmental biology; Settore INF/01 - Informatica
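The clustering idea can be illustrated on toy data. The sketch below makes several simplifying assumptions. The eBWT proper orders the conjugates of the strings. This sketch instead uses the common end-marker variant, in which each read gets its own terminator. It also groups suffixes with a naive LCP threshold `k`, which is not the paper's cluster-detection procedure. The function names are hypothetical, and the construction is quadratic, so it is suitable only for illustration.

```python
def _lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def collection_bwt_lcp(reads: list[str]) -> tuple[str, list[int]]:
    """BWT and LCP array of a read collection, each read ending in its own '$'.

    Suffixes are ordered as if read j ended with a terminator $_j smaller than
    every base, with $_0 < $_1 < ...: Python's string order already puts a
    proper prefix first, and equal suffixes are tie-broken by read index j.
    """
    suffixes = sorted(
        (read[i:], j, i) for j, read in enumerate(reads) for i in range(len(read) + 1)
    )
    # The BWT symbol is the base preceding each suffix, or '$' for a whole read.
    bwt = "".join(reads[j][i - 1] if i > 0 else "$" for _, j, i in suffixes)
    lcp = [0] + [_lcp(suffixes[k - 1][0], suffixes[k][0]) for k in range(1, len(suffixes))]
    return bwt, lcp


def lcp_clusters(bwt: str, lcp: list[int], k: int) -> list[str]:
    """BWT symbols of each maximal run of >= 2 suffixes sharing a context of length >= k.

    lcp[idx] compares sorted suffix idx with suffix idx - 1, so a run ends where lcp < k.
    """
    clusters = []
    start = 0
    for idx in range(1, len(lcp) + 1):
        if idx == len(lcp) or lcp[idx] < k:
            if idx - start > 1:
                clusters.append(bwt[start:idx])
            start = idx
    return clusters
```

For the reads `CAT` and `CGT`, `collection_bwt_lcp` yields the BWT `TTC$$CAG`. With `k = 1`, one of the clusters is `AG`. It collects the two bases that precede the shared context `T`. This mixed cluster is the kind of signal the abstract associates with SNPs. Real data would need thresholds that account for coverage.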


The colored longest common prefix array computed via sequential scans

2018

Due to the increased availability of large datasets of biological sequences, sequence comparison tools increasingly rely on efficient alignment-free approaches. Most alignment-free approaches require computing statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), which allows several problems to be tackled efficiently with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…

Keywords: Alignment-free methods; Average common substring; Burrows–Wheeler transform; Longest common prefix; LCP array; Matching statistics; Pairwise comparison; Substring; String (computer science); Data structure; Algorithm; Computation; Scale (descriptive set theory); Computer science; Computer Science (all); Theoretical Computer Science; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); FOS: Computer and information sciences; 0301 basic medicine; 02 engineering and technology; 0206 medical engineering; 020602 bioinformatics; 03 medical and health sciences; 030104 developmental biology
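The abstract is cut off before it names the problems the cLCP addresses. However, its keywords include matching statistics and the average common substring, two standard alignment-free statistics. The sketch below computes both naively and in memory. This is exactly the kind of computation the abstract says becomes impractical at scale. It is not the cLCP or its semi-external algorithm. It also reports the plain average of the matching statistics, without the log-based normalization some authors apply.

```python
def matching_statistics(a: str, b: str) -> list[int]:
    """ms[i] = length of the longest prefix of a[i:] that occurs somewhere in b.

    Naive: a prefix can be extended only while it still occurs in b, because
    any substring of an occurring string also occurs.
    """
    ms = []
    for i in range(len(a)):
        length = 0
        while i + length < len(a) and a[i:i + length + 1] in b:
            length += 1
        ms.append(length)
    return ms


def average_common_substring(a: str, b: str) -> float:
    """Unnormalized average of the matching statistics of a with respect to b."""
    ms = matching_statistics(a, b)
    return sum(ms) / len(ms) if ms else 0.0
```

For `a = "ACGT"` and `b = "CGTA"`, the matching statistics are `[1, 3, 2, 1]`, so the average is 1.75. The measure is asymmetric, so implementations often average it over both directions.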


Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear in the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…

Keywords: Absent words; Forbidden words; Circular words; q-grams; Sequence comparison; Sequence alignment; Sequence; Composition (combinatorics); Combinatorics; Upper and lower bounds; Mathematics; Theoretical Computer Science; Computational Theory and Mathematics; Computer Science Applications; 1707 Computer Vision and Pattern Recognition; Computer Science - Formal Languages and Automata Theory; Formal Languages and Automata Theory (cs.FL); Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Information Systems; Process (computing); Focus (optics); Word (computer architecture); Integer (computer science); FOS: Computer and information sciences; 0301 basic medicine; 01 natural sciences; 0102 computer and information sciences; 010201 computation theory & mathematics; 03 medical and health sciences; 030104 developmental biology; Settore INF/01 - Informatica
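The sketch below contrasts the two ideas in the abstract on toy DNA strings. The q-gram distance is computed as the sum, over all q-grams, of the absolute difference in occurrence counts. The second measure compares the sets of length-q words that do not occur in each sequence. That comparison is a hypothetical toy measure. The paper's absent-word measure is not reproduced here, and its definition is cut off in the abstract. The alphabet `ACGT` and all function names are assumptions.

```python
from collections import Counter
from itertools import product


def qgram_profile(s: str, q: int) -> Counter:
    """Occurrence count of every length-q substring of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """Sum over all q-grams w of |count_x(w) - count_y(w)|."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(px[w] - py[w]) for w in px.keys() | py.keys())


def absent_words(s: str, q: int, alphabet: str = "ACGT") -> set[str]:
    """All length-q words over the alphabet that do not occur in s."""
    present = set(qgram_profile(s, q))
    return {"".join(w) for w in product(alphabet, repeat=q)} - present


def absent_word_difference(x: str, y: str, q: int) -> int:
    """Toy measure: number of length-q words absent from exactly one sequence."""
    return len(absent_words(x, q) ^ absent_words(y, q))
```

For `ACGT` and `ACGA` with `q = 2`, both measures return 2. The q-gram profile of a sequence grows with its length, while the set of absent words shrinks. This is the complementary view the abstract describes.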


Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions.

2020

Genetic diseases are driven by aberrations of the human genome. Identifying such aberrations, including structural variations (SVs), is key to our understanding of these diseases. Conventional short-read whole-genome sequencing (cWGS) can identify SVs at base-pair resolution, but it uses only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) exploits long-range information by linking short reads that originate from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, w…

Keywords: Genomic Structural Variation; Whole genome sequencing; High-Throughput Nucleotide Sequencing; DNA sequencing; Dideoxy DNA sequencing; Sequencing techniques; Sequence Analysis; Sequence Analysis, DNA; Sequence Alignment; Repetitive Regions; False discovery rate; Statistical model; Logistic Models; Modeling and Simulation; Linkage (software); Identification (information); Human genome; Genome, Human; Human Genomics; Genomics; Cancer Genomics; Genomic Medicine; Genetics; Genetic Diseases, Inborn; Breast Cancer; Breast Neoplasms; Breast Tumors; Cancers and Neoplasms; DNA, Neoplasm; Oncology; Basic Cancer Research; MCF-7 Cells; Female; Humans; Polymerase Chain Reaction; Artificial Gene Amplification and Extension; DNA Barcoding, Taxonomic; Molecular Biology; Molecular Biology Techniques; Bioinformatics; Computational Biology; Computer science; Computational Theory and Mathematics; Database and Informatics Methods; Research and Analysis Methods; Biology and Life Sciences; Medicine and Health Sciences; Biology (General); QH301-705.5; Ecology; Ecology, Evolution, Behavior and Systematics; Cellular and Molecular Neuroscience; Research Article; 3. Good health; 0301 basic medicine; 0302 clinical medicine; 03 medical and health sciences; 030104 developmental biology; 030217 neurology & neurosurgery
Published in: PLoS Computational Biology


Gating Harmonization Guidelines for Intracellular Cytokine Staining Validated in Second International Multiconsortia Proficiency Panel Conducted by C…

2020

Results from the first gating proficiency panel of intracellular cytokine staining (ICS) highlighted the value of using a consensus gating approach to reduce the variability across laboratories in reported %CD8+ or %CD4+ cytokine-positive cells. Based on the data analysis from the first proficiency panel, harmonization guidelines for a consensus gating protocol were proposed. To validate the recommendations from the first panel and to examine factors that were not included in the first panel, a second ICS gating proficiency panel was organized. All participants analyzed the same set of Flow Cytometry Standard (FCS) files using their own gating protocol. An optional learning module was provi…

Keywords: Intracellular cytokine staining; Gating; Harmonization; Flow Cytometry; Flow Cytometry Standard; Protocol (science); Reproducibility of Results; Staining and Labeling; Cytokines; Immunotherapy; Neoplasms; Humans; Histology; Cell Biology; Pathology and Forensic Medicine; Medical physics; Computer science; 0301 basic medicine; 0302 clinical medicine; 03 medical and health sciences; 030104 developmental biology; 030220 oncology & carcinogenesis
Published in: Cytometry Part A


Enhancement in Phospholipase D Activity as a New Proposed Molecular Mechanism of Haloperidol-Induced Neurotoxicity

2020

Membrane phospholipase D (PLD) is associated with numerous neuronal functions, such as axonal growth, synaptogenesis, formation of secretory vesicles, neurodegeneration, and apoptosis. PLD acts mainly on phosphatidylcholine, from which phosphatidic acid (PA) and choline are formed. In turn, PA is a key element of the PLD-dependent secondary messenger system. Changes in PLD activity are associated with the mechanism of action of olanzapine, an atypical antipsychotic. The aim of the present study was to assess the effect of short-term administration of the first-generation antipsychotic drugs haloperidol, chlorpromazine, and fluphenazine on membrane PLD activity in the rat brain. Animals were…

Keywords: Phospholipase D; Phospholipase D activity; Phospholipase; Phosphatidic acid; Enzyme Activation; Haloperidol; Chlorpromazine; Fluphenazine; Olanzapine; Neurotoxicity; Neuroprotection; Mechanism of action; Pharmacology; Catalysis; Animals; Rats; Molecular Biology; Spectroscopy; Inorganic Chemistry; Organic Chemistry; Physical and Theoretical Chemistry; General Medicine; Computer Science Applications; Communication; enzymes and coenzymes (carbohydrates); lipids (amino acids peptides and proteins); lcsh:Chemistry; lcsh:QD1-999; lcsh:Biology (General); lcsh:QH301-705.5; 0301 basic medicine; 0302 clinical medicine; 03 medical and health sciences; 030102 biochemistry & molecular biology; 030217 neurology & neurosurgery
Published in: International Journal of Molecular Sciences


Feasibility of sample size calculation for RNA-seq studies

2017

Sample size calculation is a crucial step in study design but is not yet fully established for RNA sequencing (RNA-seq) analyses. To evaluate feasibility and provide guidance, we evaluated RNA-seq sample size tools identified from a systematic search. The focus was on whether real pilot data would be needed for reliable results and on identifying tools that would perform well in scenarios with different levels of biological heterogeneity and fold changes (FCs) between conditions. We used simulations based on real data for tool evaluation. In all settings, the six evaluated tools provided widely different answers, which were strongly affected by FC. Although all tools failed for small FCs, s…

Keywords: Sample size determination; Sample Size; Feasibility Studies; Research Design; RNA-Seq; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing; Systematic search; Data mining; Software; Information Systems; Molecular Biology; Humans; Computer science; Fold (higher-order function); 0301 basic medicine; 0302 clinical medicine; 03 medical and health sciences; 030104 developmental biology; 030217 neurology & neurosurgery
Published in: Briefings in Bioinformatics
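The abstract reports that tool answers were "strongly affected by FC". A generic textbook calculation shows why. The sketch below uses the normal approximation for comparing two group means of log2 expression: n per group = 2(z₁₋α/₂ + z₁₋β)²σ² / (log₂ FC)². This formula is an assumption. It is not one of the six evaluated tools, and it ignores count distributions, dispersion estimation and multiple testing. The parameter `sd_log2`, the standard deviation of log2 expression within a group, is likewise an assumed input.

```python
from math import ceil, log2
from statistics import NormalDist


def per_group_n(fold_change: float, sd_log2: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided two-sample test on log2 expression.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / log2(FC)^2.
    """
    delta = log2(fold_change)
    if delta == 0:
        raise ValueError("fold change must differ from 1")
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd_log2 ** 2 / delta ** 2)
```

With `sd_log2 = 1`, the formula gives 16 samples per group for FC = 2, 46 for FC = 1.5 and 227 for FC = 1.2. Because n scales with 1/(log₂ FC)², small fold changes inflate the required sample size sharply.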


Old meets new: Comparative examination of conventional and innovative RNA-based methods for body fluid identification of laundered seminal fluid stai…

2018

Knowing the type of body fluid or tissue that contributed to a trace can provide contextual insight for crime scene reconstruction and can connect a suspect or a victim to a crime scene. In sexual assault cases especially, it is important to verify the presence of spermatozoa. Victims often clean their underwear or bedding after a sexual assault. If they later decide to report the crime to the police, in our experience investigators usually do not send laundered items for DNA examination, since they believe that analysis after washing is no longer promising. As not only the individualization of traces on laundered items could be important in court, but also the type…

Keywords: Forensic Genetics; Body fluid; Semen; Stain; Spermatozoa; Laundering; Textiles; Crime scene; RNA; RNA, Messenger; MicroRNAs; DNA; DNA Fingerprinting; Microsatellite Repeats; Polymerase Chain Reaction; Fluorescence; Fluorescent Dyes; Biological property; Identification (information); Pattern recognition; Artificial intelligence; Genetics; Pathology and Forensic Medicine; Male; Humans; Computer science; 0301 basic medicine; 0302 clinical medicine; 03 medical and health sciences; 030104 developmental biology; 030216 legal & forensic medicine
Published in: Forensic science international. Genetics


Deciphering the functional role of spatial and temporal muscle synergies in whole-body movements

2018

Voluntary movement is hypothesized to rely on a limited number of muscle synergies, whose recruitment translates task goals into effective muscle activity. In this study, we investigated how to characterize analytically the functional role of different types of muscle synergies in task performance. To this end, we recorded a comprehensive dataset of muscle activity during a variety of whole-body pointing movements. We decomposed the electromyographic (EMG) signals using a space-by-time modularity model, which encompasses the main types of synergies. We then used task decoding and information-theoretic analysis to probe the role of each synergy by mapping it to specific task …

Keywords: Muscle activity; Muscle, Skeletal; Electromyography; EMG patterns; Activation patterns; Primitives; Motor control; Sensorimotor control; Equilibrium-point hypothesis; Arm movements; Movement; Whole body; Spatio-Temporal Analysis; Spinal cord; Interindividual variability; Information measures; Functional role; Temporal muscle; Pattern recognition; Artificial intelligence; Computer science; Adult; Male; Female; Humans; Multidisciplinary; Article; [SDV.NEU] Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC]; lcsh:Medicine; lcsh:R; lcsh:Science; lcsh:Q; 0301 basic medicine; 0302 clinical medicine; 03 medical and health sciences; 030104 developmental biology; 030217 neurology & neurosurgery
Published in: Scientific Reports
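The paper's space-by-time model, which factorizes EMG into both spatial and temporal modules, is not reproduced here. As a simpler stand-in, the sketch below shows the spatial-only decomposition commonly used to extract muscle synergies. Non-negative EMG envelopes V (muscles × samples) are factorized as V ≈ WH with Lee–Seung multiplicative updates. The columns of W act as spatial synergies, and the rows of H act as their activation coefficients. Function names, the iteration count and the variance-accounted-for criterion are assumptions of this sketch.

```python
import numpy as np


def extract_spatial_synergies(emg: np.ndarray, n_synergies: int, n_iter: int = 500,
                              seed: int = 0, eps: float = 1e-9):
    """Factorize non-negative EMG (muscles x samples) as W @ H via multiplicative NMF."""
    if np.any(emg < 0):
        raise ValueError("EMG envelopes must be non-negative")
    rng = np.random.default_rng(seed)
    n_muscles, n_samples = emg.shape
    W = rng.random((n_muscles, n_synergies)) + eps   # spatial synergies (columns)
    H = rng.random((n_synergies, n_samples)) + eps   # activation coefficients (rows)
    for _ in range(n_iter):
        # Lee-Seung updates for the Frobenius-norm objective; eps avoids division by 0.
        H *= (W.T @ emg) / (W.T @ W @ H + eps)
        W *= (emg @ H.T) / (W @ H @ H.T + eps)
    return W, H


def variance_accounted_for(emg: np.ndarray, W: np.ndarray, H: np.ndarray) -> float:
    """Uncentered VAF: 1 - ||V - WH||^2 / ||V||^2."""
    residual = emg - W @ H
    return 1.0 - float(np.sum(residual ** 2) / np.sum(emg ** 2))
```

In practice, the number of synergies is often chosen as the smallest value whose VAF exceeds a preset threshold. Linking each extracted module to task performance, as the abstract describes, would require a separate decoding step on H.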
