Search results for " computing"

showing 10 items of 2075 documents

Next-generation sequencing: big data meets high performance computing

2017

The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments in excess of a few terabytes of data in a single run. This leads to massive datasets used by a wide range of applications including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been dramatically decreasing. A low sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, to make effective use of the produced data, the design of big data algorithms and t…

0301 basic medicine; Computer science; Distributed computing; Genomic research; Big data; Terabyte; Computing Methodologies; DNA sequencing; 03 medical and health sciences; 0302 clinical medicine; Databases Genetic; Drug Discovery; Humans; Throughput (business); Pharmacology; Genome; business.industry; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; Precision medicine; Supercomputer; Data science; Cancer treatment; 030104 developmental biology; 030220 oncology & carcinogenesis; business; Algorithms; Drug Discovery Today
researchProduct

A new parallel pipeline for DNA methylation analysis of long reads datasets

2017

Background: DNA methylation is an important mechanism of epigenetic regulation in development and disease. New-generation sequencers allow genome-wide measurements of methylation status by reading short stretches of the DNA sequence (Methyl-seq). Several software tools for methylation analysis have been proposed in recent years. However, current sequencers, and those expected in the near future, yield sequences of increasing length, making these software tools inefficient and obsolete. Results: In this paper, we propose new software based on a strategy for methylation analysis of Methyl-seq sequencing data that requires much shorter execution times while…

0301 basic medicine; Computer science; Parallel pipeline; ADN; 02 engineering and technology; computer.software_genre; Biochemistry; Sensitivity and Specificity; DNA sequencing; Epigenesis Genetic; 03 medical and health sciences; chemistry.chemical_compound; Structural Biology; RNA analysis; Informàtica; Databases Genetic; 0202 electrical engineering electronic engineering information engineering; Humans; Epigenetics; Molecular Biology; 020203 distributed computing; DNA methylation; Genome Human; Applied Mathematics; Parallel pipeline; Methylation; Sequence Analysis DNA; Supercomputer; Computer Science Applications; Genòmica; 030104 developmental biology; chemistry; Gene Expression Regulation; DNA methylation; Mutation; Data mining; High performance computing; DNA microarray; computer; Sequence Alignment; DNA; Software
researchProduct

2019

As rats learn to search for multiple sources of food or water in a complex environment, they generate increasingly efficient trajectories between reward sites. Such spatial navigation capacity involves the replay of hippocampal place-cells during awake states, generating small sequences of spatially related place-cell activity that we call "snippets". These snippets occur primarily during sharp-wave-ripples (SWRs). Here we focus on the role of such replay events, as the animal is learning a traveling salesperson task (TSP) across multiple trials. We hypothesize that snippet replay generates synthetic data that can substantially expand and restructure the experience available and make learni…

0301 basic medicine; Computer science; Place cell; Machine learning; computer.software_genre; Spatial memory; Synthetic data; 03 medical and health sciences; Cellular and Molecular Neuroscience; 0302 clinical medicine; Models of neural computation; Genetics; Reinforcement learning; Molecular Biology; Ecology Evolution Behavior and Systematics; Ecology; business.industry; Reservoir computing; Snippet; 030104 developmental biology; Computational Theory and Mathematics; Modeling and Simulation; Sequence learning; Artificial intelligence; business; computer; 030217 neurology & neurosurgery; PLOS Computational Biology
researchProduct

SpCLUST: Towards a fast and reliable clustering for potentially divergent biological sequences

2019

This paper presents SpCLUST, a new C++ package that takes a list of sequences as input, aligns them with MUSCLE, computes their similarity matrix in parallel and then performs the clustering. SpCLUST extends previously released software by integrating additional scoring matrices, which enables it to cluster amino-acid sequences. The similarity matrix is now computed in parallel according to the master/slave distributed architecture, using MPI. Performance analysis, carried out on two real datasets of 100 nucleotide sequences and 1049 amino-acid sequences, shows that the resulting library substantially outperforms the original Python package. The proposed pac…

0301 basic medicine; Computer science; [INFO.INFO-SE] Computer Science [cs]/Software Engineering [cs.SE]; Health Informatics; [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]; [INFO.INFO-IU]Computer Science [cs]/Ubiquitous Computing; 03 medical and health sciences; [INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR]; 0302 clinical medicine; Software; [INFO.INFO-ET] Computer Science [cs]/Emerging Technologies [cs.ET]; [INFO.INFO-DC] Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; Cluster Analysis; Humans; Cluster analysis; [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR]; computer.programming_language; business.industry; [INFO.INFO-IU] Computer Science [cs]/Ubiquitous Computing; Similarity matrix; Pattern recognition; DNA; Genomics; Sequence Analysis DNA; Python (programming language); Mixture model; [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation; Spectral clustering; Computer Science Applications; 030104 developmental biology; ComputingMethodologies_PATTERNRECOGNITION; [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA]; [INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET]; [INFO.INFO-MA] Computer Science [cs]/Multiagent Systems [cs.MA]; [INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation; Artificial intelligence; [INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; business; computer; Algorithms; Software; 030217 neurology & neurosurgery
researchProduct

Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures

2016

This is a post-peer-review, pre-copyedit version of an article published in IEEE Transactions on Parallel and Distributed Systems. The final authenticated version is available online at: http://dx.doi.org/10.1109/TPDS.2015.2460247. [Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics as they can help to explain genetic influences on diseases. As these studies are time-consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime. Nevertheless, all these approaches are not able t…

0301 basic medicine; Coprocessor; Computer science; 0206 medical engineering; Acceleration; Data models; Symmetric multiprocessor system; Computational modeling; 02 engineering and technology; Parallel computing; Supercomputer; 03 medical and health sciences; Task (computing); 030104 developmental biology; Coprocessors; Computational Theory and Mathematics; Hardware and Architecture; Signal Processing; Genetics; Pairwise comparison; Computer architecture; Graphics processing units; 020602 bioinformatics; Xeon Phi
researchProduct

On the Use of Binary Trees for DNA Hydroxymethylation Analysis

2017

DNA methylation (mC) and hydroxymethylation (hmC) can have a significant effect on normal human development, health and disease status. Hydroxymethylation studies require specific treatment of DNA, as well as software tools for their analysis. In this paper, we propose a parallel software tool for analyzing the DNA hydroxymethylation data obtained by TAB-seq. The software is based on the use of binary trees for searching the different occurrences of methylation and hydroxymethylation in DNA samples. The binary trees make it possible to efficiently store and access information about the methylation of each methylated/hydroxymethylated cytosine in the samples. Evaluation results show that the perf…

0301 basic medicine; DNA Hydroxymethylation; 020203 distributed computing; Binary tree; business.industry; Computer science; 02 engineering and technology; Methylation; Computational biology; Supercomputer; 03 medical and health sciences; chemistry.chemical_compound; 030104 developmental biology; Software; Parallel software; chemistry; DNA methylation; 0202 electrical engineering electronic engineering information engineering; heterocyclic compounds; business; DNA
researchProduct

Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms

2018

Motivation: Information-theoretic and compositional/linguistic analysis of genomes has a central role in bioinformatics, even more so since the associated methodologies are also becoming very valuable for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands resorting to parallel and distributed computing. Indeed, those types of algorithms have been developed to collect k-mer statistics in…
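
As an illustration of the k-mer statistics described in this abstract, the following is a minimal single-machine sketch in Python. It is not the Hadoop-based method of the paper; the function name count_kmers and the choice to skip windows containing characters outside {A,C,G,T} are assumptions made for this example only.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count how many times each k-mer over {A,C,G,T} occurs in a sequence.

    Hypothetical helper for illustration; windows that contain any other
    character (for example N) are skipped, which is an assumption.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    # Slide a window of length k across the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers("ACGTACGT", 2).items()):
        print(kmer, n)
```

Counts obtained from separate sequences can be combined by adding the Counter objects, which loosely resembles the merge step of a distributed count, though the paper's actual algorithms may differ.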

0301 basic medicine; Epigenomics; genomic analysis; hadoop; distributed computing; Statistics and Probability; Computer science; Big data; Sequence assembly; Genome; Biochemistry; Domain (software engineering); Set (abstract data type); 03 medical and health sciences; distributed computing; Software; Computational Theory and Mathematic; Animals; Cluster Analysis; Humans; A-DNA; k-mer counting distributed computing hadoop map reduce; Molecular Biology; Epigenomics; Bacteria; business.industry; k-mer counting; Eukaryota; Linguistics; Computer Science Applications1707 Computer Vision and Pattern Recognition; Genomics; Sequence Analysis DNA; Computer Science Applications; Computational Mathematics; 030104 developmental biology; map reduce; Computational Theory and Mathematics; Distributed algorithm; genomic analysis; Kernel (statistics); Metagenome; hadoop; business; Algorithm; Algorithms; Software
researchProduct

S-Aligner: Ultrascalable Read Mapping on Sunway Taihu Light

2017

The availability and amount of sequenced genomes have been rapidly growing in recent years because of the adoption of next-generation sequencing (NGS) technologies that enable high-throughput short-read generation at highly competitive cost. Since this trend is expected to continue in the foreseeable future, the design and implementation of efficient and scalable NGS bioinformatics algorithms are important to research and industrial applications. In this paper, we introduce S-Aligner, a highly scalable read mapper designed for the Sunway Taihu Light supercomputer and its fourth-generation ShenWei many-core architecture (SW26010). S-Aligner employs a combination of optimization techniques to o…

0301 basic medicine; Instruction set; 03 medical and health sciences; 030104 developmental biology; Xeon; Asynchronous communication; Computer science; Multithreading; Scalability; SIMD; Parallel computing; SW26010; Supercomputer; 2017 IEEE International Conference on Cluster Computing (CLUSTER)
researchProduct

Compendium of TCDD-mediated transcriptomic response datasets in mammalian model systems.

2017

Background: 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) is the most potent congener of the dioxin class of environmental contaminants. Exposure to TCDD causes a wide range of toxic outcomes, ranging from chloracne to acute lethality. The severity of toxicity is highly dependent on the aryl hydrocarbon receptor (AHR). Binding of TCDD to the AHR leads to changes in transcription of numerous genes. Studies evaluating the transcriptional changes brought on by TCDD may provide valuable insight into the role of the AHR in human health and disease. We therefore compiled a collection of transcriptomic datasets that can be used to aid the scientific community in better understanding the transcriptiona…

0301 basic medicine; Male; TCDD; Polychlorinated Dibenzodioxins; Bioinformatics; Microarray datasets; AHR; White adipose tissue; Biology; Web Browser; Proteomics; 413 Veterinary science; Medical and Health Sciences; Cell Line; Transcriptome; 03 medical and health sciences; Mice; 0302 clinical medicine; Transcription (biology); Information and Computing Sciences; medicine; Genetics; Animals; Humans; heterocyclic compounds; Gene; Genetics; Gene Expression Profiling; R; Computational Biology; Biological Sciences; Aryl hydrocarbon receptor; medicine.disease; 3. Good health; Rats; Chloracne; stomatognathic diseases; 030104 developmental biology; Gene Expression Regulation; 030220 oncology & carcinogenesis; Agent Orange & Dioxin; biology.protein; Environmental Pollutants; Female; DNA microarray; Transcriptome; Software; Biotechnology
researchProduct

Global emergence of the widespread Pseudomonas aeruginosa ST235 clone

2018

Objectives: Despite the non-clonal epidemic population structure of Pseudomonas aeruginosa, several multi-locus sequence types are distributed worldwide and are frequently associated with epidemics where multidrug resistance confounds treatment. ST235 is the most prevalent of these widespread clones. In this study we aimed to understand the origin of ST235 and the molecular basis for its success. Methods: The genomes of 79 P. aeruginosa ST235 isolates collected worldwide over a 27-year period were examined. A phylogenetic network was built, using a Bayesian approach to find the Most Recent Common Ancestor, and we identified antibiotic resistance determinants and ST235-specific genes…

0301 basic medicine; Most recent common ancestor; Clone (cell biology); [ SDV.MP.BAC ] Life Sciences [q-bio]/Microbiology and Parasitology/Bacteriology; medicine.disease_cause; Global Health; Genome; [ SDV.MP ] Life Sciences [q-bio]/Microbiology and Parasitology; Prevalence; Cluster Analysis; [ SDV.BIBS ] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; High-risk clones; Phylogeny; ComputingMilieux_MISCELLANEOUS; Molecular Epidemiology; General Medicine; 3. Good health; Infectious Diseases; [SDV.MP]Life Sciences [q-bio]/Microbiology and Parasitology; [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA]; [ SDV.BBM.GTP ] Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Pseudomonas aeruginosa; Efflux; [INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; Fluoroquinolones; Microbiology (medical); Genotype; 030106 microbiology; Epidemic; [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]; Biology; Bacterial resistance; Microbiology; [INFO.INFO-IU]Computer Science [cs]/Ubiquitous Computing; Evolution Molecular; 03 medical and health sciences; [INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR]; Antibiotic resistance; Drug Resistance Bacterial; medicine; Pseudomonas Infections; Gene; Pseudomonas aeruginosa; Pathogen; International clones; [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation; Multiple drug resistance; Genes Bacterial; [INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET]; Multilocus Sequence Typing
researchProduct