Search results for "Parallel computing"

showing 10 items of 189 documents

BGSA: a bit-parallel global sequence alignment toolkit for multi-core and many-core architectures

2018

Abstract Motivation: Modern bioinformatics tools for analyzing large-scale NGS datasets often need to include fast implementations of core sequence alignment algorithms in order to achieve reasonable execution times. We address this need by presenting the BGSA toolkit for optimized implementations of popular bit-parallel global pairwise alignment algorithms on modern microprocessors. Results: BGSA outperforms Edlib, SeqAn and BitPAl for pairwise edit distance computations and Parasail, SeqAn and BitPAl when using more general scoring schemes for pairwise alignments of a batch of sequence reads on both standard multi-core CPUs and Xeon Phi many-core CPUs. Furthermore, banded edit distance perf…

Statistics and Probability; 0303 health sciences; Multi-core processor; Xeon; Computer science; business.industry; 030302 biochemistry & molecular biology; Sequence alignment; Sequence Analysis DNA; Parallel computing; Biochemistry; Computer Science Applications; 03 medical and health sciences; Computational Mathematics; Titan (supercomputer); Software; Computational Theory and Mathematics; Edit distance; business; Sequence Alignment; Molecular Biology; Algorithms; Software; Xeon Phi; 030304 developmental biology; Bioinformatics

researchProduct

A parallel and sensitive software tool for methylation analysis on multicore platforms.

2015

Abstract Motivation: DNA methylation analysis suffers from very long processing times, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. As it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. Results: We present a new software tool, called HPG-Methyl, which efficiently maps bis…

Statistics and Probability; Mutation rate; Time Factors; Computer science; Real-time computing; Bisulfite sequencing; Molecular Sequence Data; Genomics; Parallel computing; computer.software_genre; medicine.disease_cause; Biochemistry; Genome; Bottleneck; chemistry.chemical_compound; Software; Mutation Rate; Databases Genetic; medicine; Humans; Sulfites; Molecular Biology; Mutation; Multi-core processor; Genome; Base Sequence; business.industry; High-Throughput Nucleotide Sequencing; Methylation; Genomics; DNA Methylation; Original Papers; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; chemistry; DNA methylation; Scalability; Mutation; Compiler; business; computer; Sequence Analysis; DNA; Algorithms; Software; Bioinformatics (Oxford, England)

researchProduct

RabbitMash: accelerating hash-based genome analysis on modern multi-core architectures

2020

Abstract Motivation: Mash is a popular hash-based genome analysis toolkit with applications to important downstream analysis tasks such as clustering and assembly. However, Mash is currently not able to fully exploit the capabilities of modern multi-core architectures, which in turn leads to high runtimes for large-scale genomic datasets. Results: We present RabbitMash, an efficient, highly optimized implementation of Mash which can take full advantage of modern hardware including multi-threading, vectorization and fast I/O. We show that our approach achieves speedups of at least 1.3, 9.8, 8.5 and 4.4 compared to Mash for the operations sketch, dist, triangle and screen, respectively. Furtherm…

Statistics and Probability; Workstation; Exploit; Computer science; Hash function; Parallel computing; Biochemistry; law.invention; 03 medical and health sciences; Software; law; Cluster analysis; Molecular Biology; 030304 developmental biology; 0303 health sciences; Multi-core processor; Genome; Computers; business.industry; 030302 biochemistry & molecular biology; Genomics; Sketch; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; business; Algorithms; Software; Bioinformatics

researchProduct

Analysis of the influence of processor hidden registers on the accuracy of fault injection techniques

2004

Modern processors tend to include a growing number of registers, some of which are not accessible through the instruction set. Traditionally, the effect of faults in these hidden registers has not been considered during system validation using fault injection. In this paper, a study of the importance of faults in hidden registers is performed. Firstly, we have analysed the sensitivity of hidden registers to faults in combinational logic. In a second phase, we have analysed the impact of faults occurring in hidden registers on system behaviour. A broad set of permanent and transient faults has been injected into the models of two typical commercial microcontrollers, using a VHDL-based fault injec…

Stuck-at fault; Instruction set; Combinational logic; Computer science; Fault coverage; VHDL; Hardware description language; Hardware_PERFORMANCEANDRELIABILITY; Parallel computing; Fault injection; Fault model; computer; computer.programming_language; Proceedings. Ninth IEEE International High-Level Design Validation and Test Workshop (IEEE Cat. No.04EX940)

researchProduct

Parallel Computing for the study of the focusing Davey-Stewartson II equation in semiclassical limit

2012

The asymptotic description of the semiclassical limit of nonlinear Schrödinger equations is a major challenge with so far only scattered results in 1 + 1 dimensions. In this limit, solutions to the NLS equations can have zones of rapid modulated oscillations or blow up. We numerically study in this work the Davey-Stewartson system, a 2 + 1 dimensional nonlinear Schrödinger equation with a nonlocal term, by using parallel computing. This leads to the first results on the semiclassical limit for the Davey-Stewartson equations.

T57-57.97; Work (thermodynamics); Applied mathematics. Quantitative methods; 010102 general mathematics; One-dimensional space; Mathematics::Analysis of PDEs; Semiclassical physics; 010103 numerical & computational mathematics; Parallel computing; 01 natural sciences; Schrödinger equation; symbols.namesake; Nonlinear system; Nonlinear Sciences::Exactly Solvable and Integrable Systems; QA1-939; symbols; Limit (mathematics); 0101 mathematics; Nonlinear Sciences::Pattern Formation and Solitons; Nonlinear Schrödinger equation; Mathematics; Mathematics; ESAIM: Proceedings

researchProduct

Optimal Configuration for N-Dimensional Twin Torus Networks

2014

The torus is one of the most common topologies used in the current largest supercomputers. Although the 3D torus is widely used, some supercomputers in the Top500 list have recently been built using networks with topologies of five or six dimensions. To obtain an nD torus, 2n ports per node are needed. These ports can be offered by a single card or by several cards per node. In the second case, there are multiple ways of assigning the dimension and direction of the card ports. In a previous work we proposed the 3D Twin (3DT) torus, which uses two 4-port cards per node, and obtained the optimal port configuration. This paper extends and generalizes that work in order to obtain the optimal port confi…

TOP500; ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION; Computer science; Dimension (graph theory); Node (circuits); Topology (electrical circuits); Algorithm design; Torus; Parallel computing; Routing (electronic design automation); Network topology; Topology; Computer Science::Operating Systems; 2014 IEEE 13th International Symposium on Network Computing and Applications

researchProduct

Empirical Autotuning of Two-level Parallel Linear Algebra Routines on Large cc-NUMA Systems

2012

In large cc-NUMA systems the efficient use of the different levels of the memory hierarchy is not an easy task, and the performance of multithreaded implementations of the libraries decreases as the number of cores used increases, causing a significant loss of efficiency. To alleviate this problem, routines with multilevel parallelism can be developed by combining OpenMP and BLAS parallelism. In that way, higher performance can be achieved, but it is necessary to develop some autotuning technique for the appropriate selection of the number of threads to use at each level. The selection can be made through theoretical models of the execution time or some installation methodology. This…

Task (computing); Selection (relational algebra); Memory hierarchy; Computer science; Multithreading; Linear algebra; Parallelism (grammar); Parallel computing; Temporal multithreading; Matrix multiplication; 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications

researchProduct

Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform

2012

The unprecedented production of short reads from the new high-throughput sequencers has made it challenging to align short reads to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between the input reads and the reference genome, where hash tables are the most frequently used data structure. However, hash tables are memory-consuming, making them poorly suited to memory-stringent many-core architectures, like GPUs, even though they usually have a nearly constant query time com…

Theoretical computer science; Burrows–Wheeler transform; Computational complexity theory; Computer science; Computational genomics; Parallel computing; Data structure; Time complexity; Hash table; 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum

researchProduct

Work Partitioning on Parallel and Distributed Agent-Based Simulation

2017

Work partitioning is a key challenge with applications in many scientific and technological fields. The problem is very well studied, with a rich literature on both distributed and parallel computing architectures. In this paper we deal with the work partitioning problem for parallel and distributed agent-based simulations, which aims at (i) balancing the overall load distribution and (ii) minimizing, at the same time, the communication overhead due to agents' inter-dependencies. We introduce a classification taxonomy of work partitioning strategies and present a space-based work partitioning approach, based on a Quad-tree data structure, which makes it possible to: identify a good space partitioning …

Theoretical computer science; Computational complexity theory; Computer Networks and Communications; Computer science; Distributed computing; Context (language use); 02 engineering and technology; Parallel Computing; Synchronization (computer science); 0202 electrical engineering electronic engineering information engineering; Overhead (computing); Space partitioning; Agent-based simulation; 020203 distributed computing; Agent-based simulations; D-MASON; Distributed Systems; Parallel Computing; Work partitioning; Hardware and Architecture; Computer Networks and Communications; Information Systems; Flocking (behavior); Agent-based simulations; 020206 networking & telecommunications; Work partitioning; Data structure; Distributed System; Computer Networks and Communication; D-MASON; Distributed Systems; Hardware and Architecture; Boids; Information Systems; 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

researchProduct

Parallel Collision Queries on the GPU

2013

We present parallel algorithms to accelerate collision tests of rigid body objects for a high number of independent transformations as they occur in sampling-based motion planning and path validation problems. We compare various GPU approaches with different levels of parallelism against each other and against a parallel CPU implementation. Our algorithms require no sophisticated load balancing schemes. They make no assumption on the distribution of the input transformations and require no pre-processing. Yet, we can perform up to 1 million collision tests per second with our best GPU implementation in our benchmarks. This is about 2.5X faster than our reference multi-core CPU implementati…

Theoretical computer science; Shared memory; Computer science; Parallel algorithm; Collision detection; Parallel computing; Motion planning; Load balancing (computing); Collision; Rigid body; Implementation

researchProduct