Search results for "Speedup"

showing 10 items of 97 documents

Geomeasure: GIS and Scripting for Measuring Morphometric Variability

2019

This paper presents Geomeasure, a methodological tool developed to recover typometric information with a twofold objective. First, it speeds up data gathering by automating the way in which the data are recovered. Second, it adds higher accuracy and the possibility of re-measuring archaeological items without further direct interaction with the piece. Based on a combination of R scripting with GIS features, Geomeasure can currently gather 125–130 typometric variables per archaeological item automatically, with vectorized photographs as the only input. It can be used as a reliable methodological aid to extract detailed information on patterns and trends of shape variabilit…

010506 paleontology | Archeology | Speedup | 060102 archaeology | Computer science | Process (computing) | R Programming Language | Sample (statistics) | 06 humanities and the arts | computer.software_genre | 01 natural sciences | Performance results | Scripting language | Anthropology | 0601 history and archaeology | Data mining | computer | 0105 earth and related environmental sciences | Lithic Technology

Efficient Parallel Sort on AVX-512-Based Multi-Core and Many-Core Architectures

2019

Sorting kernels are a fundamental part of numerous applications. The performance of sorting implementations is usually limited by a variety of factors such as computing power, memory bandwidth, and branch mispredictions. In this paper we propose an efficient hybrid sorting method which takes advantage of wide vector registers and the high bandwidth memory of modern AVX-512-based multi-core and many-core processors. Our approach employs a combination of vectorized bitonic sorting and load-balanced multi-threaded merging. Thread-level and data-level parallelism are used to exploit both compute power and memory bandwidth. Our single-threaded implementation is ~30x faster than qsort in the C st…
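
The bitonic sorting step mentioned in this abstract can be illustrated with a scalar sketch. This is not the paper's AVX-512 implementation: it only shows the compare-exchange network that vectorized versions map onto wide registers. The function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this illustration.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Scalar illustration only: every pass of the inner loop performs
    independent compare-exchange steps, which is what makes the network
    attractive for SIMD execution.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For example, `bitonic_sort([5, 3, 8, 1, 9, 2, 7, 4])` returns the input in ascending order. The sequence of comparisons depends only on the input length, never on the data, so the network avoids data-dependent branches of the kind the abstract lists as a limiting factor.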

020203 distributed computing | Bitonic sorter | Speedup | Computer science | Radix sort | Sorting | Memory bandwidth | 02 engineering and technology | Parallel computing | Bitonic sorting | 020202 computer hardware & architecture | 0202 electrical engineering electronic engineering information engineering | sort | qsort | Merge sort | Branch misprediction | Xeon Phi | 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)

Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi

2016

Advanced SIMD features on GPUs and Xeon Phis promote efficient long pattern search. A tiled approach to accelerating the Wu-Manber algorithm on GPUs has been proposed. Both the GPU and Xeon Phi yield a speedup of two orders of magnitude over one CPU core. The GPU-based version with tiling runs up to 2.9× faster than the Xeon Phi version. Approximate pattern matching (APM) aims to find the occurrences of a pattern inside a subject text while allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typica…
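
The bit-parallel idea in this abstract can be sketched with the Shift-And form of the Wu-Manber recurrence, in which one integer per error level encodes all pattern prefixes that currently match. This is a plain single-threaded sketch under assumptions: the function name `approx_match_ends` is hypothetical, Python's unbounded integers stand in for machine words, and none of the paper's tiling or GPU/Xeon Phi specifics are reproduced.

```python
def approx_match_ends(pattern, text, k):
    """Return end positions in `text` where `pattern` matches with <= k edits.

    Bit i of rows[d] is set when pattern[:i+1] matches some suffix of the
    text read so far with at most d errors (Levenshtein distance).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    # With d errors, the first d pattern characters can be deleted up front.
    rows = [min((1 << d) - 1, full) for d in range(k + 1)]
    ends = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = rows[0]
        rows[0] = ((rows[0] << 1) | 1) & b
        prev_new = rows[0]
        for d in range(1, k + 1):
            cur_old = rows[d]
            rows[d] = (
                ((cur_old << 1) & b)             # match
                | prev_old                       # extra character in the text
                | ((prev_old | prev_new) << 1)   # substitution or skipped pattern character
                | 1                              # first pattern character always within one error
            ) & full
            prev_old, prev_new = cur_old, rows[d]
        if rows[k] & accept:
            ends.append(pos)
    return ends
```

Each text character costs a constant number of word operations per error level, independent of the pattern length as long as the pattern fits into one word. Long patterns need multi-word state, which is the case the abstract's "long pattern search" refers to.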

020203 distributed computing | Speedup | Coprocessor | Xeon | Computer Networks and Communications | Computer science | 02 engineering and technology | Parallel computing | Supercomputer | Computer Graphics and Computer-Aided Design | Theoretical Computer Science | CUDA | Artificial Intelligence | Hardware and Architecture | 0202 electrical engineering electronic engineering information engineering | 020201 artificial intelligence & image processing | SIMD | Bitwise operation | Software | Word (computer architecture) | Xeon Phi | Parallel Computing

SWMapper: Scalable Read Mapper on SunWay TaihuLight

2020

With the rapid development of next-generation sequencing (NGS) technologies, high throughput sequencing platforms continuously produce large amounts of short read DNA data at low cost. Read mapping is a performance-critical task, being one of the first stages required for many different types of NGS analysis pipelines. We present SWMapper, a scalable and efficient read mapper for the Sunway TaihuLight supercomputer. To achieve high performance on its heterogeneous architecture, we propose a number of optimization techniques centered around a memory-efficient succinct hash index data structure, including seed filtration, duplicate removal, dynamic scheduling, asynchronous data tra…

020203 distributed computing | Speedup | Xeon | Computer science | Hash function | 020206 networking & telecommunications | 02 engineering and technology | Parallel computing | Supercomputer | Data structure | DNA sequencing | chemistry.chemical_compound | chemistry | Scalability | 0202 electrical engineering electronic engineering information engineering | DNA | Sunway TaihuLight | 49th International Conference on Parallel Processing - ICPP

Collision detection for 3D rigid body motion planning with narrow passages

2017

In sampling-based 3D rigid body motion planning one of the major subroutines is collision detection. Especially for problems with narrow passages many samples have to be checked by a collision detection algorithm. In this application, the runtime of the motion planning algorithm is dominated by collision detection and the samples have the very specific characteristic that many of them are in collision and have small penetration volumes. In our work, we introduce a data structure and an algorithm that makes use of this characteristic by combining well-known data structures like a distance field and an octree with the swap algorithm by Llanas et al. For 3D rigid body motion planning with narr…

0209 industrial biotechnology | Speedup | business.industry | Computer science | 02 engineering and technology | Rigid body | Collision | Octree | 020901 industrial engineering & automation | 0202 electrical engineering electronic engineering information engineering | 020201 artificial intelligence & image processing | Collision detection | Computer vision | Artificial intelligence | Motion planning | Physics engine | business | Distance transform | Algorithm | ComputingMethodologies_COMPUTERGRAPHICS | 2017 IEEE International Conference on Robotics and Automation (ICRA)

SPECTR

2018

Modern high throughput sequencing platforms can produce large amounts of short read DNA data at low cost. Error correction is an important but time-consuming initial step when processing this data in order to improve the quality of downstream analyses. In this paper, we present a Scalable Parallel Error CorrecToR designed to improve the throughput of DNA error correction for Illumina reads on various parallel platforms. Our design is based on a k-spectrum approach where a Bloom filter is frequently probed as a key operation and is optimized towards AVX-512-based multi-core CPUs, Xeon Phi many-cores (both KNC and KNL), and heterogeneous compute clusters. A number of architecture-specific opt…
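
The k-spectrum idea in this abstract can be sketched as follows: k-mers that occur often enough across the reads are stored in a Bloom filter, and each k-mer of a read is then probed against it. Everything below is an assumption made for illustration, not SPECTR's code: the names `BloomFilter`, `build_spectrum` and `untrusted_positions`, the filter size, the `min_count` threshold, and the exact counting with `Counter` are all hypothetical, and no actual correction step is shown.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Small Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted BLAKE2b hash per hash function (illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_spectrum(reads, k, min_count=2):
    """Insert every k-mer seen at least `min_count` times into a Bloom filter."""
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    spectrum = BloomFilter()
    for kmer, count in counts.items():
        if count >= min_count:
            spectrum.add(kmer)
    return spectrum


def untrusted_positions(read, k, spectrum):
    """Start positions of k-mers in `read` that are absent from the spectrum."""
    return [i for i, kmer in enumerate(kmers(read, k)) if kmer not in spectrum]
```

A read whose k-mers are all in the spectrum yields an empty list. Positions reported as untrusted mark where a base error is likely, and every probe is an independent hash-and-test operation, which is why the abstract calls Bloom filter probing the key operation to optimize.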

0301 basic medicine | 03 medical and health sciences | Multi-core processor | 030104 developmental biology | Speedup | Xeon | Computer science | Data structure alignment | Parallel computing | Error detection and correction | Supercomputer | Throughput (business) | Xeon Phi | Proceedings of the 47th International Conference on Parallel Processing

parSRA: A framework for the parallel execution of short read aligners on compute clusters

2018

The growth of next generation sequencing datasets poses a challenge to the alignment of reads to reference genomes in terms of both accuracy and speed. In this work we present parSRA, a parallel framework to accelerate the execution of existing short read aligners on distributed-memory systems. parSRA can be used to parallelize a variety of short read alignment tools installed in the system without any modification to their source code. We show that our framework provides good scalability on a compute cluster for accelerating the popular BWA-MEM and Bowtie2 aligners. On average, it is able to accelerate sequence alignments on 16 64-core nodes (in total, 1024 cores) with speedup of 10.48 …

0301 basic medicine | Source code | Speedup | General Computer Science | Computer science | media_common.quotation_subject | Parallel computing | Supercomputer | Theoretical Computer Science | 03 medical and health sciences | 030104 developmental biology | 0302 clinical medicine | 030220 oncology & carcinogenesis | Modeling and Simulation | Computer cluster | Scalability | Fuse (electrical) | Node (circuits) | Partitioned global address space | media_common | Journal of Computational Science

CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm

2015

Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…

0301 basic medicine | Speedup | Computer science | Correlation clustering | Parallel computing | Theoretical Computer Science | 03 medical and health sciences | CUDA | 030104 developmental biology | Hardware and Architecture | Cluster analysis | Algorithm | Software | Ward's method | The International Journal of High Performance Computing Applications

Computing the Original eBWT Faster, Simpler, and with Less Memory

2021

Mantaci et al. [TCS 2007] defined the \(\mathrm {eBWT}\) to extend the definition of the \(\mathrm {BWT}\) to a collection of strings. However, since this introduction, it has been used more generally to describe any \(\mathrm {BWT}\) of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original \(\mathrm {eBWT}\), which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the \(\mathrm {BWT}\) of a single string that uses …
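
The original definition referred to in this abstract can be illustrated with a naive construction: collect all rotations of all input strings, sort them by omega-order (comparing their infinite repetitions), and concatenate the last characters. This is a sketch of the definition only, not the linear-time algorithm of the paper. The function name `naive_ebwt` is hypothetical, and the truncation of the infinite repetitions relies on the Fine and Wilf periodicity theorem.

```python
def naive_ebwt(strings):
    """Naive eBWT of a collection of non-empty strings via omega-order sorting.

    Two infinite repetitions u^w and v^w that agree on their first
    len(u) + len(v) characters are equal (Fine and Wilf), so comparing
    prefixes of length 2 * max_len decides the omega-order.
    """
    if not strings or any(len(s) == 0 for s in strings):
        raise ValueError("need a non-empty collection of non-empty strings")
    limit = 2 * max(len(s) for s in strings)

    def omega_key(rotation):
        repeats = -(-limit // len(rotation))  # ceiling division
        return (rotation * repeats)[:limit]

    rotations = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    # Rotations with equal keys share a root and therefore a last character,
    # so the relative order of ties does not change the output.
    rotations.sort(key=omega_key)
    return "".join(rotation[-1] for rotation in rotations)
```

Because all rotations are pooled before sorting, the result does not depend on the order in which the strings are supplied, which is the fundamental property the abstract emphasizes. The construction sorts explicit rotations and is therefore far from linear time; it is only suitable for checking small examples.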

2019-20 coronavirus outbreak | Speedup | String collections | Big BWT | Settore INF/01 - Informatica | Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) | String (computer science) | Suffix array | Order (ring theory) | omega-order | Quantitative Biology::Genomics | Burrows-Wheeler-Transform | Burrows-Wheeler-Transform String collections SAIS Big BWT prefix-free parsing extended BWT | law.invention | Combinatorics | prefix-free parsing | Simple (abstract algebra) | law | SAIS | SAIS algorithm | Independence (probability theory) | extended BWT | Mathematics

FMapper: Scalable read mapper based on succinct hash index on SunWay TaihuLight

2022

One of the most important applications in bioinformatics is read mapping. With the rapidly increasing number of reads produced by next-generation sequencing (NGS) technology, there is a need for fast and efficient high-throughput read mappers. In this paper, we present FMapper – a highly scalable read mapper on the TaihuLight supercomputer optimized for its fourth-generation ShenWei many-core architecture (SW26010). In order to fully exploit the computational power of the SW26010, we employ dynamic scheduling of tasks, asynchronous I/O and data transfers, and implement a vectorized version of the banded Myers algorithm tailored to the 256-bit vector registers of the SW26010. Our perf…

256-bit | Speedup | Xeon | Computer Networks and Communications | Computer science | Hash function | Parallel computing | SW26010 | Supercomputer | Theoretical Computer Science | Artificial Intelligence | Hardware and Architecture | Scalability | Software | Sunway TaihuLight | Journal of Parallel and Distributed Computing