Search results for "Coprocessor"

showing 10 items of 26 documents

Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi

2016

Advanced SIMD features on GPUs and Xeon Phis promote efficient long pattern search. A tiled approach to accelerating the Wu-Manber algorithm on GPUs has been proposed. Both the GPU and Xeon Phi yield two orders-of-magnitude speedup over one CPU core. The GPU-based version with tiling runs up to 2.9× faster than the Xeon Phi version. Approximate pattern matching (APM) aims to find the occurrences of a pattern inside a subject text, allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typica…

020203 distributed computing; Speedup; Coprocessor; Xeon; Computer Networks and Communications; Computer science; 02 engineering and technology; Parallel computing; Supercomputer; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; CUDA; Artificial Intelligence; Hardware and Architecture; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; SIMD; Bitwise operation; Software; Word (computer architecture); Xeon Phi; Parallel Computing

researchProduct

Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures

2016

This is a post-peer-review, pre-copyedit version of an article published in IEEE Transactions on Parallel and Distributed Systems. The final authenticated version is available online at: http://dx.doi.org/10.1109/TPDS.2015.2460247. [Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics, as they can help to explain genetic influences on diseases. As these studies are time-consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime. Nevertheless, all these approaches are not able t…

0301 basic medicine; Coprocessor; Computer science; 0206 medical engineering; Acceleration; Data models; Symmetric multiprocessor system; Computational modeling; 02 engineering and technology; Parallel computing; Supercomputer; 03 medical and health sciences; Task (computing); 030104 developmental biology; Coprocessors; Computational Theory and Mathematics; Hardware and Architecture; Signal Processing; Genetics; Pairwise comparison; Computer architecture; Graphics processing units; 020602 bioinformatics; Xeon Phi

researchProduct

mD3DOCKxb: An Ultra-Scalable CPU-MIC Coordinated Virtual Screening Framework

2017

Molecular docking is an important method in computational drug discovery. In large-scale virtual screening, millions of small drug-like molecules (chemical compounds) are compared against a designated target protein (receptor). Depending on the docking algorithm used for screening, this can take several weeks on conventional HPC systems. However, for certain applications, including large-scale screening tasks for newly emerging infectious diseases, such high runtimes can be prohibitive. In this paper, we investigate how the massively parallel neo-heterogeneous architecture of the Tianhe-2 supercomputer consisting of thousands of nodes comprising CPUs and MIC coprocessors that can effic…

0301 basic medicine; Virtual screening; Multi-core processor; Coprocessor; Computer science; business.industry; Parallel computing; Supercomputer; 03 medical and health sciences; 030104 developmental biology; Embedded system; Scalability; Tianhe-2; Algorithm design; business; Massively parallel; 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

researchProduct

An FPGA-Based Adaptive Fuzzy Coprocessor

2005

The architecture of a general-purpose fuzzy logic coprocessor and its implementation on an FPGA-based System on Chip is described. Thanks to its ability to support fast dynamic reconfiguration of all its parameters, it is suitable for implementing adaptive fuzzy logic algorithms, or for executing different fuzzy algorithms in a time-sharing fashion. The high throughput obtained using a pipelined structure and the efficient data organization allow a significant increase in the computational capabilities strongly desired in applications with hard real-time constraints.

Adaptive neuro fuzzy inference system; fuzzy inference; Coprocessor; Adaptive algorithm; business.industry; Computer science; Membership functions; Control reconfiguration; Settore ING-INF/01 - Elettronica; Fuzzy logic; Fuzzy electronics; Computer Science::Hardware Architecture; Embedded system; business; Throughput (business); Membership function

researchProduct

Cell-List based Molecular Dynamics on Many-Core Processors: A Case Study on Sunway TaihuLight Supercomputer

2020

Molecular dynamics (MD) simulations are playing an increasingly important role in several research areas. The most frequently used potentials in MD simulations are pair-wise potentials. Due to the memory wall, computing pair-wise potentials on many-core processors is usually memory-bound. In this paper, we take the SW26010 processor as an exemplary platform to explore the possibility of breaking the memory bottleneck by improving data reuse via cell-list-based methods. We use cell-lists instead of neighbor-lists in the potential computation, and apply a number of novel optimization methods. These methods include: an adaptive replica arrangement strategy, a parameter profile data structur…

Coprocessor; Cell lists; 010304 chemical physics; Computer science; Replica; 020207 software engineering; 02 engineering and technology; Parallel computing; Supercomputer; Data structure; 01 natural sciences; Bottleneck; Molecular dynamics; 0103 physical sciences; Scalability; 0202 electrical engineering electronic engineering information engineering; Sunway TaihuLight; SC20: International Conference for High Performance Computing, Networking, Storage and Analysis

researchProduct
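
The cell-list idea named in the abstract above can be illustrated with a small sketch. It is not the SW26010 implementation from the paper: the function names `build_cell_list` and `pairs_within_cutoff`, the non-periodic box, and the choice of a cell side equal to the cutoff are all assumptions made for illustration. Particles are binned into cubic cells, so any two particles within the cutoff must sit in the same cell or in adjacent cells, and only those cells need to be scanned.

```python
from collections import defaultdict
from itertools import product


def build_cell_list(points, cutoff):
    """Bin particle indices into cubic cells of side `cutoff` (assumed layout)."""
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells[key].append(idx)
    return cells


def pairs_within_cutoff(points, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than `cutoff`."""
    cells = build_cell_list(points, cutoff)
    cutoff_sq = cutoff * cutoff
    found = set()
    for (cx, cy, cz), members in cells.items():
        # Scan the cell itself and its 26 neighbours.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                xi, yi, zi = points[i]
                for j in neighbours:
                    if i >= j:
                        continue  # count each unordered pair once
                    xj, yj, zj = points[j]
                    dist_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                    if dist_sq <= cutoff_sq:
                        found.add((i, j))
    return found


if __name__ == "__main__":
    sample = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.1), (1.5, 0.1, 0.1),
              (3.0, 3.0, 3.0), (-0.2, 0.1, 0.1)]
    print(sorted(pairs_within_cutoff(sample, 1.0)))
```

Because the particles of one cell are stored together, the same data can be reused for every neighbouring cell, which is the kind of data reuse the abstract refers to.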

Pairwise DNA Sequence Alignment Optimization

2015

This chapter presents a parallel implementation of the Smith-Waterman algorithm to accelerate the pairwise alignment of DNA sequences. This algorithm is especially computationally demanding for long DNA sequences. Parallelization approaches are examined in order to deeply explore the inherent parallelism within Intel Xeon Phi coprocessors. This chapter looks at exploiting instruction-level parallelism within 512-bit single instruction multiple data instructions (vectorization) as well as thread-level parallelism over the many cores (multithreading using OpenMP). Between coprocessors, device-level parallelism through the compute power of clusters including Intel Xeon Phi coprocessors using M…

Coprocessor; Computer science; Multithreading; Vectorization (mathematics); Parallelism (grammar); SIMD; Parallel computing; Hardware_ARITHMETICANDLOGICSTRUCTURES; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Intrinsics; Instruction-level parallelism; Xeon Phi

researchProduct

Smart camera based on an Embedded HW/SW Co-Processor

2008

This paper describes an image acquisition and processing system based on a new coprocessor architecture designed for CMOS sensor imaging. The system exploits the full potential of CMOS selective-access imaging technology because the coprocessor unit is integrated into the image acquisition loop. The acquisition and coprocessing architecture is compatible with the majority of CMOS sensors. It enables the dynamic selection of a wide variety of acquisition modes as well as the reconfiguration and implementation of high-performance image preprocessing algorithms (calibration, filtering, denoising, binarization, pattern recognition). Furthermore, the processing and data transfer, from t…

Coprocessor; General Computer Science; Computer science; lcsh:TK7800-8360; 02 engineering and technology; 0202 electrical engineering electronic engineering information engineering; Smart camera; [ INFO.INFO-ES ] Computer Science [cs]/Embedded Systems; Field-programmable gate array; ComputingMilieux_MISCELLANEOUS; FPGA; CMOS sensor; Smart Camera; business.industry; 020208 electrical & electronic engineering; lcsh:Electronics; ACM; Control reconfiguration; 020206 networking & telecommunications; Modular design; co-processor; CMOS; Control and Systems Engineering; Embedded system; Pattern recognition (psychology); embedded processing; [INFO.INFO-ES]Computer Science [cs]/Embedded Systems; business; postal sorting; Computer hardware; Computer Science(all)

researchProduct

FPGA-based concurrent watchdog for real-time control systems

2003

A straightforward and efficient implementation of a custom concurrent watchdog processor for real-time control systems is presented. Emphasis is given to the techniques used for on-line checking of the main processor activity without adding overhead, and to the advantages of a field-programmable gate array implementation.

Coprocessor; business.industry; Computer science; FPGA Fault tolerant systems; Settore ING-INF/01 - Elettronica; Programmable logic array; Concurrency control; Real-time Control System; Embedded system; Control system; Overhead (computing); Digital control; Electrical and Electronic Engineering; business; Field-programmable gate array; Electronics Letters

researchProduct

Bit-Parallel Approximate Pattern Matching on the Xeon Phi Coprocessor

2014

Bit-parallel pattern matching encodes calculated values in bit arrays. This approach gains its efficiency by performing multiple updates within a machine word. An important parameter is therefore the machine word size (e.g. 32 or 64 bits). With the increasing length of vector registers, the efficient mapping of bit-parallel pattern matching algorithms onto modern high performance computing architectures is becoming increasingly important. In this paper, we investigate an efficient implementation of the Wu-Manber approximate pattern matching algorithm on the Intel Xeon Phi coprocessor. This architecture features a 512-bit long vector processing unit (VPU) as well as a large number of process…

Instruction set; Coprocessor; Speedup; Computer science; Parallel computing; Pattern matching; Intrinsics; Word (computer architecture); Xeon Phi; Vector processor; 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing

researchProduct
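
To make the bit-parallel idea in the abstract above concrete, here is a minimal sketch of Wu-Manber-style approximate matching in its Shift-And form. It is an illustration only, not the Xeon Phi implementation the paper describes: the function name `wu_manber_search` is hypothetical, and Python's arbitrary-precision integers stand in for a machine word or a 512-bit vector register. Bit `j` of `state[d]` records that the first `j + 1` pattern characters match some suffix of the text read so far with at most `d` edits, so one text character updates all pattern positions with a handful of bitwise operations.

```python
def wu_manber_search(pattern, text, k):
    """Return end indices in `text` where `pattern` matches with <= k edits."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1          # keeps states within m bits
    accept = 1 << (m - 1)        # bit that signals a full-pattern match

    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # With d allowed edits, a prefix of length <= d matches the empty text.
    state = [((1 << d) - 1) & full for d in range(k + 1)]

    hits = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        old_prev = state[0]
        new_prev = ((old_prev << 1) | 1) & b       # exact-match row
        state[0] = new_prev
        for d in range(1, k + 1):
            old_cur = state[d]
            new_cur = (
                (((old_cur << 1) | 1) & b)          # character matches
                | old_prev                          # extra character in text
                | (((old_prev | new_prev) << 1) | 1)  # substitution or skipped pattern char
            ) & full
            state[d] = new_cur
            old_prev, new_prev = old_cur, new_cur
        if state[k] & accept:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(wu_manber_search("abc", "xxabcxx", 0))
    print(wu_manber_search("abc", "axc", 1))
```

The word size matters because each state must hold one bit per pattern position; wider registers allow longer patterns, or several states, to be updated at once.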

SWAPHI-LS: Smith-Waterman Algorithm on Xeon Phi coprocessors for Long DNA Sequences

2014

As an optimal method for sequence alignment, the Smith-Waterman (SW) algorithm is widely used. Unfortunately, this algorithm is computationally demanding, especially for long sequences. This has motivated the investigation of its acceleration on a variety of high-performance computing platforms. However, most work in the literature is only suitable for short sequences. In this paper, we present SWAPHI-LS, the first parallel SW algorithm exploiting emerging Xeon Phi coprocessors to accelerate the alignment of long DNA sequences. In SWAPHI-LS, we have investigated three parallelization approaches (naive, tiled, and distributed) in order to deeply explore the inherent parallelism within Xeon P…

Instruction set; Smith–Waterman algorithm; Coprocessor; Xeon; Computer science; Data parallelism; Task parallelism; Parallel computing; SIMD; Intrinsics; Instruction-level parallelism; Xeon Phi; 2014 IEEE International Conference on Cluster Computing (CLUSTER)

researchProduct
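
For readers unfamiliar with the Smith-Waterman recurrence mentioned in the abstract above, the following is a minimal sequential sketch that computes only the optimal local alignment score. It is not SWAPHI-LS: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration. Only two rows of the score matrix are kept, which helps when the sequences are long.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b under a linear gap penalty."""
    prev = [0] * (len(b) + 1)    # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                          # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                cur[j - 1] + gap,            # gap in a
            )
            if cur[j] > best:
                best = cur[j]
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACT"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is what vectorized and multithreaded implementations typically exploit.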