Search results for "Parallel computing"
showing 10 items of 189 documents
An Embedded, FPGA-based Computer Graphics Coprocessor with Native Geometric Algebra Support
2009
The representation of geometric objects and their transformations are the two key aspects in computer graphics applications. Traditionally, computationally intensive matrix calculations are involved in modeling and rendering three-dimensional (3D) scenery. Geometric algebra (aka Clifford algebra) is attracting attention as a natural way to model geometric facts and as a powerful analytical tool for symbolic calculations. In this paper, the architecture of the Clifford coprocessor (CliffoSor) is introduced. CliffoSor is an embedded parallel coprocessing core that offers direct hardware support for Clifford algebra operators. A prototype implementation on a field-programmable gate array (FPGA) board is detailed…
A Dual-Core Coprocessor with Native 4D Clifford Algebra Support
2012
Geometric or Clifford Algebra (CA) is a powerful mathematical tool that is attracting growing attention in many research fields such as computer graphics, computer vision, robotics and medical imaging for its natural and intuitive way to represent geometric objects and their transformations. This paper introduces the architecture of CliffordCoreDuo, an embedded dual-core coprocessor that offers direct hardware support for four-dimensional (4D) Clifford algebra operations. A prototype implementation on an FPGA board is detailed. Experimental results show a 1.6× average speedup of CliffordCoreDuo in comparison with the baseline mono-core architecture. A potential cycle speedup of about 40× o…
Accelerating Clifford Algebra Operations using GPUs and an OpenCL Code Generator
2015
Clifford Algebra (CA) is a powerful mathematical language that allows for a simple and intuitive representation of geometric objects and their transformations. It has important applications in many research fields, such as computer graphics, robotics, and machine vision. Direct hardware support of Clifford data types and operators is needed to accelerate applications based on Clifford Algebra. This paper proposes a mixed software-hardware system that exploits the computational power of Graphics Processing Units (GPUs) to accelerate Clifford operations. A code generator, namely OpenCLifford, is presented that automatically generates Java and C libraries for the direct support of Clifford ele…
An Evolution of the Non-Parameter Harris Affine Corner Detector: A Distributed Approach
2009
A parallel version of a new automatic Harris-based corner detector is presented. A scheduler that dynamically and homogeneously distributes a high computational workload on heterogeneous parallel architectures such as Grid systems has been implemented to speed up the whole procedure. Experimental results show the robustness of the underlying scheduler, which can be easily exploited in various automatic image analysis systems.
Embedded Coprocessors for Native Execution of Geometric Algebra Operations
2016
Clifford algebra or geometric algebra (GA) is a simple and intuitive way to model geometric objects and their transformations. Because GA operates in high-dimensional vector spaces with significant computational costs, its practical use requires dedicated software and/or hardware architectures to directly support Clifford data types and operators. In this paper, a family of embedded coprocessors for the native execution of GA operations is presented. The paper shows the evolution of the coprocessor family, focusing on the latest two architectures, which offer direct hardware support for up to five-dimensional Clifford operations. The proposed coprocessors exploit hardware-oriented representations o…
Faster GPU-Accelerated Smith-Waterman Algorithm with Alignment Backtracking for Short DNA Sequences
2014
In this paper, we present a GPU-accelerated Smith-Waterman (SW) algorithm with Alignment Backtracking, called GSWAB, for short DNA sequences. This algorithm performs all-to-all pairwise alignments and retrieves optimal local alignments on CUDA-enabled GPUs. To facilitate fast alignment backtracking, we have investigated a tile-based SW implementation using the CUDA programming model. This tiled computing pattern enables us to more deeply explore the powerful compute capability of GPUs. We have evaluated the performance of GSWAB on a Kepler-based GeForce GTX Titan graphics card. The results show that GSWAB can achieve a performance of up to 56.8 GCUPS on large-scale datasets. Furthermore, ou…
SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors
2014
The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e. it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, mul…
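The quadratic time complexity mentioned in the abstract above comes from filling one dynamic-programming cell per pair of residues. The following is a minimal Python sketch of the Smith-Waterman local alignment score, not SWAPHI's implementation: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values are illustrative assumptions (protein searches typically use a substitution matrix and affine gaps).

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Assumed scoring (illustrative only): fixed match/mismatch scores and a
    linear gap penalty. Runs in O(len(a) * len(b)) time, O(len(b)) memory.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, which is the dependency pattern that SIMD and many-core implementations have to work around.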
GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences
2014
In this paper, we present GSWABE, a graphics processing unit (GPU)-accelerated pairwise sequence alignment algorithm for a collection of short DNA sequences. This algorithm supports all-to-all pairwise global, semi-global and local alignment, and retrieves optimal alignments on Compute Unified Device Architecture (CUDA)-enabled GPUs. All three alignment types are based on dynamic programming and share almost the same computational pattern. Thus, we have investigated a general tile-based approach to facilitate fast alignment by deeply exploring the powerful compute capability of CUDA-enabled GPUs. The performance of GSWABE has been evaluated on a Kepler-based Tesla K40 GPU using a varie…
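To illustrate why global, semi-global and local alignment can share one computational pattern, here is a small Python sketch in which the three modes differ only in boundary initialization, clamping at zero, and where the final score is read. It is not GSWABE's code: the function `align_score`, the linear gap penalty, the scoring values, and the particular semi-global convention (end gaps free in both sequences) are all assumptions made for illustration.

```python
def align_score(a, b, mode="local", match=2, mismatch=-1, gap=-2):
    """Score a pairwise alignment under one of three modes.

    mode is "global", "semi-global" or "local". The inner recurrence is
    identical; only the boundaries and the result cell(s) change.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "global":
        # Leading gaps are penalized only in global alignment.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + sub,  # match or mismatch
                        H[i - 1][j] + gap,      # gap in b
                        H[i][j - 1] + gap)      # gap in a
            if mode == "local":
                score = max(0, score)           # allow a fresh start
            H[i][j] = score
    if mode == "global":
        return H[n][m]                          # both sequences fully used
    if mode == "semi-global":
        # Trailing gaps are free: best cell in the last row or last column.
        return max(max(H[n]), max(row[m] for row in H))
    return max(max(row) for row in H)           # best cell anywhere


if __name__ == "__main__":
    for mode in ("global", "semi-global", "local"):
        print(mode, align_score("ACGT", "TTACGTTT", mode))
```

Because the inner loop is the same in every mode, a tiling or parallelization strategy designed for one mode carries over to the others with only boundary handling changed.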
Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures
2015
In this paper, we present new parallelization techniques for searching large-scale biological sequence databases with the Smith-Waterman algorithm on Xeon Phi-based neo-heterogeneous architectures. In order to make full use of the compute power of both the multi-core CPU and the many-core Xeon Phi hardware, we use a collaborative computing scheme as well as hybrid parallelism. At the CPU side, we employ SSE intrinsics and multi-threading to implement SIMD parallelism. At the Xeon Phi side, we use Knights Corner vector instructions to gain more data parallelism. We present two dynamic task distribution schemes (thread level and device level) in order to achieve better load balancing. Fur…
Splitting the data cache: a survey
2000
Recent cache-memory research has focused on approaches that split the first-level data cache into two independent subcaches. The authors introduce a methodology for helping cache designers devise splitting schemes and survey a representative set of the published cache schemes.