Search results for "SIMD"
showing 10 items of 20 documents
MLP Neural Network Implementation on a SIMD Architecture
2002
An Automatic Road Sign Recognition System, A(RS)2, aims to detect and recognize one or more road signs in real-world color images. The authors have proposed an A(RS)2 that detects and extracts sign regions from real-world scenes on the basis of their color and shape features. Classification is then performed on the extracted candidate regions using Multi-Layer Perceptron neural networks. Although system performance is good in terms of both sign detection and classification rates, the entire process requires a large computational time, so real-time applications are not feasible. In this paper we present the implementation of the neural layer on the Georgia Institute of Technology …
eISP, une architecture de calcul programmable pour l'amélioration d'images sur téléphone portable.
2009
4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt), especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low-resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP, a new, fully programmable Embedded Image Signal Processor architecture, now validated in TSMC 65 nm technology to achieve a capacity of 16.8 GOPs at 233 MHz, for 1.5 mm² of silicon area and…
Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi
2016
Advanced SIMD features on GPUs and Xeon Phis promote efficient long pattern search. A tiled approach to accelerating the Wu-Manber algorithm on GPUs has been proposed. Both the GPU and Xeon Phi yield a speedup of two orders of magnitude over one CPU core. The GPU-based version with tiling runs up to 2.9× faster than the Xeon Phi version. Approximate pattern matching (APM) aims to find the occurrences of a pattern inside a subject text while allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typica…
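As an illustration of the bit-parallel idea mentioned in this abstract, the following is a minimal scalar sketch of Shift-And matching with up to k errors, in the style of Wu-Manber. It is not the paper's tiled GPU or Xeon Phi code; the function name approx_match_ends and its parameters are hypothetical, and Python's unbounded integers stand in for a machine word.

```python
def approx_match_ends(pattern, text, k):
    """Return the text indices where a match of pattern with <= k edits ends.

    Hypothetical helper for illustration only (scalar, no SIMD or GPU).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1      # keep only the m state bits
    accept = 1 << (m - 1)    # set when the whole pattern has matched

    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    def shift(x):
        # advance every active prefix by one position and start a new one
        return ((x << 1) | 1) & full

    # state[d]: bit i is set when pattern[:i+1] matches a suffix of the
    # text read so far with at most d edits
    state = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = state[0]
        state[0] = shift(state[0]) & b
        for d in range(1, k + 1):
            cur_old = state[d]
            state[d] = ((shift(cur_old) & b)      # character matches
                        | prev_old                # extra character in text
                        | shift(prev_old)         # substituted character
                        | shift(state[d - 1]))    # skipped pattern character
            prev_old = cur_old
        if state[k] & accept:
            ends.append(j)
    return ends
```

Each text character costs a handful of bitwise operations per error level, which is the word-level parallelism the abstract refers to.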
Real-time low level feature extraction for on-board robot vision systems
2006
Robot vision systems notoriously require large computing capabilities, rarely available on physical devices. Robots have limited embedded hardware, and almost all sensory computation is delegated to remote machines. Emerging gigascale integration technologies offer the opportunity to explore alternative computing architectures that can deliver a significant boost to on-board computing when implemented in embedded, reconfigurable devices. This paper explores the mapping of low-level feature extraction on one such architecture, the Georgia Tech SIMD Pixel Processor (SIMPil). The Fast Boundary Web Extraction (fBWE) algorithm is adapted and mapped on SIMPil as a fixed-point, data parallel imple…
eISP: a Programmable Processing Architecture for Smart Phone Image Enhancement
2009
4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt), especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low-resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP, a new, fully programmable Embedded Image Signal Processor architecture, now validated in TSMC 65 nm technology to achieve a capacity of 16.8 GOPs at 233 MHz, for 1.5 mm² of silicon area and…
Impulse noise removal on an embedded, low memory SIMD processor
2003
Vector median filters efficiently reduce noise while preserving image details. However, their high computational complexity for color images makes them impractical for real-time systems. We propose new computationally efficient filtering algorithms, called index mapping filters (IMF). These filtering algorithms are accelerated by implementing them on a massively data-parallel processor array. In addition to greater computational efficiency, these algorithms result in robust noise reduction of corrupted color images. Analyses of mean square error, signal-to-noise ratio, and visual comparison metrics indicate that IMF are competitive with the vector median filter (VMF) in their ability to cor…
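For readers unfamiliar with the vector median filter that this abstract uses as its baseline, here is a minimal sketch of the vector median of one filter window, assuming Euclidean distance between color vectors. The function name vector_median is hypothetical, and this is the standard VMF definition, not the index mapping filter the paper proposes.

```python
import math


def vector_median(window):
    """Return the vector in window with the smallest summed distance to all others.

    window: a non-empty sequence of equal-length color vectors, e.g. (R, G, B).
    Hypothetical helper for illustration only.
    """
    def total_distance(v):
        # sum of Euclidean distances from v to every vector in the window
        return sum(math.dist(v, w) for w in window)

    return min(window, key=total_distance)
```

Every candidate is compared against every other vector, so the cost grows quadratically with the window size, which is the computational burden the abstract points to.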
Portable Video Supercomputing
2004
As inexpensive imaging chips and wireless telecommunications are incorporated into an increasing array of portable products, the need for high-efficiency, high-throughput embedded processing will become an important challenge in computer architecture. Videocentric applications, such as wireless videoconferencing, real-time video enhancement and analysis, and new, immersive modes of distance education, will exceed the computational capabilities of current microprocessor and digital signal processor (DSP) architectures. A new class of embedded computers, portable video supercomputers, will combine supercomputer performance with the energy efficiency required for deployment in portable systems. …
SWAPHI-LS: Smith-Waterman Algorithm on Xeon Phi coprocessors for Long DNA Sequences
2014
As an optimal method for sequence alignment, the Smith-Waterman (SW) algorithm is widely used. Unfortunately, this algorithm is computationally demanding, especially for long sequences. This has motivated the investigation of its acceleration on a variety of high-performance computing platforms. However, most work in the literature is only suitable for short sequences. In this paper, we present SWAPHI-LS, the first parallel SW algorithm exploiting emerging Xeon Phi coprocessors to accelerate the alignment of long DNA sequences. In SWAPHI-LS, we have investigated three parallelization approaches (naive, tiled, and distributed) in order to deeply explore the inherent parallelism within Xeon P…
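The Smith-Waterman recurrence that this abstract builds on can be sketched as a plain scalar dynamic program. The version below is an assumption-laden illustration: the function name smith_waterman_score and the scoring values (match, mismatch, and a linear gap penalty) are hypothetical, and it does not reproduce the naive, tiled, or distributed schemes of SWAPHI-LS.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Hypothetical helper using a linear gap penalty; keeps two rows of the
    DP matrix so memory grows with len(b) only.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                # start a new local alignment
                prev[j - 1] + s,  # align a[i-1] with b[j-1]
                prev[j] + gap,    # gap in b
                cur[j - 1] + gap, # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

Cells on the same anti-diagonal depend only on earlier anti-diagonals, which is the kind of independence that vectorized and multithreaded implementations can exploit.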
Massively Parallel ANS Decoding on GPUs
2019
In recent years, graphics processors have enabled significant advances in the fields of big data and streamed deep learning. In order to keep control of rapidly growing amounts of data and to achieve sufficient throughput rates, compression features are a key part of many applications including popular deep learning pipelines. However, as most of the respective APIs rely on CPU-based preprocessing for decoding, data decompression frequently becomes a bottleneck in accelerated compute systems. This establishes the need for efficient GPU-based solutions for decompression. Asymmetric numeral systems (ANS) represent a modern approach to entropy coding, combining superior compression results wit…
Pairwise DNA Sequence Alignment Optimization
2015
This chapter presents a parallel implementation of the Smith-Waterman algorithm to accelerate the pairwise alignment of DNA sequences. This algorithm is especially computationally demanding for long DNA sequences. Parallelization approaches are examined in order to deeply explore the inherent parallelism within Intel Xeon Phi coprocessors. This chapter looks at exploiting instruction-level parallelism within 512-bit single instruction multiple data instructions (vectorization) as well as thread-level parallelism over the many cores (multithreading using OpenMP). Between coprocessors, device-level parallelism through the compute power of clusters including Intel Xeon Phi coprocessors using M…