Search results for "SIMD"
showing 10 items of 20 documents
PNeuro: A scalable energy-efficient programmable hardware accelerator for neural networks
2018
Proceedings of a meeting held 19-23 March 2018, Dresden, Germany; International audience; Artificial intelligence and especially Machine Learning recently gained a lot of interest from the industry. Indeed, new generation of neural networks built with a large number of successive computing layers enables a large amount of new applications and services implemented from smart sensors to data centers. These Deep Neural Networks (DNN) can interpret signals to recognize objects or situations to drive decision processes. However, their integration into embedded systems remains challenging due to their high computing needs. This paper presents PNeuro, a scalable energy-efficient hardware accelerat…
Impulse noise removal on an embedded, low memory SIMD processor
2003
Vector median filters efficiently reduce noise while preserving image details. However, their high computational complexity for color images makes them impractical for real-time systems. We propose new computationally efficient filtering algorithms, called index mapping filters (IMF). These filtering algorithms are accelerated by implementing them on a massively data parallel processor array. In addition to greater computational efficiency, these algorithms result in robust noise reduction of corrupted color images. Analyses of mean square error, signal-to-noise-ratio, and visual comparison metrics indicate that IMF are competitive with the vector median filter (VMF) in their ability to cor…
Low Level Languages for the PAPIA Machine
1986
The paper presents the low-level languages implemented up to date to program the PAPIA machine. The parallel assembly-level P-MAGRO package, the microcode level instruction set and a machine simulating environment are described.
The impact of grain size on the efficiency of embedded SIMD image processing architectures
2004
Pixel-per-processing element (PPE) ratio-the amount of image data directly mapped to each processing element-has a significant impact on the area and energy efficiency of embedded SIMD architectures for image processing applications. This paper quantitatively evaluates the impact of PPE ratio on system performance and efficiency for focal-plane SIMD image processing architectures by comparing throughput, area efficiency, and energy efficiency for a range of common application kernels using architectural and workload simulation. While the impact of grain size is affected by the mix of executed instructions within an application program, the most efficient PPE ratio often does not occur at PE…
Real-time low level feature extraction for on-board robot vision systems
2006
Robot vision systems notoriously require large computing capabilities, rarely available on physical devices. Robots have limited embedded hardware, and almost all sensory computation is delegated to remote machines. Emerging gigascale integration technologies offer the opportunity to explore alternative computing architectures that can deliver a significant boost to on-board computing when implemented in embedded, reconfigurable devices. This paper explores the mapping of low level feature extraction on one such architecture, the Georgia Tech SIMD Pixel Processor (SIMPil). The Fast Boundary Web Extraction (fBWE) algorithm is adapted and mapped on SIMPil as a fixed-point, data parallel imple…
SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors
2014
The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e. it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, mul…
Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures
2015
In this paper we present new parallelization techniques for searching large-scale biological sequence databases with the Smith-Waterman algorithm on Xeon Phi-based neoheterogenous architectures. In order to make full use of the compute power of both the multi-core CPU and the many-core Xeon Phi hardware, we use a collaborative computing scheme as well as hybrid parallelism. At the CPU side, we employ SSE intrinsics and multi-threading to implement SIMD parallelism. At the Xeon Phi side, we use Knights Corner vector instructions to gain more data parallelism. We have presented two dynamic task distribution schemes (thread level and device level) in order to achieve better load balancing. Fur…
eISP, une architecture de calcul programmable pour l'amélioration d'images sur téléphone portable.
2009
4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt) especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP {--} a new, fully programmable Embedded Image Signal Processor architecture, now validated in {TSMC~65nm} technology to achieve a capacity of {16.8~GOPs} at {233~MHz}, for {1.5~mm$^2$} of silicon area and…
eISP: a Programmable Processing Architecture for Smart Phone Image Enhancement
2009
4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt) especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP {--} a new, fully programmable Embedded Image Signal Processor architecture, now validated in {TSMC 65nm} technology to achieve a capacity of {16.8 GOPs} at {233 MHz}, for {1.5 mm$^2$} of silicon area and…
Perfect Hashing Structures for Parallel Similarity Searches
2015
International audience; Seed-based heuristics have proved to be efficient for studying similarity between genetic databases with billions of base pairs. This paper focuses on algorithms and data structures for the filtering phase in seed-based heuristics, with an emphasis on efficient parallel GPU/manycores implementa- tion. We propose a 2-stage index structure which is based on neighborhood indexing and perfect hashing techniques. This structure performs a filtering phase over the neighborhood regions around the seeds in constant time and avoid as much as possible random memory accesses and branch divergences. Moreover, it fits particularly well on parallel SIMD processors, because it requ…