Search results for "SIMD"

showing 10 items of 20 documents

PNeuro: A scalable energy-efficient programmable hardware accelerator for neural networks

2018

Proceedings of a meeting held 19-23 March 2018, Dresden, Germany; International audience; Artificial intelligence, and especially machine learning, has recently gained a lot of interest from industry. Indeed, a new generation of neural networks built with a large number of successive computing layers enables a large number of new applications and services, implemented from smart sensors to data centers. These Deep Neural Networks (DNN) can interpret signals to recognize objects or situations to drive decision processes. However, their integration into embedded systems remains challenging due to their high computing needs. This paper presents PNeuro, a scalable energy-efficient hardware accelerat…

Neural network hardware; Computer science; Pooling; 02 engineering and technology; Low power; 0202 electrical engineering electronic engineering information engineering; SIMD; Field-programmable gate array; FPGA; Computer architectures; Routing; Artificial neural network; ASIC; [SCCO.NEUR] Cognitive science/Neuroscience; 020208 electrical & electronic engineering; Field programmable gate arrays; Convolution; 020202 computer hardware & architecture; Generators; Computer architecture; Scalability; Hardware acceleration; Routing (electronic design automation); Neural networks; Efficient energy use

researchProduct

Impulse noise removal on an embedded, low memory SIMD processor

2003

Vector median filters efficiently reduce noise while preserving image details. However, their high computational complexity for color images makes them impractical for real-time systems. We propose new computationally efficient filtering algorithms, called index mapping filters (IMF). These filtering algorithms are accelerated by implementing them on a massively data parallel processor array. In addition to greater computational efficiency, these algorithms result in robust noise reduction of corrupted color images. Analyses of mean square error, signal-to-noise-ratio, and visual comparison metrics indicate that IMF are competitive with the vector median filter (VMF) in their ability to cor…

Noise; Index mapping; Computer science; Color image; Noise reduction; Real-time computing; Median filter; Filter (signal processing); SIMD; Impulse noise; Algorithm; 2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628)

researchProduct
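
The abstract above cites the high computational cost of the vector median filter (VMF) as its motivation. As a rough illustration only, here is a minimal sketch of the VMF as it is commonly defined: within a window, output the pixel whose summed distance to all other pixels is smallest. This is not the paper's index mapping filter (IMF); the function name, the Euclidean distance, and the sample window are assumptions chosen for the example.

```python
import math


def vector_median(window):
    """Return the color vector in `window` with the smallest summed
    Euclidean distance to every vector in the window (textbook VMF;
    hypothetical helper, not the paper's IMF)."""
    def total_distance(p):
        # One pass over the window per candidate: O(n^2) distance
        # evaluations per output pixel, which is the cost the abstract cites.
        return sum(math.dist(p, q) for q in window)

    return min(window, key=total_distance)


# Illustrative 5-pixel RGB window containing one impulse-noise outlier.
window = [(10, 10, 10), (12, 11, 9), (11, 10, 10), (255, 0, 0), (9, 12, 11)]
filtered = vector_median(window)
```

Because the output is always one of the input vectors, the outlier `(255, 0, 0)` can only be selected if it is the most central sample, which is why the filter suppresses impulse noise without blending colors.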

Low Level Languages for the PAPIA Machine

1986

The paper presents the low-level languages implemented to date to program the PAPIA machine. The parallel assembly-level P-MAGRO package, the microcode-level instruction set, and a machine-simulating environment are described.

PAPIA Language Architecture SIMD Processor Parallel-C; Scalar processor; Computer science; Virtual machine; Programming language; Simd processor; Parallel computing; Architecture; Pyramid algorithm; computer.software_genre; Low-level programming language; computer

researchProduct

The impact of grain size on the efficiency of embedded SIMD image processing architectures

2004

Pixel-per-processing element (PPE) ratio, the amount of image data directly mapped to each processing element, has a significant impact on the area and energy efficiency of embedded SIMD architectures for image processing applications. This paper quantitatively evaluates the impact of PPE ratio on system performance and efficiency for focal-plane SIMD image processing architectures by comparing throughput, area efficiency, and energy efficiency for a range of common application kernels using architectural and workload simulation. While the impact of grain size is affected by the mix of executed instructions within an application program, the most efficient PPE ratio often does not occur at PE…

Pixel; Computer Networks and Communications; Computer science; Processor grain size; Image processing; Parallel computing; Energy technology; energy and area efficiency; Grain size; SIMD; Theoretical Computer Science; image processing; Parallel processing (DSP implementation); technology modeling; Artificial Intelligence; Hardware and Architecture; Retargeting; Throughput (business); Software; Efficient energy use

researchProduct

Real-time low level feature extraction for on-board robot vision systems

2006

Robot vision systems notoriously require large computing capabilities, rarely available on physical devices. Robots have limited embedded hardware, and almost all sensory computation is delegated to remote machines. Emerging gigascale integration technologies offer the opportunity to explore alternative computing architectures that can deliver a significant boost to on-board computing when implemented in embedded, reconfigurable devices. This paper explores the mapping of low level feature extraction on one such architecture, the Georgia Tech SIMD Pixel Processor (SIMPil). The Fast Boundary Web Extraction (fBWE) algorithm is adapted and mapped on SIMPil as a fixed-point, data parallel imple…

Pixel; Computer science; business.industry; Computation; vision systems real-time; Feature extraction; Null (SQL); Computer architecture; Embedded system; Robot; SIMD; Architecture; Unconventional computing; business

researchProduct

SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

2014

The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e. it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, mul…

Smith–Waterman algorithm; FOS: Computer and information sciences; Multi-core processor; Coprocessor; Speedup; Sequence database; Computer science; Parallel computing; Intrinsics; Computer Science - Distributed Parallel and Cluster Computing; Scalability; SIMD; Distributed Parallel and Cluster Computing (cs.DC); Xeon Phi

researchProduct
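
The SWAPHI abstract above refers to the quadratic time complexity of the Smith-Waterman algorithm. The following is a minimal scalar sketch of the local-alignment score recurrence, shown only to make that cost concrete. It is not SWAPHI's vectorized implementation: the function name, the linear gap penalty, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score of strings `a` and `b`.

    Simplified sketch: linear gap penalty and a fixed match/mismatch
    score (assumed values, not taken from the paper).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Clamp at 0 so an alignment may start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Every cell of the `len(a) × len(b)` matrix is visited once, which gives the quadratic running time; keeping only two rows limits memory to O(len(b)). SIMD implementations typically process many cells, or many database sequences, per instruction rather than one cell at a time as here.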

Accelerating large-scale biological database search on Xeon Phi-based neo-heterogeneous architectures

2015

In this paper we present new parallelization techniques for searching large-scale biological sequence databases with the Smith-Waterman algorithm on Xeon Phi-based neo-heterogeneous architectures. In order to make full use of the compute power of both the multi-core CPU and the many-core Xeon Phi hardware, we use a collaborative computing scheme as well as hybrid parallelism. On the CPU side, we employ SSE intrinsics and multi-threading to implement SIMD parallelism. On the Xeon Phi side, we use Knights Corner vector instructions to gain more data parallelism. We present two dynamic task distribution schemes (thread level and device level) in order to achieve better load balancing. Fur…

Smith–Waterman algorithm; Xeon; Computer science; Data parallelism; Hyper-threading; SIMD; Parallel computing; Central processing unit; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Intrinsics; Xeon Phi; 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

researchProduct

eISP, une architecture de calcul programmable pour l'amélioration d'images sur téléphone portable.

2009

4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt), especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP, a new, fully programmable Embedded Image Signal Processor architecture, now validated in TSMC 65nm technology to achieve a capacity of 16.8 GOPs at 233 MHz, for 1.5 mm² of silicon area and…

[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; low power; CMOS; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; eISP; SIMD; video pipe; image processing; [INFO.INFO-MC] Computer Science [cs]/Mobile Computing; Multi-SIMD

researchProduct

eISP: a Programmable Processing Architecture for Smart Phone Image Enhancement

2009

4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt), especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP, a new, fully programmable Embedded Image Signal Processor architecture, now validated in TSMC 65nm technology to achieve a capacity of 16.8 GOPs at 233 MHz, for 1.5 mm² of silicon area and…

[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; low power; CMOS; demosaïcking; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; eISP; mm²; SIMD; image processing; video pipe; small silicon; Multi-SIMD; computing tile; milliwatt; sensor; demosaicing; TSMC 65nm

researchProduct

Perfect Hashing Structures for Parallel Similarity Searches

2015

International audience; Seed-based heuristics have proved to be efficient for studying similarity between genetic databases with billions of base pairs. This paper focuses on algorithms and data structures for the filtering phase in seed-based heuristics, with an emphasis on efficient parallel GPU/manycore implementation. We propose a 2-stage index structure which is based on neighborhood indexing and perfect hashing techniques. This structure performs a filtering phase over the neighborhood regions around the seeds in constant time and avoids random memory accesses and branch divergences as much as possible. Moreover, it fits particularly well on parallel SIMD processors, because it requ…

researchProduct