Search results for "SIMD"

showing 10 items of 20 documents

MLP Neural Network Implementation on a SIMD Architecture

2002

An Automatic Road Sign Recognition System, A(RS)2, aims to detect and recognize one or more road signs in real-world color images. The authors have proposed an A(RS)2 able to detect and extract sign regions from real-world scenes on the basis of their color and shape features. Classification is then performed on the extracted candidate regions using Multi-Layer Perceptron neural networks. Although system performance is good in terms of both sign detection and classification rates, the entire process requires a large computational time, which rules out real-time applications. In this paper we present the implementation of the neural layer on the Georgia Institute of Technology …

Digital image; Artificial neural network; Pixel; Color image; Computer science; business.industry; Pattern recognition; SIMD; Artificial intelligence; Perceptron; business; Sign (mathematics)
researchProduct

eISP, une architecture de calcul programmable pour l'amélioration d'images sur téléphone portable.

2009

4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt), especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low-resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP, a new, fully programmable Embedded Image Signal Processor architecture, now validated in TSMC 65 nm technology to achieve a capacity of 16.8 GOPs at 233 MHz, for 1.5 mm² of silicon area and…

[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; low power; CMOS; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; eISP; SIMD; video pipe; image processing; [INFO.INFO-MC] Computer Science [cs]/Mobile Computing; Multi-SIMD
researchProduct

Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi

2016

Advanced SIMD features on GPUs and Xeon Phis promote efficient long pattern search. A tiled approach to accelerating the Wu-Manber algorithm on GPUs has been proposed. Both the GPU and Xeon Phi yield speedups of two orders of magnitude over one CPU core. The GPU-based version with tiling runs up to 2.9× faster than the Xeon Phi version. Approximate pattern matching (APM) aims to find the occurrences of a pattern inside a subject text while allowing a limited number of errors. It has been widely used in many application areas such as bioinformatics and information retrieval. Bit-parallel APM takes advantage of the intrinsic parallelism of bitwise operations inside a machine word. This approach typica…

020203 distributed computing; Speedup; Coprocessor; Xeon; Computer Networks and Communications; Computer science; 02 engineering and technology; Parallel computing; Supercomputer; Computer Graphics and Computer-Aided Design; Theoretical Computer Science; CUDA; Artificial Intelligence; Hardware and Architecture; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; SIMD; Bitwise operation; Software; Word (computer architecture); Xeon Phi; Parallel Computing
researchProduct
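
The bit-parallel idea mentioned in the abstract above can be illustrated with a minimal sketch of Wu-Manber-style approximate matching in the Shift-And formulation. This is a plain sequential version for illustration only, not the tiled GPU or Xeon Phi implementation from the paper; the function name `approx_find` and its return convention (end positions) are assumptions made here. Python integers stand in for the machine word, so each bitwise operation updates all pattern positions at once.

```python
def approx_find(text, pattern, k):
    """Return the 0-based end positions in `text` where `pattern`
    matches with at most `k` edits (insert, delete, substitute).

    Hypothetical helper for illustration; bit i of R[d] is set when
    pattern[:i+1] matches a suffix of the text read so far with <= d errors.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # One bitmask per character: bit i is set if pattern[i] == character.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1        # keep state vectors m bits wide
    accept = 1 << (m - 1)      # bit that signals a whole-pattern match

    # With d allowed deletions, the first d pattern prefixes match the empty text.
    R = [((1 << d) - 1) & full for d in range(k + 1)]

    hits = []
    for j, ch in enumerate(text):
        mask = masks.get(ch, 0)
        prev_old = R[0]                        # R[d-1] before this character
        R[0] = ((R[0] << 1) | 1) & mask        # exact-match row
        prev_new = R[0]                        # R[d-1] after this character
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & mask)  # character matches
                    | prev_old                 # extra character in the text
                    | (prev_old << 1)          # substituted character
                    | (prev_new << 1)          # character missing from the text
                    | 1) & full
            prev_old, prev_new = old, R[d]
        if R[k] & accept:
            hits.append(j)
    return hits
```

For example, `approx_find("cdxe", "cde", 1)` reports end positions 1, 2 and 3, corresponding to the substrings "cd", "cdx" and "cdxe", each within one edit of the pattern.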

Real-time low level feature extraction for on-board robot vision systems

2006

Robot vision systems notoriously require large computing capabilities, rarely available on physical devices. Robots have limited embedded hardware, and almost all sensory computation is delegated to remote machines. Emerging gigascale integration technologies offer the opportunity to explore alternative computing architectures that can deliver a significant boost to on-board computing when implemented in embedded, reconfigurable devices. This paper explores the mapping of low level feature extraction on one such architecture, the Georgia Tech SIMD Pixel Processor (SIMPil). The Fast Boundary Web Extraction (fBWE) algorithm is adapted and mapped on SIMPil as a fixed-point, data parallel imple…

Pixel; Computer science; business.industry; Computation; vision systems real-time; Feature extraction; Null (SQL); Computer architecture; Embedded system; Robot; SIMD; Architecture; Unconventional computing; business
researchProduct

eISP: a Programmable Processing Architecture for Smart Phone Image Enhancement

2009

4 pages; Today's smart phones, with their embedded high-resolution video sensors, require computing capacities that are too high to easily meet stringent silicon area and power consumption requirements (some one and a half square millimeters and half a watt), especially when programmable components are used. To develop such capacities, integrators still rely on dedicated low-resolution video processing components, whose drawback is low flexibility. With this in mind, our paper presents eISP, a new, fully programmable Embedded Image Signal Processor architecture, now validated in TSMC 65 nm technology to achieve a capacity of 16.8 GOPs at 233 MHz, for 1.5 mm² of silicon area and…

[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; low power; CMOS; demosaïcking; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; eISP; mm²; SIMD; image processing; video pipe; small silicon; Multi-SIMD; computing tile; milliwatt; sensor; demosaicing; TSMC 65nm
researchProduct

Impulse noise removal on an embedded, low memory SIMD processor

2003

Vector median filters efficiently reduce noise while preserving image details. However, their high computational complexity for color images makes them impractical for real-time systems. We propose new computationally efficient filtering algorithms, called index mapping filters (IMF). These filtering algorithms are accelerated by implementing them on a massively data-parallel processor array. In addition to greater computational efficiency, these algorithms result in robust noise reduction of corrupted color images. Analyses of mean square error, signal-to-noise ratio, and visual comparison metrics indicate that IMF are competitive with the vector median filter (VMF) in their ability to cor…

Noise; Index mapping; Computer science; Color image; Noise reduction; Real-time computing; Median filter; Filter (signal processing); SIMD; Impulse noise; Algorithm; 2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628)
researchProduct
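
To make the cost of the baseline concrete, here is a minimal sketch of the standard vector median filter (VMF) that the abstract above compares against, applied to a single filter window. It does not show the index mapping filters proposed in the paper, whose details are not given in this listing. The function name `vector_median` and the choice of Euclidean distance are assumptions made for illustration.

```python
import math


def vector_median(window):
    """Return the color vector in `window` whose summed distance to all
    other vectors is smallest (the vector median of the window).

    `window` is a sequence of equal-length tuples, e.g. RGB triples.
    Hypothetical helper; Euclidean distance is an assumed choice.
    """
    def total_distance(v):
        # Sum of distances from v to every vector in the window.
        return sum(math.dist(v, u) for u in window)

    # Every candidate needs N distance evaluations, so one output
    # pixel costs on the order of N^2 vector distances.
    return min(window, key=total_distance)
```

With a window such as `[(10, 10, 10), (12, 11, 9), (255, 0, 0), (11, 10, 10), (9, 12, 11)]`, the impulse `(255, 0, 0)` is rejected and a vector from the cluster of similar colors is returned. The quadratic number of distance evaluations per pixel is what makes this filter expensive on color images.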

Portable Video Supercomputing

2004

As inexpensive imaging chips and wireless telecommunications are incorporated into an increasing array of portable products, the need for high-efficiency, high-throughput embedded processing will become an important challenge in computer architecture. Videocentric applications, such as wireless videoconferencing, real-time video enhancement and analysis, and new, immersive modes of distance education, will exceed the computational capabilities of current microprocessor and digital signal processor (DSP) architectures. A new class of embedded computers, portable video supercomputers, will combine supercomputer performance with the energy efficiency required for deployment in portable systems. …

Digital signal processor; Computer science; Data parallelism; Video processing; Supercomputer; Theoretical Computer Science; Microarchitecture; MPEG encoding; law.invention; Microprocessor; Computational Theory and Mathematics; Computer architecture; Hardware and Architecture; law; SIMD; Software
researchProduct

SWAPHI-LS: Smith-Waterman Algorithm on Xeon Phi coprocessors for Long DNA Sequences

2014

As an optimal method for sequence alignment, the Smith-Waterman (SW) algorithm is widely used. Unfortunately, this algorithm is computationally demanding, especially for long sequences. This has motivated the investigation of its acceleration on a variety of high-performance computing platforms. However, most work in the literature is only suitable for short sequences. In this paper, we present SWAPHI-LS, the first parallel SW algorithm exploiting emerging Xeon Phi coprocessors to accelerate the alignment of long DNA sequences. In SWAPHI-LS, we have investigated three parallelization approaches (naive, tiled, and distributed) in order to deeply explore the inherent parallelism within Xeon P…

Instruction set; Smith–Waterman algorithm; Coprocessor; Xeon; Computer science; Data parallelism; Task parallelism; Parallel computing; SIMD; Intrinsics; Instruction-level parallelism; Xeon Phi; 2014 IEEE International Conference on Cluster Computing (CLUSTER)
researchProduct
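
For orientation, the Smith-Waterman recurrence that the abstract above sets out to parallelize can be sketched sequentially as follows. This is a simplified illustration with a linear gap penalty and made-up scoring values, not the SWAPHI-LS implementation; the function name `smith_waterman_score` and the default parameters are assumptions made here. Only two rows of the score matrix are kept, which is one reason the method remains feasible in memory for long sequences even though the running time is quadratic.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Hypothetical helper: linear gap penalty and illustrative scores.
    """
    prev = [0] * (len(b) + 1)   # scores for the previous row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on the same anti-diagonal are independent of one another; this is the kind of inherent parallelism that vectorized and multithreaded implementations exploit.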

Massively Parallel ANS Decoding on GPUs

2019

In recent years, graphics processors have enabled significant advances in the fields of big data and streamed deep learning. In order to keep control of rapidly growing amounts of data and to achieve sufficient throughput rates, compression features are a key part of many applications including popular deep learning pipelines. However, as most of the respective APIs rely on CPU-based preprocessing for decoding, data decompression frequently becomes a bottleneck in accelerated compute systems. This establishes the need for efficient GPU-based solutions for decompression. Asymmetric numeral systems (ANS) represent a modern approach to entropy coding, combining superior compression results wit…

020203 distributed computing; Computer science; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; CUDA; Scalability; 0202 electrical engineering electronic engineering information engineering; Codec; SIMD; Entropy encoding; Massively parallel; Decoding methods; Data compression; Proceedings of the 48th International Conference on Parallel Processing
researchProduct

Pairwise DNA Sequence Alignment Optimization

2015

This chapter presents a parallel implementation of the Smith-Waterman algorithm to accelerate the pairwise alignment of DNA sequences. This algorithm is especially computationally demanding for long DNA sequences. Parallelization approaches are examined in order to deeply explore the inherent parallelism within Intel Xeon Phi coprocessors. This chapter looks at exploiting instruction-level parallelism within 512-bit single instruction multiple data instructions (vectorization) as well as thread-level parallelism over the many cores (multithreading using OpenMP). Between coprocessors, device-level parallelism through the compute power of clusters including Intel Xeon Phi coprocessors using M…

Coprocessor; Computer science; Multithreading; Vectorization (mathematics); Parallelism (grammar); SIMD; Parallel computing; Hardware_ARITHMETICANDLOGICSTRUCTURES; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Intrinsics; Instruction-level parallelism; Xeon Phi
researchProduct