Search results for "Parallel"

showing 10 items of 667 documents

One-dimensional hydrodynamic modeling of coronal plasmas on transputer arrays

1990

We describe a concurrent implementation of the Palermo-Harvard hydrodynamic code on cost-effective and modularly expandable transputer arrays. We have tested the effectiveness of our approach by simulating an already well-studied compact solar-flare model on different transputer configurations and compared their performance with that of other machines. We have found that the speed of the concurrent program on an array of 16 T800 transputers is ~1/9 of that of the equivalent code optimized for a CRAY X-MP/48. This work clearly shows that transputer-based arrays provide locally available high computing-power tools to extend the investigation of compact solar flares and similar astroph…

Modularity (networks); Partial differential equation; ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION; Parallel processing (DSP implementation); Hardware and Architecture; Computer science; Transputer; Code (cryptography); General Physics and Astronomy; Plasma; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Algorithm; Computational science; Computer Physics Communications

Motion analysis using the novelty filter

1991

An original approach to motion analysis, based on the novelty filter, is proposed. The novelty filter stresses the novelties occurring in a pattern representing an image of the scene under consideration with respect to patterns representing previous images of the same scene, so that visual information about the motion of the objects is obtained. The novelty filter may be implemented by a neural network architecture, taking advantage of the capabilities of massive parallelism, adaptive learning and noise robustness. The novelty filter may learn the entire trajectory of an object, through incremental learning of a sequence of images capturing the scene, thus emphasizing if the…

Motion analysis; Artificial neural network; business.industry; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Novelty; Image processing; Filter (signal processing); Artificial Intelligence; Robustness (computer science); Computer Science::Computer Vision and Pattern Recognition; Signal Processing; Incremental learning; Computer vision; Computer Vision and Pattern Recognition; Artificial intelligence; Adaptive learning; business; Massively parallel; Software; Pattern Recognition Letters

Concept and Development of Modular VLIW Processor Based on FPGA

2010

Modern FPGA chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs it is now possible to implement a high-performance VLIW processor core in an FPGA. Architectures based on Very Long Instruction Word (VLIW) processors are an optimal choice for obtaining high performance in embedded systems. In a VLIW architecture, the effectiveness of these processors depends on the ability of compilers to provide sufficient instruction-level parallelism (ILP) in program code. Advanced compiler technology could take over these functions. This paper describes research resu…

Multi-core processor; Assembly language; business.industry; Computer science; Hardware description language; Modular design; computer.software_genre; Computer architecture; Very long instruction word; VHDL; Compiler; Instruction-level parallelism; business; computer; computer.programming_language; 2010 Second International Conference on Computer and Network Technology

Multiple modular very long instruction word processors based on field programmable gate arrays

2007

Modern field programmable gate array (FPGA) chips, with their large memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance very long instruction word (VLIW) processor core in an FPGA. This paper describes research results about enabling the DSP TMS320 C6201 model for real-time image processing applications by exploiting FPGA technology. We present a modular DSP C6201 VHDL model with a variable instruction set. We call this new development a minimum mandatory modules (M3) approach. Our goals are to keep the flexibility of DSP in order to shor…

Multi-core processor; Computer science; business.industry; Reconfigurability; Modular design; Atomic and Molecular Physics and Optics; Computer Science Applications; Instruction set; Parallel processing (DSP implementation); Computer architecture; Very long instruction word; Embedded system; VHDL; Hardware_ARITHMETICANDLOGICSTRUCTURES; Electrical and Electronic Engineering; Field-programmable gate array; business; computer; computer.programming_language; Journal of Electronic Imaging

VLBI-resolution radio-map algorithms: Performance analysis of different levels of data-sharing on multi-socket, multi-core architectures

2012

A broad area in astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate what the observed radio-maps would be if emitted from a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithm on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with a complex memory hierarchy that includes a shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future work. The experiments sh…

Multi-core processor; Memory hierarchy; 010308 nuclear & particles physics; Computer science; General Physics and Astronomy; Parallel computing; 01 natural sciences; Scheduling (computing); Data sharing; Computer engineering; Hardware and Architecture; 0103 physical sciences; Very-long-baseline interferometry; Scalability; Cache; 010303 astronomy & astrophysics; Scaling; Computer Physics Communications, CPC, 1937-1946 (2012)

Experimental Study of Six Different Implementations of Parallel Matrix Multiplication on Heterogeneous Computational Clusters of Multicore Processors

2010

Two strategies of distribution of computations can be used to implement parallel solvers for dense linear algebra problems for Heterogeneous Computational Clusters of Multicore Processors (HCoMs). These strategies are called Heterogeneous Process Distribution Strategy (HPS) and Heterogeneous Data Distribution Strategy (HDS). They are not novel and have been researched thoroughly. However, the advent of multicores necessitates enhancements to them. In this paper, we present these enhancements. Our study is based on experiments using six applications to perform Parallel Matrix-matrix Multiplication (PMM) on an HCoM employing the two distribution strategies.

Multi-core processor; Parallel processing (DSP implementation); Computer science; Computation; Linear algebra; Parallel algorithm; Concurrent computing; Multiplication; Parallel computing; Matrix multiplication; 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing

Accelerating collision detection for large-scale crowd simulation on multi-core and many-core architectures

2013

The computing capabilities of current multi-core and many-core architectures have been used in crowd simulations both for enhancing crowd rendering and for simulating continuum crowds. However, improving the scalability of crowd simulation systems by exploiting the inherent parallelism of these architectures is still an open issue. In this paper, we propose different parallelization strategies for the collision check procedure that takes place in agent-based simulations. These strategies are designed to exploit the parallelism in both multi-core and many-core architectures such as graphics processing units (GPUs). As for the many-core implementations, we analyse the bottlenecks of a previous G…

Multi-core processor; Speedup; Computer science; Parallel computing; Collision; Theoretical Computer Science; Rendering (computer graphics); Crowds; Hardware and Architecture; Scalability; Collision detection; Crowd simulation; General-purpose computing on graphics processing units; Software; The International Journal of High Performance Computing Applications

Suffix Array Construction on Multi-GPU Systems

2019

Suffix arrays are prevalent data structures that are fundamental to a wide range of applications including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction on both CPUs and GPUs have been proposed over the years. Although they provide significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome the aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …

Multi-core processor; Speedup; Computer science; Suffix array; 0102 computer and information sciences; 02 engineering and technology; Parallel computing; Data structure; 01 natural sciences; law.invention; CUDA; Shared memory; 010201 computation theory & mathematics; law; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Suffix; Data compression; Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing

Flexible VLIW processor based on FPGA for real-time image processing

2011

Modern FPGA chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs it is now possible to implement a high-performance Very Long Instruction Word (VLIW) processor core in an FPGA. With a VLIW architecture, the processor's effectiveness depends on the ability of compilers to extract sufficient Instruction Level Parallelism (ILP) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors in order to shorten the developm…

Multi-core processor; business.industry; Computer science; Application-specific instruction-set processor; Reconfigurability; Instruction set; Computer architecture; Very long instruction word; Embedded system; VHDL; business; Instruction-level parallelism; computer; computer.programming_language; FPGA prototype; Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)

Elementary transformation analysis for Array-OL

2009

Array-OL is a high-level specification language dedicated to the definition of multidimensional intensive signal processing applications. It allows both the task parallelism and the data parallelism of these applications to be specified by focusing on their complex multidimensional data access patterns. Several tools exist for implementing an Array-OL specification as a data parallel program. While Array-OL can be used directly, it is often convenient to be able to deduce part of the specification from a sequential version of the application. This paper proposes such an analysis and examines its feasibility and its limits.

Multidimensional signal processing; Signal processing; Program analysis; Theoretical computer science; Parallel processing (DSP implementation); Data parallelism; Programming language; Computer science; Task parallelism; Specification language; Elementary transformation; computer.software_genre; computer; 2009 IEEE/ACS International Conference on Computer Systems and Applications
