Search results for "Shared memory"

showing 10 items of 26 documents

A Low Cost Solution for 2D Memory Access

2006

Many of the new coding tools in the H.264/AVC video coding standard are based on 2D processing resulting in row-wise and column-wise memory accesses starting from arbitrary memory locations. This paper proposes a low cost solution for efficient realization of these 2D block memory accesses on sub-word parallel processors. It is based on the use of simple register-based data permutation networks placed between the processor and memory. The data rearrangement capabilities of the networks can further be extended with more complex control schemes. With the proposed control schemes, the networks enable row and column accesses from arbitrary memory locations for blocks of data while maintaining f…

Flat memory model; Shared memory; Computer science; Interleaved memory; Registered memory; Uniform memory access; Semiconductor memory; Distributed memory; Parallel computing; Memory map

2006 49th IEEE International Midwest Symposium on Circuits and Systems

researchProduct

Use of parallel computing to improve the accuracy of calculated molecular properties

1998

Calculation of electron correlation energy in molecules is unavoidable in accurate studies of chemical reactivity. However, these calculations involve a computational effort that is, even in the simplest cases, several orders of magnitude larger than the computer power currently available. In this work the possibility of parallelizing the calculation of the electron correlation energy is studied. The formalism chosen is the dressing of matrices on both distributed- and shared-memory MIMD parallel systems. Algorithms developed on PVM are presented, and the results are evaluated on several platforms. These results show that the parallel techniques are useful in order to decrease very appreciably the ti…

Formalism (philosophy of mathematics); Matrix (mathematics); MIMD; Shared memory; Electronic correlation; Computer science; Parallel computing

researchProduct

Parallel Algorithms for Listing Well-Formed Parentheses Strings

1998

We present two cost-optimal parallel algorithms generating the set of all well-formed parentheses strings of length 2n with constant delay for each generated string. In our first algorithm we generate in lexicographic order well-formed parentheses strings represented by bitstrings, and in the second one we use the representation by weight sequences. In both cases the computational model is based on a CREW PRAM architecture, where each processor performs the same algorithm simultaneously on a different set of data. Different processors can access the shared memory at the same time to read different data in the same or different memory locations, but no two processors are allowed to write i…

Gray code; Set (abstract data type); Shared memory; Hardware and Architecture; Computer science; String (computer science); Parallel algorithm; Parallel random-access machine; Lexicographical order; Time complexity; Algorithm; Software; Theoretical Computer Science

Parallel Processing Letters

researchProduct
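
To make the object being listed in the entry above concrete, here is a minimal sequential sketch in Python. It is not the paper's cost-optimal CREW PRAM algorithm and offers no constant-delay guarantee; it only enumerates the well-formed parentheses strings of length 2n in lexicographic order. The function names and the bit encoding (1 for an opening parenthesis, 0 for a closing one) are assumptions made for illustration.

```python
from typing import Iterator, List


def well_formed(n: int) -> Iterator[str]:
    """Yield every well-formed parentheses string with n pairs.

    Strings come out in lexicographic order with '(' sorting before ')'.
    This is a plain sequential backtracking sketch, not a parallel algorithm.
    """

    def extend(prefix: List[str], opened: int, closed: int) -> Iterator[str]:
        if closed == n:  # all n pairs are closed, so the string is complete
            yield "".join(prefix)
            return
        if opened < n:  # try '(' first to keep lexicographic order
            prefix.append("(")
            yield from extend(prefix, opened + 1, closed)
            prefix.pop()
        if closed < opened:  # ')' is legal only if something is still open
            prefix.append(")")
            yield from extend(prefix, opened, closed + 1)
            prefix.pop()

    yield from extend([], 0, 0)


def to_bitstring(s: str) -> str:
    """Encode '(' as 1 and ')' as 0 (an assumed convention)."""
    return s.replace("(", "1").replace(")", "0")


if __name__ == "__main__":
    for s in well_formed(3):
        print(s, to_bitstring(s))
```

The number of strings produced for n pairs is the n-th Catalan number, which is a quick sanity check on any generator of this kind.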

SYSTOLIC GENERATION OF k-ARY TREES

1999

The only parallel generating algorithms for k-ary trees are those of Akl and Stojmenović in 1996 and of Vajnovszki and Phillips in 1997. In the first of them, trees are represented by an inversion table and the processor model is a linear array multicomputer. In the second, trees are represented by bitstrings and the algorithm executes on a shared memory multiprocessor. In this paper we give a parallel generating algorithm for k-ary trees represented by generalized P–sequences for execution on a linear array multicomputer.

Hardware and Architecture; Shared memory multiprocessor; Processor model; Weight-balanced tree; Parallel algorithm; Parallel computing; Inversion table; Software; Theoretical Computer Science; Linear array; Mathematics; Vector processor

Parallel Processing Letters

researchProduct

Analyzing the Energy Efficiency of the Memory Subsystem in Multicore Processors

2014

In this paper we analyze the energy overhead incurred when operating with data stored in different levels of the memory subsystem (cache levels and DDR chips) of current multicore architectures. Our approach builds upon servet, a portable framework for the memory characterization of multicore processors, extending this suite with a power-related test that, when applied to a platform equipped with a power measurement mechanism, provides information on the efficiency of memory energy usage. As additional contributions, i) we provide a complete experimental study of the impact that the CPU performance states (also known as P-states) exert on the memory energy efficiency of a collection of rece…

Memory coherence; Memory management; Flat memory model; Shared memory; Computer science; Interleaved memory; Uniform memory access; Distributed memory; Semiconductor memory; Parallel computing

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications

researchProduct

Simurgh

2021

The availability of non-volatile main memory (NVMM) has started a new era for storage systems and NVMM specific file systems can support extremely high data and metadata rates, which are required by many HPC and data-intensive applications. Scaling metadata performance within NVMM file systems is nevertheless often restricted by the Linux kernel storage stack, while simply moving metadata management to the user space can compromise security or flexibility. This paper introduces Simurgh, a hardware-assisted user space file system with decentralized metadata management that allows secure metadata updates from within user space. Simurgh guarantees consistency, durability, and ordering of updat…

Metadata; File system; Consistency (database systems); Shared memory; Computer science; Scalability; Metadata management; Data_FILES; User space; Operating system; Linux kernel; computer.software_genre; computer

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

researchProduct

Suffix Array Construction on Multi-GPU Systems

2019

Suffix arrays are prevalent data structures that are fundamental to a wide range of applications including bioinformatics, data compression, and information retrieval. Therefore, various algorithms for (parallel) suffix array construction both on CPUs and GPUs have been proposed over the years. Although providing significant speedup over their CPU-based counterparts, existing GPU implementations share a common disadvantage: input text sizes are limited by the scarce memory of a single GPU. In this paper, we overcome the aforementioned memory limitations by exploiting multi-GPU nodes featuring fast NVLink interconnects. In order to achieve high performance for this communication-intensive task, we …

Multi-core processor; Speedup; Computer science; Suffix array; 0102 computer and information sciences; 02 engineering and technology; Parallel computing; Data structure; 01 natural sciences; law.invention; CUDA; Shared memory; 010201 computation theory & mathematics; law; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Suffix; Data compression

Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing

researchProduct
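
For readers unfamiliar with the data structure in the entry above, the following Python sketch shows what a suffix array is. It uses the naive construction (sort the suffix start positions by the suffixes themselves), which is far slower than the CPU or GPU algorithms the abstract refers to and is not the paper's method. The function name is an assumption.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text, sorted lexicographically.

    Naive construction for illustration only: each comparison may scan a
    whole suffix, so this is not suitable for large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    sa = suffix_array("banana")
    for i in sa:
        print(i, "banana"[i:])
```

Because the array holds one index per character of the input, the whole text and its index array must fit in memory, which is the limitation the abstract attributes to single-GPU implementations.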

Solution of time-independent Schrödinger equation by the imaginary time propagation method

2007

Numerical solution of eigenvalues and eigenvectors of large matrices originating from discretization of linear and non-linear Schrödinger equations using the imaginary time propagation (ITP) method is described. Convergence properties and accuracy of 2nd and 4th order operator-splitting methods for the ITP method are studied using numerical examples. The natural convergence of the method is further accelerated with a new dynamic time step adjustment method. The results show that the ITP method has better scaling with respect to matrix size as compared to the implicitly restarted Lanczos method. An efficient parallel implementation of the ITP method for shared memory computers is also demons…

Numerical Analysis; Physics and Astronomy (miscellaneous); Discretization; Applied Mathematics; Mathematical analysis; MathematicsofComputing_NUMERICALANALYSIS; Order (ring theory); Computer Science::Human-Computer Interaction; Computer Science Applications; Schrödinger equation; Computational Mathematics; symbols.namesake; Lanczos resampling; Shared memory; Modeling and Simulation; Convergence (routing); symbols; Scaling; Eigenvalues and eigenvectors; Mathematics

Journal of Computational Physics

researchProduct
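
The idea behind imaginary time propagation in the entry above can be sketched in a few lines of Python: repeatedly applying an approximation of exp(-dt·H) and renormalizing damps the excited states faster than the ground state. The sketch below substitutes a first-order step, (1 - dt·H), for the 2nd and 4th order operator-splitting schemes studied in the paper, uses a fixed time step rather than the paper's dynamic adjustment, and is serial. The test problem (1D harmonic oscillator with unit mass and frequency), the grid parameters, and all names are assumptions chosen for illustration.

```python
import math
from typing import List


def ground_state_energy(n_points: int = 51, half_width: float = 5.0,
                        dt: float = 0.01, steps: int = 2000) -> float:
    """Estimate the ground-state energy of H = -1/2 d^2/dx^2 + 1/2 x^2.

    Uses a finite-difference grid on [-half_width, half_width] with the
    wavefunction taken as zero outside the grid, and first-order imaginary
    time steps psi <- (1 - dt*H) psi followed by renormalization.
    """
    dx = 2.0 * half_width / (n_points - 1)
    xs = [-half_width + i * dx for i in range(n_points)]
    potential = [0.5 * x * x for x in xs]

    def apply_h(psi: List[float]) -> List[float]:
        out = []
        for i in range(n_points):
            left = psi[i - 1] if i > 0 else 0.0
            right = psi[i + 1] if i < n_points - 1 else 0.0
            kinetic = -0.5 * (left - 2.0 * psi[i] + right) / dx ** 2
            out.append(kinetic + potential[i] * psi[i])
        return out

    psi = [1.0] * n_points  # any guess that overlaps the ground state
    for _ in range(steps):
        h_psi = apply_h(psi)
        psi = [p - dt * hp for p, hp in zip(psi, h_psi)]  # first-order step
        norm = math.sqrt(sum(p * p for p in psi) * dx)
        psi = [p / norm for p in psi]

    # psi is normalized, so the energy is the discrete expectation <psi|H|psi>.
    h_psi = apply_h(psi)
    return sum(p * hp for p, hp in zip(psi, h_psi)) * dx


if __name__ == "__main__":
    print(ground_state_energy())
```

With these assumed parameters the result should land close to the exact value of 0.5, slightly below it because of the finite-difference error. The explicit first-order step is stable only when dt is small relative to the largest eigenvalue of the discretized operator, which is one reason higher-order splitting schemes are attractive.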

Unified Parallel C++

2018

Although MPI is commonly used for parallel programming on distributed-memory systems, Partitioned Global Address Space (PGAS) approaches are gaining attention for programming modern multi-core CPU clusters. They feature a hybrid memory abstraction: distributed memory is viewed as a shared memory that is partitioned among nodes in order to simplify programming. In this chapter you will learn about Unified Parallel C++ (UPC++), a library-based extension of C++ that gathers the advantages of both PGAS and Object Oriented paradigms. The examples included in this chapter will help you to understand the main features of PGAS languages and how they can simplify the task of programming par…

Object-oriented programming; Source code; Computer science; media_common.quotation_subject; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Shared memory; Asynchronous communication; Unified Parallel C; Distributed memory; Partitioned global address space; computer; computer.programming_language; Abstraction (linguistics); media_common

researchProduct

Optimized Parallel Implementation of Face Detection based on GPU component

2015

An algorithm for face detection has been implemented on CPU. This algorithm has been accelerated by migrating it to GPU. The performance of the GPU implementation shows the effectiveness of this implementation. Further optimization methods on GPU are applied. Face detection is an important aspect for various domains such as biometrics, video surveillance and human computer interaction. Generally a generic face processing system includes a face detection, or recognition step, as well as tracking and rendering phase. In this paper, we develop a real-time and robust face detection implementation based on GPU component. Face detection is performed by adapting the Viola and Jones algorithm. We hav…

Parallel computing; Biometrics; Computer Networks and Communications; Computer science; 02 engineering and technology; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; Face detection; Rendering (computer graphics); CUDA; CUDA optimization; Artificial Intelligence; 0202 electrical engineering electronic engineering information engineering; Graphics processors; AdaBoost; [INFO.INFO-ES] Computer Science [cs]/Embedded Systems; Graphics; WaldBoost; ComputingMilieux_MISCELLANEOUS; Viola and Jones algorithm; Grid; 020202 computer hardware & architecture; Shared memory; Hardware and Architecture; 020201 artificial intelligence & image processing; Software

researchProduct