Search results for "2020"

showing 10 items of 4977 documents

Neighbor-list-free molecular dynamics on Sunway TaihuLight supercomputer

2020

Molecular dynamics (MD) simulations are playing an increasingly important role in many research areas. Pair-wise potentials are widely used in MD simulations of bio-molecules, polymers, and nano-scale materials. Due to a low compute-to-memory-access ratio, their calculation is often bound by memory transfer speeds. Sunway TaihuLight is one of the fastest supercomputers and features a custom SW26010 many-core processor. Since the SW26010 has some critical limitations regarding main memory bandwidth and scratchpad memory size, it is considered a good platform for investigating the optimization of pair-wise potentials, especially in terms of data reuse. MD algorithms often use a neighbor-list …

020203 distributed computing; Computer science; 020207 software engineering; Memory bandwidth; 02 engineering and technology; Parallel computing; SW26010; Data structure; Supercomputer; Vectorization (mathematics); 0202 electrical engineering electronic engineering information engineering; Node (circuits); Sunway TaihuLight; Scratchpad memory
Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
researchProduct
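
To illustrate why pair-wise potentials have a low compute-to-memory-access ratio, here is a minimal sketch that sums a pair-wise energy over all particle pairs inside a cutoff without building a neighbor list. The Lennard-Jones form, the function name `lj_energy_all_pairs`, and its parameters are assumptions chosen for illustration; this is not the SW26010 kernel described in the paper.

```python
def lj_energy_all_pairs(positions, cutoff, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy by direct enumeration of all pairs.

    positions: list of (x, y, z) tuples. No neighbor list is built;
    every pair is tested against the cutoff, so each interaction costs
    two coordinate loads for only a handful of arithmetic operations.
    """
    cutoff_sq = cutoff * cutoff
    energy = 0.0
    n = len(positions)
    for i in range(n - 1):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            r_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if r_sq >= cutoff_sq:
                continue  # pair is outside the interaction range
            s6 = (sigma * sigma / r_sq) ** 3  # (sigma / r)^6
            energy += 4.0 * epsilon * (s6 * s6 - s6)
    return energy
```

For two particles separated by 2^(1/6) sigma, the function returns the potential minimum, -epsilon, which makes a convenient sanity check.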

Multi-application Based Network-on-Chip Design for Mesh-of-Tree Topology Using Global Mapping and Reconfigurable Architecture

2019

This paper outlines a multi-application mapping for Mesh-of-Tree (MoT) topology based Network-on-Chip (NoC) design using a reconfigurable architecture. A two-phase Particle Swarm Optimization (PSO) has been proposed for the reconfigurable architecture to minimize the communication cost. In the first phase, global mapping is done by combining multiple applications, and in the second phase, reconfiguration is achieved by switching the cores to nearby routers using multiplexers. Experiments have been carried out for several application benchmarks and synthetic applications generated using the TGFF tool. The results show significant improvement in terms of communication cost after reconfiguration.

020203 distributed computing; Computer science; Control reconfiguration; Particle swarm optimization; Topology (electrical circuits); 02 engineering and technology; Network topology; Multiplexing; Multiplexer; 020202 computer hardware & architecture; Network on a chip; Computer architecture; 0202 electrical engineering electronic engineering information engineering; Architecture
2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID)
researchProduct

MARL-Ped+Hitmap: Towards Improving Agent-Based Simulations with Distributed Arrays

2016

Multi-agent systems allow the modelling of complex, heterogeneous, and distributed systems in a realistic way. MARL-Ped is a multi-agent system tool, based on the MPI standard, for the simulation of different scenarios of pedestrians who autonomously learn the best behavior by Reinforcement Learning. MARL-Ped uses one MPI process for each agent by design, with a fixed fine-grain granularity. This requirement limits the performance of the simulations when the number of available processors is restricted and smaller than the number of agents. On the other hand, Hitmap is a library to ease the programming of parallel applications based on distributed arrays. It includes abstractions for the automatic parti…

020203 distributed computing; Computer science; Distributed computing; Message passing; 0202 electrical engineering electronic engineering information engineering; Process (computing); Reinforcement learning; 020207 software engineering; 02 engineering and technology; Crowd simulation; Granularity; Partition (database)
researchProduct

Fault-Tolerant Network-on-Chip Design for Mesh-of-Tree Topology Using Particle Swarm Optimization

2018

As chip sizes scale down, the density of Intellectual Property (IP) cores integrated on a chip has increased rapidly. Communication between these IP cores on a chip is highly challenging. To overcome this issue, Network-on-Chip (NoC) has been proposed to provide an efficient and scalable communication architecture. At the deep sub-micron level, NoCs are prone to faults, which can occur in any component of the NoC. To build reliable and robust systems, it is necessary to apply efficient fault-tolerant techniques. In this paper, we present a flexible spare core placement in Mesh-of-Tree (MoT) topology using Particle Swarm Optimization (PSO) by considering IP core failures…

020203 distributed computing; Computer science; Distributed computing; Particle swarm optimization; Topology (electrical circuits); Fault tolerance; Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Network topology; Chip; 020204 information systems; Scalability; Hardware_INTEGRATEDCIRCUITS; 0202 electrical engineering electronic engineering information engineering; Benchmark (computing); Overhead (computing)
TENCON 2018 - 2018 IEEE Region 10 Conference
researchProduct

Nvidia CUDA parallel processing of large FDTD meshes in a desktop computer

2020

The Finite-Difference Time-Domain (FDTD) numerical method is a well-known and mature technique in computational electrodynamics. FDTD is usually used in the analysis of electromagnetic structures and antennas. However, its computational burden is still high, which limits its use in combination with optimization algorithms. Parallelizing FDTD to run on a GPU is possible using Matlab and CUDA tools. For instance, the simulation of a planar array with a three-dimensional FDTD mesh of 790x276x588 cells, for 6200 time steps, takes one day of elapsed time using the CPU of an Intel Core i3 at 2.4 GHz in a personal computer with 8 GB of RAM. This time is reduced 120 times when the calcula…

020203 distributed computing; Computer science; Finite-difference time-domain method; Graphics processing unit; 02 engineering and technology; Computational science; CUDA; Personal computer; 0202 electrical engineering electronic engineering information engineering; Computational electromagnetics; 020201 artificial intelligence & image processing; Central processing unit; Time domain; MATLAB; computer; computer.programming_language
Proceedings of the 10th Euro-American Conference on Telematics and Information Systems
researchProduct
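
The figures quoted in this abstract can be put in perspective with a short back-of-the-envelope calculation. The sketch below assumes that "one day" means exactly 24 hours, that "reduced 120 times" means a 120x speedup, and, for the memory estimate only, that the mesh stores six field components per cell in single precision. None of these assumptions is confirmed by the truncated abstract.

```python
# Mesh size and step count as quoted in the abstract.
NX, NY, NZ = 790, 276, 588
TIME_STEPS = 6200

# Assumptions for illustration (not stated in the abstract):
CPU_SECONDS = 24 * 60 * 60   # "one day" read as exactly 24 hours
SPEEDUP = 120                # "reduced 120 times" read as a 120x speedup
FIELD_COMPONENTS = 6         # e.g. Ex, Ey, Ez, Hx, Hy, Hz
BYTES_PER_VALUE = 4          # single-precision floats

cells = NX * NY * NZ
cell_updates = cells * TIME_STEPS
gpu_seconds = CPU_SECONDS / SPEEDUP
field_bytes = cells * FIELD_COMPONENTS * BYTES_PER_VALUE

print(f"cells:        {cells:,}")
print(f"cell updates: {cell_updates:,}")
print(f"GPU time:     {gpu_seconds / 60:.0f} minutes")
print(f"field memory: {field_bytes / 1e9:.2f} GB")
```

Under these assumptions the mesh has about 128 million cells, the day-long CPU run shrinks to roughly 12 minutes, and the field arrays alone occupy about 3.1 GB.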

WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes

2018

Hash maps are among the most versatile data structures in computer science because of their compact data layout and expected constant time complexity for insertion and querying. However, associated memory access patterns during the probing phase are highly irregular, resulting in strongly memory-bound implementations. Massively parallel accelerators such as CUDA-enabled GPUs may overcome this limitation by virtue of their fast video memory featuring almost one TB/s bandwidth in comparison to main memory modules of state-of-the-art CPUs with less than 100 GB/s. Unfortunately, the size of hash maps supported by existing single-GPU hashing implementations is restricted by the limited amount of …

020203 distributed computing; Computer science; Hash function; 0102 computer and information sciences; 02 engineering and technology; Parallel computing; Data structure; 01 natural sciences; Hash table; Electronic mail; Memory management; 010201 computation theory & mathematics; Scalability; 0202 electrical engineering electronic engineering information engineering; Massively parallel; Time complexity
2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
researchProduct
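
The probing phase mentioned in this abstract can be illustrated with a minimal single-threaded open-addressing hash map. This sketch assumes linear probing and a load factor capped at one half; the class name `LinearProbingMap` and its methods are hypothetical, and the code is not the multi-GPU scheme the paper describes.

```python
class LinearProbingMap:
    """Open-addressing hash map with linear probing (no deletion)."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0

    def _slot(self, key):
        # Probe consecutive slots until the key or an empty slot is found.
        # Where a probe sequence starts depends on the hash value, which is
        # the source of the irregular memory access pattern.
        capacity = len(self._keys)
        index = hash(key) % capacity
        for _ in range(capacity):
            if self._keys[index] is self._EMPTY or self._keys[index] == key:
                return index
            index = (index + 1) % capacity
        raise RuntimeError("table is full")

    def _grow(self):
        # Double the table and reinsert every stored pair.
        old = [(k, v) for k, v in zip(self._keys, self._values)
               if k is not self._EMPTY]
        self._keys = [self._EMPTY] * (2 * len(self._keys))
        self._values = [None] * len(self._keys)
        self._size = 0
        for k, v in old:
            self.insert(k, v)

    def insert(self, key, value):
        # Keep the load factor at or below 1/2 so probing always terminates.
        if (self._size + 1) * 2 > len(self._keys):
            self._grow()
        index = self._slot(key)
        if self._keys[index] is self._EMPTY:
            self._keys[index] = key
            self._size += 1
        self._values[index] = value

    def get(self, key, default=None):
        index = self._slot(key)
        if self._keys[index] is self._EMPTY:
            return default
        return self._values[index]
```

Deletion is left out on purpose: with open addressing it needs tombstones or re-insertion of the following cluster, which would obscure the probing loop.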

Torus Topology based Fault-Tolerant Network-on-Chip Design with Flexible Spare Core Placement

2018

The increase in the density of the IP cores being fabricated on a chip poses challenges for on-chip communication and heat dissipation. To overcome these issues, Network-on-Chip (NoC) based communication architecture is introduced. In the nanoscale era, NoCs are prone to faults, which result in performance degradation and unreliability. Hence, efficient fault-tolerant methods are required to make the system reliable against diverse component failures. This paper presents a flexible spare core placement in torus topology based fault-tolerant NoC design. Communication related to the failed core is handled by selecting the best position for a spare core in the torus network. By conside…

020203 distributed computing; Computer science; Particle swarm optimization; Fault tolerance; Topology (electrical circuits); Hardware_PERFORMANCEANDRELIABILITY; 02 engineering and technology; Chip; Topology; 020202 computer hardware & architecture; Reduction (complexity); Network on a chip; Spare part; 0202 electrical engineering electronic engineering information engineering; Metaheuristic
researchProduct

Massively Parallel Huffman Decoding on GPUs

2018

Data compression is a fundamental building block in a wide range of applications. Besides its intended purpose of saving valuable storage on hard disks, compression can be utilized to increase the effective bandwidth to attached storage as realized by state-of-the-art file systems. In the foreseeable future, on-the-fly compression and decompression will gain utmost importance for the processing of data-intensive applications such as streamed Deep Learning tasks or Next Generation Sequencing pipelines, which establishes the need for fast parallel implementations. Huffman coding is an integral part of a number of compression methods. However, efficient parallel implementation of Huffman decompre…

020203 distributed computing; Computer science; business.industry; Deep learning; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; Huffman coding; symbols.namesake; CUDA; Titan (supercomputer); 0202 electrical engineering electronic engineering information engineering; symbols; Artificial intelligence; business; Massively parallel; Data compression
Proceedings of the 47th International Conference on Parallel Processing
researchProduct
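
For context, Huffman codewords have variable length, so a decoder only learns where one codeword ends, and the next begins, after decoding everything before it. The sequential sketch below makes that dependency visible. The function name `huffman_decode` and the example codebook are hypothetical, and the code assumes a prefix-free codebook; it is not the GPU algorithm from the paper.

```python
def huffman_decode(bits, codebook):
    """Decode a string of '0'/'1' characters with a prefix-free codebook.

    codebook maps each symbol to its codeword, e.g. {"a": "0", "b": "10"}.
    Because no codeword is a prefix of another, the first match found
    while extending the current bit run is the only possible one.
    """
    decode_table = {code: symbol for symbol, code in codebook.items()}
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in decode_table:
            symbols.append(decode_table[current])
            current = ""  # the next codeword starts at the following bit
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return symbols


# Hypothetical three-symbol codebook used only for this example.
EXAMPLE_CODEBOOK = {"a": "0", "b": "10", "c": "11"}
print(huffman_decode("010110", EXAMPLE_CODEBOOK))  # prints ['a', 'b', 'c', 'a']
```

The loop cannot simply be split across threads at arbitrary bit offsets, because a thread starting mid-stream does not know whether it sits on a codeword boundary.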

Wireless NoC for Inter-FPGA Communication: Theoretical Case for Future Datacenters

2020

Integration of FPGAs in datacenters may have different motivations, from acceleration to energy efficiency, but the goal of better performance tops all. FPGAs are being utilized in a variety of ways today, tightly coupled with heterogeneous computing resources, and as a standalone network of homogeneous resources. Open source software stacks, proprietary tool chains, and programming languages with advanced methodologies are hitting hard on the programmability wall of the FPGAs. The deployment of FPGAs in datacenters will be neither sustainable nor economical without realizing multi-tenancy in multiple FPGAs. Inter-FPGA communication among multiple FPGAs remained relatively less addressed p…

020203 distributed computing; Computer science; business.industry; Wireless network; Distributed computing; Cloud computing; 02 engineering and technology; Virtualization; computer.software_genre; Bottleneck; 020202 computer hardware & architecture; Software deployment; 0202 electrical engineering electronic engineering information engineering; Wireless; [INFO]Computer Science [cs]; business; Field-programmable gate array; computer; ComputingMilieux_MISCELLANEOUS; Efficient energy use
2020 IEEE 23rd International Multitopic Conference (INMIC)
researchProduct

Moderated Redactable Blockchains: A Definitional Framework with an Efficient Construct

2020

Blockchain is a multiparty protocol to reach agreement on the order of events, and to record them consistently and immutably without centralized trust. In some cases, however, the blockchain can benefit from some controlled mutability. Examples include removing private information or unlawful content, and correcting protocol vulnerabilities which would otherwise require a hard fork. Two approaches to control the mutability are: moderation, where one or more designated administrators can use their private keys to approve a redaction, and voting, where miners can vote to endorse a suggested redaction. In this paper, we first present several attacks against existing redactable blockchain solut…

020203 distributed computing; Computer science; media_common.quotation_subject; 02 engineering and technology; Construct (python library); Redaction; Computer security; computer.software_genre; Digital signature; Order (exchange); 020204 information systems; Voting; 0202 electrical engineering electronic engineering information engineering; Fork (file system); Protocol (object-oriented programming); computer; Private information retrieval; media_common
researchProduct