Search results for "Parallel"

Showing 10 items of 667 documents

Improving computation efficiency using input and architecture features for a virtual screening application

2023

Virtual screening is an early stage of the drug discovery process that selects the most promising candidates. In an urgent-computing scenario, it is critical to find a solution within a short time frame. In this paper, we focus on a real-world virtual screening application to evaluate out-of-kernel optimizations that consider input and architecture features to improve computation efficiency on GPUs. Experimental results on a modern supercomputer node show that these optimizations almost double performance. Moreover, we implemented the optimization in SYCL, and it provides a benefit consistent with that of the CUDA optimization. A virtual screening campaign can use this gain in performance to increase the nu…

Computational Engineering Finance and Science (cs.CE); FOS: Computer and information sciences; Computer Science - Distributed Parallel and Cluster Computing; Hardware Architecture (cs.AR); Distributed Parallel and Cluster Computing (cs.DC); Computer Science - Computational Engineering Finance and Science; Computer Science - Hardware Architecture

On the systolic calculation of all-pairs interactions using transputer arrays

1991

Computational Mathematics; Numerical Analysis; Parallelism (rhetoric); Physics and Astronomy (miscellaneous); Computer science; Applied Mathematics; Modeling and Simulation; Transputer; Numerical analysis; Particle interaction; Multiprocessing; Parallel computing; Computer Science Applications; Journal of Computational Physics
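
The record above gives only a title. As a hedged illustration of the general idea behind systolic all-pairs schemes (not necessarily the paper's transputer algorithm), the following Python sketch simulates a ring of `p` processors: each keeps a resident block of particles while a copy of every block travels one hop around the ring per step, so after `p` steps each resident particle has met every other particle. The function name `systolic_all_pairs` and the 1-D test force are hypothetical.

```python
from typing import Callable, List

Force = Callable[[float, float], float]


def systolic_all_pairs(blocks: List[List[float]], pair_force: Force) -> List[List[float]]:
    """Simulate a ring-systolic all-pairs computation on len(blocks) processors.

    blocks[k] holds the particles resident on processor k. Each processor also holds
    one "traveling" block that is passed to its ring neighbour after every step.
    Returns acc[k][i] = sum over all other particles j of pair_force(x_i, x_j).
    """
    p = len(blocks)
    acc = [[0.0] * len(b) for b in blocks]
    traveling = [list(b) for b in blocks]  # step 0: each processor holds a copy of its own block
    origin = list(range(p))                # origin[k]: which resident block traveling[k] came from

    for _ in range(p):
        # Compute phase: each processor interacts its resident block with the block it holds now.
        for k in range(p):
            for i, xi in enumerate(blocks[k]):
                for j, xj in enumerate(traveling[k]):
                    if origin[k] == k and i == j:
                        continue  # skip a particle's interaction with itself
                    acc[k][i] += pair_force(xi, xj)
        # Communication phase: shift traveling blocks one hop around the ring (k-1 -> k).
        traveling = [traveling[(k - 1) % p] for k in range(p)]
        origin = [origin[(k - 1) % p] for k in range(p)]
    return acc
```

This simple version evaluates each pair twice (once on each processor involved); implementations often exploit Newton's third law to halve the arithmetic, at the cost of returning accumulated partial forces around the ring.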

Comparison of implementations of the lattice-Boltzmann method

2008

Simplicity of coding is usually an appealing feature of the lattice-Boltzmann method (LBM). Conventional implementations of LBM are often based on the two-lattice or the two-step algorithm, which, however, suffer from high memory consumption and poor computational performance, respectively. The aim of this work was to identify implementations of LBM that would achieve high computational performance with low memory consumption. Effects of memory addressing schemes were investigated in particular. Data layouts for velocity distribution values were also considered, and they were found to be related to computational performance. A novel bundle data layout was therefore introduced. Address…

Computational fluid mechanics; Memory addressing schemes; Computer science; Lattice Boltzmann methods; Parallel computing; Supercomputer; Addressing mode; High memory; Memory address; Computational Mathematics; Computational Theory and Mathematics; Modeling and Simulation; Bundle; Modelling and Simulation; Lattice-Boltzmann method; Implementation; High-performance computing; Coding (social sciences); Computers & Mathematics with Applications
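
The LBM abstract above contrasts the two-lattice algorithm (two full copies of the distribution values) with different layouts for storing those values. The Python sketch below is a hedged, minimal D1Q3 BGK example, not the paper's code and not its bundle layout: it keeps the distributions in one flat array per lattice, selects either a node-major (`aos`) or a direction-major (`soa`) index function, and performs a fused collide-and-stream update that reads only the source lattice and writes only the destination lattice before swapping them. The D1Q3 weights are the standard ones; the function names, the relaxation time `tau`, and the initial density bump are illustrative assumptions.

```python
from typing import Callable, List

Q = 3                      # D1Q3: rest, +1, -1
C = (0, 1, -1)             # lattice velocities
W = (2 / 3, 1 / 6, 1 / 6)  # standard D1Q3 weights (c_s^2 = 1/3)

Layout = Callable[[int, int, int], int]


def aos(x: int, q: int, nx: int) -> int:
    """Node-major layout: the Q values of one node are contiguous."""
    return x * Q + q


def soa(x: int, q: int, nx: int) -> int:
    """Direction-major layout: all nodes of one direction are contiguous."""
    return q * nx + x


def feq(q: int, rho: float, u: float) -> float:
    """Second-order BGK equilibrium for direction q."""
    cu = C[q] * u
    return W[q] * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * u * u)


def step_two_lattice(src: List[float], dst: List[float], nx: int, tau: float, idx: Layout) -> None:
    """Fused collide-and-stream: reads only `src`, writes only `dst` (periodic boundaries)."""
    for x in range(nx):
        f = [src[idx(x, q, nx)] for q in range(Q)]
        rho = sum(f)
        u = sum(C[q] * f[q] for q in range(Q)) / rho
        for q in range(Q):
            post = f[q] - (f[q] - feq(q, rho, u)) / tau  # BGK collision
            dst[idx((x + C[q]) % nx, q, nx)] = post       # push to the neighbour node


def simulate(nx: int, steps: int, tau: float, idx: Layout) -> List[float]:
    """Run the two-lattice scheme and return the density at each node."""
    rho0 = [1.1 if x == nx // 2 else 1.0 for x in range(nx)]  # small density bump
    a = [0.0] * (nx * Q)
    for x in range(nx):
        for q in range(Q):
            a[idx(x, q, nx)] = feq(q, rho0[x], 0.0)
    b = [0.0] * (nx * Q)  # second lattice: the doubled memory cost of this scheme
    for _ in range(steps):
        step_two_lattice(a, b, nx, tau, idx)
        a, b = b, a  # swap roles instead of copying
    return [sum(a[idx(x, q, nx)] for q in range(Q)) for x in range(nx)]
```

Both layouts produce identical densities; only the memory access pattern differs, and in compiled code that pattern is what the abstract links to computational performance.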

Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance

2019

The rapid development of the Internet of Things (IoT) has brought important changes to the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed for high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm demands more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper is aimed at evaluating impo…

Computer Networks and Communications; Computer science; Distributed computing; Context (language use); 02 engineering and technology; Parallel architectures; 0202 electrical engineering electronic engineering information engineering; Parallel processing; Wireless; Signal processing; Multi-core processor; Heterogeneous (hybrid) systems; business.industry; 020206 networking & telecommunications; Acoustic source localization; Wireless acoustic sensor networks (WASNs); Computer Science Applications; Energy efficiency; Hardware and Architecture; Signal Processing; 020201 artificial intelligence & image processing; Electronics; business; Wireless sensor network; Source localization; Information Systems; Efficient energy use; Acoustic signal processing
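
The acoustic localization abstract above names source localization and ranging in wireless acoustic sensor networks as enabling algorithms but is truncated before describing them. A common building block for such methods is time-difference-of-arrival (TDOA) estimation between two microphones; the Python sketch below, a generic illustration and not the paper's method, picks the lag that maximizes the cross-correlation of two sampled signals and converts it into a range difference. The speed of sound (343 m/s), the sample rate, and the function names are assumptions.

```python
from typing import Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def estimate_lag(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) by which `sig` trails `ref`, via brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlation at this lag: sum of ref[n] * sig[n + lag] over the overlapping samples.
        score = sum(ref[n] * sig[n + lag] for n in range(len(ref)) if 0 <= n + lag < len(sig))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def range_difference(ref: Sequence[float], sig: Sequence[float],
                     fs: float, max_lag: int) -> Tuple[float, float]:
    """Convert the estimated lag into a TDOA (seconds) and a range difference (metres)."""
    tdoa = estimate_lag(ref, sig, max_lag) / fs
    return tdoa, tdoa * SPEED_OF_SOUND
```

The brute-force loop costs O(N·L) for L candidate lags; practical systems usually compute the correlation through an FFT, often with a weighting such as GCC-PHAT.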

SAUCE: A web application for interactive teaching and learning of parallel programming

2017

Prevalent hardware trends towards parallel architectures and algorithms create a growing demand for graduate students familiar with programming concurrent software. However, learning parallel programming is challenging because of complex communication and memory access patterns and the need to avoid common pitfalls such as deadlocks and race conditions. Hence, the learning process must be supported by adequate software solutions to enable future computer scientists and engineers to write robust and efficient code. This paper discusses a selection of well-known parallel algorithms based on C++11 threads, OpenMP, MPI, and CUDA that can be interactively embedded i…

Computer Networks and Communications; business.industry; Computer science; Programming language; White-box testing; Parallel algorithm; Process (computing); 020206 networking & telecommunications; 02 engineering and technology; Parallel computing; Thread (computing); computer.software_genre; Theoretical Computer Science; CUDA; Software; Artificial Intelligence; Hardware and Architecture; 0202 electrical engineering electronic engineering information engineering; Code (cryptography); Web application; 020201 artificial intelligence & image processing; business; computer; Software; Journal of Parallel and Distributed Computing
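
The SAUCE abstract above lists deadlocks and race conditions as common pitfalls. The Python sketch below illustrates both fixes in miniature, using the standard `threading` module as a stand-in for C++11 `std::thread` and `std::mutex`; it is not taken from SAUCE. Each balance update happens while the locks are held, which prevents lost updates, and locks are always acquired in one fixed global order, which prevents the circular wait behind the classic two-account deadlock. The `Account` class and the `transfer`, `worker`, and `run` functions are hypothetical.

```python
import threading
from typing import List


class Account:
    def __init__(self, ident: int, balance: int) -> None:
        self.ident = ident          # defines the global lock order
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move `amount` from src to dst without races or lock-order deadlocks."""
    # Always lock the account with the smaller ident first, whatever the transfer direction.
    first, second = (src, dst) if src.ident < dst.ident else (dst, src)
    with first.lock:
        with second.lock:
            # Both read-modify-write updates run while both locks are held.
            src.balance -= amount
            dst.balance += amount


def worker(a: Account, b: Account, rounds: int) -> None:
    for _ in range(rounds):
        transfer(a, b, 1)
        transfer(b, a, 1)


def run(n_threads: int = 4, rounds: int = 10_000) -> List[int]:
    a, b = Account(0, 100), Account(1, 100)
    # Half the threads start with a->b and half with b->a: the classic opposite-order setup.
    threads = [
        threading.Thread(target=worker, args=(a, b, rounds) if i % 2 == 0 else (b, a, rounds))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [a.balance, b.balance]
```

Without the ordering, one thread could hold `a.lock` while waiting for `b.lock` as another holds `b.lock` while waiting for `a.lock`. In C++11, `std::lock` offers a deadlock-avoiding way to acquire several mutexes at once.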

Domain-Knowledge Optimized Simulated Annealing for Network-on-Chip Application Mapping

2013

Network-on-Chip (NoC) architectures are scalable on-chip interconnection networks. They replace inefficient shared buses and are suitable for multicore and manycore systems. This paper presents an Optimized Simulated Annealing (OSA) algorithm for the Network-on-Chip application mapping problem. With OSA, the cores are implicitly and dynamically clustered using knowledge about communication demands. We show that OSA is a more feasible Simulated Annealing approach to NoC application mapping by comparing it with both a general Simulated Annealing algorithm and a Branch and Bound algorithm. Using real applications, we show that OSA is significantly faster than a general Simulated Annealing, withou…

Computer Science::Hardware Architecture; Interconnection; Multi-core processor; Network on a chip; Branch and bound; Computer science; Scalability; Simulated annealing; Computer Science::Networking and Internet Architecture; Parallel computing; Adaptive simulated annealing; Cluster analysis
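
For comparison with the baseline in the NoC abstract above, here is a hedged sketch of a general simulated annealing for NoC application mapping; it is not OSA itself, whose domain-knowledge clustering the truncated abstract does not detail. Cores are placed on the tiles of a 2-D mesh, the cost is communication volume times Manhattan hop count (a common proxy that assumes XY routing), and a move swaps a core with another core or with a free tile. The cost model, cooling parameters, and all names are assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

Traffic = Dict[Tuple[int, int], float]  # (src core, dst core) -> communication volume


def hops(a: int, b: int, cols: int) -> int:
    """Manhattan distance between tiles a and b on a mesh with `cols` columns."""
    return abs(a // cols - b // cols) + abs(a % cols - b % cols)


def comm_cost(slots: List[int], traffic: Traffic, cols: int) -> float:
    """slots[c] is the tile of core c; cost = sum of volume * hop count."""
    return sum(v * hops(slots[s], slots[d], cols) for (s, d), v in traffic.items())


def anneal(n_cores: int, traffic: Traffic, rows: int, cols: int,
           t0: float = 10.0, alpha: float = 0.95, moves_per_t: int = 100,
           t_min: float = 1e-3, seed: int = 0) -> Tuple[List[int], float]:
    """General simulated annealing; returns (best tiles for cores 0..n_cores-1, cost)."""
    rng = random.Random(seed)
    n_tiles = rows * cols
    assert n_cores <= n_tiles
    # slots[k] for k < n_cores is a core's tile; the remaining entries are free tiles.
    slots = list(range(n_tiles))
    rng.shuffle(slots)
    cost = comm_cost(slots, traffic, cols)
    best, best_cost = slots[:n_cores], cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.randrange(n_cores), rng.randrange(n_tiles)
            if i == j:
                continue
            slots[i], slots[j] = slots[j], slots[i]  # swap with another core or a free tile
            new_cost = comm_cost(slots, traffic, cols)
            delta = new_cost - cost
            # Metropolis criterion: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = slots[:n_cores], cost
            else:
                slots[i], slots[j] = slots[j], slots[i]  # reject: undo the swap
        t *= alpha  # geometric cooling
    return best, best_cost
```

Re-evaluating the full cost after every move is O(|traffic|); an incremental delta that re-scores only the edges touching the two swapped slots is a standard speed-up.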

"Table 24" of "Measurement of event shape and inclusive distributions at √s = 130 GeV and 136 GeV."

1997

3-jet rate for the JADE algorithm.

Computer Science::Multiagent Systems; 133.0; E+ E- --> 3JET; Astrophysics::High Energy Astrophysical Phenomena; E+ E- Scattering; Integrated Cross Section; Exclusive; High Energy Physics::Experiment; Jet Production; Cross Section; SIG; Computer Science::Distributed Parallel and Cluster Computing
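
The "Table" records in these results report n-jet rates for the JADE algorithm. As a hedged reminder of how such rates are defined (not the analysis code behind these tables), the sketch below clusters massless particles with the JADE measure y_ij = 2 E_i E_j (1 - cos θ_ij) / E_vis², repeatedly merging the pair with the smallest y_ij until every remaining pair has y_ij ≥ y_cut; the n-jet rate is then the fraction of events that end with exactly n jets. The recombination choice (plain four-momentum addition), the tuple representation, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

FourVector = Tuple[float, float, float, float]  # (E, px, py, pz)


def y_jade(a: FourVector, b: FourVector, e_vis: float) -> float:
    """JADE distance 2 E_a E_b (1 - cos theta_ab) / E_vis^2 (momenta must be non-zero)."""
    pa = math.sqrt(a[1] ** 2 + a[2] ** 2 + a[3] ** 2)
    pb = math.sqrt(b[1] ** 2 + b[2] ** 2 + b[3] ** 2)
    cos_ab = (a[1] * b[1] + a[2] * b[2] + a[3] * b[3]) / (pa * pb)
    return 2.0 * a[0] * b[0] * (1.0 - cos_ab) / e_vis ** 2


def jade_jets(particles: Sequence[FourVector], y_cut: float) -> List[FourVector]:
    """Merge the closest pair (smallest y) until all pairs satisfy y >= y_cut."""
    jets = list(particles)
    e_vis = sum(p[0] for p in jets)  # visible energy, fixed before clustering
    while len(jets) > 1:
        y, i, j = min(
            (y_jade(jets[i], jets[j], e_vis), i, j)
            for i in range(len(jets))
            for j in range(i + 1, len(jets))
        )
        if y >= y_cut:
            break
        merged = tuple(jets[i][k] + jets[j][k] for k in range(4))  # four-momentum addition
        jets = [p for k, p in enumerate(jets) if k not in (i, j)] + [merged]
    return jets


def n_jet_rate(events: Sequence[Sequence[FourVector]], y_cut: float, n: int) -> float:
    """Fraction of events that end up with exactly n jets at this y_cut."""
    return sum(len(jade_jets(ev, y_cut)) == n for ev in events) / len(events)
```

Experimental rates additionally include detector-level corrections that this sketch omits; it shows only the clustering definition.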

"Table 23" of "Measurement of event shape and inclusive distributions at √s = 130 GeV and 136 GeV."

1997

2-jet rate for the JADE algorithm.

Computer Science::Multiagent Systems; Dijet Production; 133.0; Astrophysics::High Energy Astrophysical Phenomena; E+ E- Scattering; Integrated Cross Section; Exclusive; High Energy Physics::Experiment; Jet Production; E+ E- --> 2JET; Cross Section; SIG; Computer Science::Distributed Parallel and Cluster Computing

"Table 25" of "Measurement of event shape and inclusive distributions at √s = 130 GeV and 136 GeV."

1997

4-jet rate for the JADE algorithm.

Computer Science::Multiagent Systems; E+ E- --> 4JET; 133.0; Astrophysics::High Energy Astrophysical Phenomena; E+ E- Scattering; Integrated Cross Section; Exclusive; High Energy Physics::Experiment; Jet Production; Cross Section; SIG; Computer Science::Distributed Parallel and Cluster Computing

"Table 26" of "Measurement of event shape and inclusive distributions at √s = 130 GeV and 136 GeV."

1997

5-jet rate for the JADE algorithm.

Computer Science::Multiagent Systems; E+ E- --> 5JET; 133.0; Astrophysics::High Energy Astrophysical Phenomena; E+ E- Scattering; Integrated Cross Section; Exclusive; High Energy Physics::Experiment; Jet Production; Cross Section; SIG; Computer Science::Distributed Parallel and Cluster Computing