Search results for "Supercomputer"

showing 10 items of 45 documents

Improving Collective I/O Performance Using Non-volatile Memory Devices

2016

Collective I/O is a parallel I/O technique designed to deliver high performance data access to scientific applications running on high-end computing clusters. In collective I/O, write performance is highly dependent upon the storage system response time and limited by the slowest writer. The storage system response time in conjunction with the need for global synchronisation, required during every round of data exchange and write, severely impacts collective I/O performance. Future Exascale systems will have an increasing number of processor cores, while the number of storage servers will remain relatively small. Therefore, the storage system concurrency level will further increase, worseni…

Input/output; File system; 020203 distributed computing; Multi-core processor; business.industry; Computer science; Concurrency; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; Supercomputer; Non-volatile memory; Memory management; Data access; Server; Computer data storage; 0202 electrical engineering electronic engineering information engineering; business; computer; Computer network

2016 IEEE International Conference on Cluster Computing (CLUSTER)

researchProduct

XLCS: A New Bit-Parallel Longest Common Subsequence Algorithm on Xeon Phi Clusters

2019

Finding the longest common subsequence (LCS) of two strings is a classical problem in bioinformatics. A basic approach to solve this problem is based on dynamic programming. As the biological sequence databases are growing continuously, bit-parallel sequence comparison algorithms are becoming increasingly important. In this paper, we present XLCS, a new parallel implementation to accelerate the LCS algorithm on Xeon Phi clusters by performing bit-wise operations. We have designed an asynchronous IO framework to improve the data transfer efficiency. To make full use of the computing resources of Xeon Phi clusters, we use three levels of parallelism: node-level, thread-level and vector-level.…

Longest common subsequence problem; Dynamic programming; Speedup; Computer science; Computer cluster; Asynchronous I/O; Cache; Supercomputer; Algorithm; Xeon Phi

2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)

researchProduct

A recurrence-free variant of Strassen’s algorithm on hypercube

1995

In this paper, a non-recursive version of Strassen’s matrix multiplication algorithm is presented. This new algorithm is suitable for parallel environments. Two computational schemes have been worked out, exploiting different parallel approaches on a hypercube architecture. A comparative analysis is reported. The experiments have been carried out on an nCUBE-2 supercomputer, housed at CNUCE in Pisa, supporting the Express parallel operating system.

Matrix multiplication; General Computer Science; Computer science; Express operating system; Computer Science (all); Parallel computing; Strassen’s algorithm; Supercomputer; Strassen algorithm; Hypercube architecture; Hypercube; Algorithm

researchProduct

Millimeter-Scale and Billion-Atom Reactive Force Field Simulation on Sunway Taihulight

2020

Large-scale molecular dynamics (MD) simulations on supercomputers play an increasingly important role in many research areas. With the capability of simulating charge equilibration (QEq), bonds and so on, the reactive force field (ReaxFF) enables the precise simulation of chemical reactions. Compared to first-principles molecular dynamics (FPMD), ReaxFF has far lower requirements on computational resources, so it can achieve higher efficiency for large-scale simulations. In this article, we present our efforts on scaling ReaxFF on the Sunway TaihuLight Supercomputer (TaihuLight). We have carefully redesigned the force analysis and neighbor list building steps. By applying fine-grained …

Molecular dynamics; Computational Theory and Mathematics; Hardware and Architecture; Computer science; Computation; Signal Processing; Scalability; Inverse trigonometric functions; ReaxFF; Supercomputer; Force field (chemistry); Sunway TaihuLight; Computational science

IEEE Transactions on Parallel and Distributed Systems

researchProduct

UPC++ for bioinformatics: A case study using genome-wide association studies

2014

Modern genotyping technologies are able to obtain up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. Therefore, a variety of HPC architectures have been used to accelerate these studies. In this work we present a parallel approach for multi-core clusters, which is implemented with UPC++ and takes advantage of the features available in the Partitioned Global Address Space and Object Oriented Programming models. Our solution is base…

Object-oriented programming; ComputingMethodologies_PATTERNRECOGNITION; Computer science; Computation; Single-core; Genome-wide association study; Partitioned global address space; Parallel computing; Bioinformatics; Supercomputer

2014 IEEE International Conference on Cluster Computing (CLUSTER)

researchProduct

Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS

2014

This is a post-peer-review, pre-copyedit version of an article published in Lecture Notes in Computer Science. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-09873-9_57 [Abstract] High-throughput genotyping technologies allow the collection of up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as 2-SNP interactions, in Genome-Wide Association Studies is an important but time-consuming operation, since statistical computations have to be performed for each pair of measured markers. In this work we present EpistSearch, a parallelized tool that, following the log-linear model appr…

POSIX Threads; Multi-core processor; Bioinformatics; Computer science; Computation; CUDA; Parallel computing; Pthreads; Acceleration; ComputingMethodologies_PATTERNRECOGNITION; Titan (supercomputer); Filter (video); Epistasis; GWAS

researchProduct

Lattice quantum hadrodynamics on a CRAY Y-MP

1992

Quantum corrections to the mean-field equation of state for nuclear matter are estimated in a lattice simulation of quantum hadrodynamics on a CRAY Y-MP. In contrast with lattice quantum chromodynamics, where coordinate space methods are the standard, the calculations are carried out in momentum space and on nonhypercubic (irregular) lattices. The quantum corrections to the known, mean-field equation of state were found to be considerable. The time frame of the project and the large computational needs of the program required the use of powerful supercomputers, like the CRAY Y-MP, which are capable of performing at a very high computing speed by using both vector and parallel hardware, the …

Quantum chromodynamics; Equation of state; Computer science; Numerical analysis; Monte Carlo method; Position and momentum space; Parallel computing; Nuclear matter; Supercomputer; Theoretical Computer Science; Computational science; Hardware and Architecture; Quantum hadrodynamics; Linear algebra; Coordinate space; Quantum; Software; Information Systems

The Journal of Supercomputing

researchProduct

PROLISEAN: A New Security Protocol for Programmable Matter

2021

The vision for programmable matter is to create a material that can be reprogrammed to have different shapes and to change its physical properties on demand. Such materials are autonomous systems composed of a huge number of independent, connected elements called particles. The connections between particles form the overall shape of the system. These particles are capable of interacting with each other and making decisions based on their environment. Beyond sensing, processing, and communication capabilities, programmable matter includes actuation and motion capabilities. It could be deployed in different domains and will constitute an intelligent component of the IoT. A lot of applications can derive fro…

Self-reconfiguring modular robot; 0209 industrial biotechnology; Security Algorithms; Computer Networks and Communications; Computer science; Distributed computing; Hash function; Security Protocol; 02 engineering and technology; [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]; Encryption; Lightweight Cryptography; [INFO.INFO-IU]Computer Science [cs]/Ubiquitous Computing; [INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR]; 020901 industrial engineering & automation; Component (UML); 0202 electrical engineering electronic engineering information engineering; Modular Robots; Programmable Matter; Protocol (object-oriented programming); IOT; business.industry; 020206 networking & telecommunications; Cryptographic protocol; Supercomputer; [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation; Programmable matter; [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA]; [INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET]; Amoebots; [INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; business; Distributed Computing

researchProduct

Faster GPU-Accelerated Smith-Waterman Algorithm with Alignment Backtracking for Short DNA Sequences

2014

In this paper, we present a GPU-accelerated Smith-Waterman (SW) algorithm with Alignment Backtracking, called GSWAB, for short DNA sequences. This algorithm performs all-to-all pairwise alignments and retrieves optimal local alignments on CUDA-enabled GPUs. To facilitate fast alignment backtracking, we have investigated a tile-based SW implementation using the CUDA programming model. This tiled computing pattern enables us to more deeply explore the powerful compute capability of GPUs. We have evaluated the performance of GSWAB on a Kepler-based GeForce GTX Titan graphics card. The results show that GSWAB can achieve a performance of up to 56.8 GCUPS on large-scale datasets. Furthermore, ou…

Smith–Waterman algorithm; CUDA; Titan (supercomputer); Speedup; Computer science; Backtracking; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Graphics; DNA sequencing; ComputingMethodologies_COMPUTERGRAPHICS

researchProduct

Reconstruction of Low Energy Neutrino Events with GPUs at IceCube

2020

IceCube is a cubic-kilometer neutrino observatory located at the South Pole that produces massive amounts of data by measuring individual Cherenkov photons from neutrino interaction events in the energy range from a few GeV to several PeV. The actual reconstruction of neutrino events in the GeV range is computationally challenging due to the scarcity of data produced by single events. This can lead to run times of several weeks for the state-of-the-art reconstruction method, Pegleg, on CPUs for typical workloads of tens of thousands of events. We propose a GPU version of Pegleg that probes the likelihood space with several hypotheses in parallel while adapting the amount of parallel sampled hy…

Speedup; 010308 nuclear & particles physics; Computer science; Astrophysics::High Energy Astrophysical Phenomena; Computation; 01 natural sciences; Computational science; Titan (supercomputer); Observatory; 0103 physical sciences; Range (statistics); Neutrino; 010306 general physics; Neutrino oscillation; Cherenkov radiation

researchProduct