Search results for "multiprocessor"
showing 10 items of 18 documents
The Acts project: track reconstruction software for HL-LHC and beyond
2019
The reconstruction of trajectories of the charged particles in the tracking detectors of high energy physics experiments is one of the most difficult and complex tasks of event reconstruction at particle colliders. As pattern recognition algorithms exhibit combinatorial scaling to high track multiplicities, they become the largest contributor to the CPU consumption within event reconstruction, particularly at current and future hadron colliders such as the LHC, HL-LHC and FCC-hh. Current algorithms provide an extremely high standard of physics and computing performance and have been tested on billions of simulated and recorded data events. However, most algorithms were first written 20 year…
Heterogeneous PBLAS: Optimization of PBLAS for Heterogeneous Computational Clusters
2008
This paper presents a package, called Heterogeneous PBLAS (HeteroPBLAS), which is built on top of PBLAS and provides optimized parallel basic linear algebra subprograms for heterogeneous computational clusters. We present the user interface and the software hierarchy of the first research implementation of HeteroPBLAS. This is the first step towards the development of a parallel linear algebra package for heterogeneous computational clusters. We demonstrate the efficiency of the HeteroPBLAS programs on a homogeneous computing cluster and a heterogeneous computing cluster.
Wireless versus Wired Network-on-Chip to Enable the Multi- Tenant Multi-FPGAs in Cloud
2021
The new era of computing is not CPU-centric but enriched with all the heterogeneous computing resources including the reconfigurable fabric. In multi-FPGA architecture, either deployed within a data center or as a standalone model, inter-FPGA communication is crucial. Network-on-chip exhibits a promising performance for the integration of one FPGA. A sustainable communication architecture requires stable performance as the number of applications or users grows. Wireless network-on-chip has the potential to be that communication architecture, as it boasts the same performance capability as wired solutions in addition to its multicast capacities. We conducted an exploratory study to investiga…
Online Scheduling of Task Graphs on Hybrid Platforms
2018
Modern computing platforms commonly include accelerators. We target the problem of scheduling applications modeled as task graphs on hybrid platforms made of two types of resources, such as CPUs and GPUs. We consider that task graphs are uncovered dynamically, and that the scheduler has information only on the available tasks, i.e., tasks whose predecessors have all been completed. Each task can be processed by either a CPU or a GPU, and the corresponding processing times are known. Our study extends a previous \(4\sqrt{m/k}\)-competitive online algorithm [2], where m is the number of CPUs and k the number of GPUs (\(m\ge k\)). We prove that no online algorithm can have a competitive ratio …
The Mu3e Data Acquisition
2020
The Mu3e experiment aims to find or exclude the lepton flavour violating decay $\mu^+\to e^+e^-e^+$ with a sensitivity of one in 10$^{16}$ muon decays. The first phase of the experiment is currently under construction at the Paul Scherrer Institute (PSI, Switzerland), where beams with up to 10$^8$ muons per second are available. The detector will consist of an ultra-thin pixel tracker made from High-Voltage Monolithic Active Pixel Sensors (HV-MAPS), complemented by scintillating tiles and fibres for precise timing measurements. The experiment produces about 100 Gbit/s of zero-suppressed data which are transported to a filter farm using a network of FPGAs and fast optical links. On the filte…
A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC
2012
H.264/AVC is the most recent predictive video compression standard to outperform other existing video coding standards by means of higher computational complexity. In recent years, heterogeneous computing has emerged as a cost-efficient solution for high-performance computing. In the literature, several algorithms have been proposed to accelerate video compression, but so far there have not been many solutions that deal with video codecs using heterogeneous systems. This paper proposes an algorithm to perform H.264/AVC inter prediction. The proposed algorithm performs the motion estimation, both with full-pixel and sub-pixel accuracy, using CUDA to assist the CPU, obtaining remarkable time …
Real-time data processing in the ALICE High Level Trigger at the LHC
2019
At the Large Hadron Collider at CERN in Geneva, Switzerland, atomic nuclei are collided at ultra-relativistic energies. Many final-state particles are produced in each collision and their properties are measured by the ALICE detector. The detector signals induced by the produced particles are digitized leading to data rates that are in excess of 48 GB/$s$. The ALICE High Level Trigger (HLT) system pioneered the use of FPGA- and GPU-based algorithms to reconstruct charged-particle trajectories and reduce the data size in real time. The results of the reconstruction of the collision events, available online, are used for high level data quality and detector-performance monitoring and real-tim…
Community-driven computational biology with Debian Linux
2011
Background The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. Results The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlyin…
Scalable Dense Factorizations for Heterogeneous Computational Clusters
2008
This paper discusses the design and the implementation of the LU factorization routines included in the Heterogeneous ScaLAPACK library, which is built on top of ScaLAPACK. These routines are used in the factorization and solution of a dense system of linear equations. They are implemented using optimized PBLAS, BLACS and BLAS libraries for heterogeneous computational clusters. We present the details of the implementation as well as performance results on a heterogeneous computing cluster.
Two Job Cyclic Scheduling with Incompatibility Constraints
2001
The present paper deals with the problem of scheduling several repeated occurrences of two jobs over a finite or infinite time horizon in order to maximize the yielded profit. The constraints of the problem are the incompatibilities between some pairs of tasks which require a same resource.