Search results for "DUP"
Showing 10 of 499 documents
Sorafenib maintenance after allogeneic hematopoietic stem cell transplantation for acute myeloid leukemia with FLT3-internal tandem duplication mutat…
2020
PURPOSE Despite undergoing allogeneic hematopoietic stem cell transplantation (HCT), patients with acute myeloid leukemia (AML) with internal tandem duplication mutation in the FMS-like tyrosine kinase 3 gene (FLT3-ITD) have a poor prognosis, frequently relapse, and die as a result of AML. It is currently unknown whether a maintenance therapy using FLT3 inhibitors, such as the multitargeted tyrosine kinase inhibitor sorafenib, improves outcome after HCT. PATIENTS AND METHODS In a randomized, placebo-controlled, double-blind phase II trial (SORMAIN; German Clinical Trials Register: DRKS00000591), 83 adult patients with FLT3-ITD–positive AML in complete hematologic remission after HCT were r…
Reconstruction of Low Energy Neutrino Events with GPUs at IceCube
2020
IceCube is a cubic-kilometer neutrino observatory located at the South Pole that produces massive amounts of data by measuring individual Cherenkov photons from neutrino interaction events in the energy range from a few GeV to several PeV. The reconstruction of neutrino events in the GeV range is computationally challenging because single events produce so little data. As a result, the state-of-the-art reconstruction method, Pegleg, can run for several weeks on CPUs for typical workloads of many tens of thousands of events. We propose a GPU version of Pegleg that probes the likelihood space with several hypotheses in parallel while adapting the amount of parallel sampled hy…
The Dynamical Kernel Scheduler - Part 1
2015
Emerging processor architectures such as GPUs and Intel MICs offer a huge performance potential for high-performance computing. However, developing software for these hardware accelerators introduces additional challenges for the developer, such as exposing additional parallelism, dealing with different hardware designs, and using multiple development frameworks to target devices from different vendors. The Dynamic Kernel Scheduler (DKS) is being developed to provide a software layer between the host application and different hardware accelerators. DKS handles the communication between the host and device, schedules task execution, and provides a library of built-in algorithms. …
Optimization of Reactive Force Field Simulation: Refactor, Parallelization, and Vectorization for Interactions
2022
Molecular dynamics (MD) simulations are playing an increasingly important role in many areas ranging from chemical materials to biological molecules. With the continuing development of MD models, the potentials are getting larger and more complex. In this article, we focus on the reactive force field (ReaxFF) potential from LAMMPS to optimize the computation of interactions. We present our efforts on refactoring neighbor list building, bond order computation, and valence angle and torsion angle computation. After redesigning these kernels, we develop a vectorized implementation for non-bonded interactions, which is nearly 100× faster than the management processing…
Reducing complexity in H.264/AVC motion estimation by using a GPU
2011
H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of spe…
CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations
2013
We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO against other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…
Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations
2012
Summary: In the design process of computer systems or processor architectures, many different parameters are typically exposed to configure, tune, and optimize every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, area, and so on have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in …
CliffoSor: A Parallel Embedded Architecture for Geometric Algebra and Computer Graphics
2006
Geometric object representation and transformation are the two key aspects of computer graphics applications. Traditionally, compute-intensive matrix calculations are used to model and render 3D scenery. Geometric algebra (a.k.a. Clifford algebra) is gaining attention for the natural way it models geometric facts, coupled with its power as an analytical tool for symbolic calculations. In this paper, the architecture of CliffoSor (Clifford Processor) is introduced. CliffoSor is an embedded parallel coprocessing core that offers direct hardware support for Clifford algebra operators. A prototype implementation on an FPGA board is detailed. Initial test results show more …
Circuits and excitations to enable Brownian token-based computing with skyrmions
2021
Brownian computing exploits the thermal motion of discrete signal carriers (tokens) for computation. In this paper we address two major challenges that hinder competitive realizations of circuits and the application of Brownian token-based computing in actual devices, for instance devices based on magnetic skyrmions. To overcome the problems that wire crossings create for circuit fabrication, we design a crossing-free layout for a composite half-adder module. This layout greatly simplifies experimental implementations because wire crossings are avoided entirely. Additionally, our design is shorter than conventional designs, which speeds up computation. To address the key issue of slow computation base…
First Experiences on an Accurate SPH Method on GPUs
2017
It is well known that the standard formulation of Smoothed Particle Hydrodynamics (SPH) usually performs poorly when the data distribution is scattered or when the approximation is evaluated near the boundary. Moreover, the method is computationally demanding when a large number of data sites and evaluation points are employed. In this paper an enhanced version of the method is proposed that improves both accuracy and efficiency by using an HPC environment. Our implementation exploits the processing power of GPUs to solve the basic computational kernel. The performance gains show that the method is accurate and suitable for large datasets.