Search results for "Computational Science"
showing 10 items of 124 documents
Elastic plastic analysis iterative solution
1998
The step-by-step analysis of finite element elastic-plastic structures subjected to an assigned (quasi-static) loading history is considered; it amounts to the well-known sequence of linear complementarity problems. An iterative technique for solving the relevant linear complementarity problem is presented. It is based on the recursive solution of a suitable linear complementarity problem, derived from the original one and easier to solve. The convergence of the procedure is proved. Some noteworthy particular cases are examined. The physical meaning of the procedure is shown to be a plastic relaxation. The suitable numerical ranges for some check parameter values, to be utilized in the app…
Many-body perturbation theory calculations using the yambo code
2019
yambo is an open source project aimed at studying excited state properties of condensed matter systems from first principles using many-body methods. As input, yambo requires ground state electronic structure data as computed by density functional theory codes such as Quantum ESPRESSO and Abinit. yambo’s capabilities include the calculation of linear response quantities (both independent-particle and including electron–hole interactions), quasi-particle corrections based on the GW formalism, optical absorption, and other spectroscopic quantities. Here we describe recent developments ranging from the inclusion of important but oft-neglected physical effects such as electron–phonon i…
CUSHAW2-GPU: Empowering Faster Gapped Short-Read Alignment Using GPU Computing
2014
We present CUSHAW2-GPU to accelerate the CUSHAW2 algorithm using compute unified device architecture (CUDA)-enabled GPUs. Two critical GPU computing techniques, namely intertask hybrid CPU-GPU parallelism and tile-based Smith-Waterman map backtracking using CUDA, are investigated to facilitate fast alignments. By aligning both simulated and real reads to the human genome, our aligner yields comparable or better performance compared to BWA-SW, Bowtie2, and GEM. Furthermore, CUSHAW2-GPU with a Tesla K20c GPU achieves significant speedups over the multithreaded CUSHAW2, BWA-SW, Bowtie2, and GEM on the 12 cores of a high-end CPU for both single-end and paired-end alignment.
Q-Chem 2.0: a high-performance ab initio electronic structure program package
2000
Q-Chem 2.0 is a new release of an electronic structure program package, capable of performing first principles calculations on the ground and excited states of molecules using both density functional theory and wave function-based methods. A review of the technical features contained within Q-Chem 2.0 is presented. This article contains brief descriptive discussions of the key physical features of all new algorithms and theoretical models, together with sample calculations that illustrate their performance. © 2000 John Wiley & Sons. Keywords: electronic structure; density functional theory; computer program; computational chemistry. Introduction: A reader glancing casually at this article might suspect on t…
COMPARISON OF CPML IMPLEMENTATIONS FOR THE GPU-ACCELERATED FDTD SOLVER
2011
Three distinctly different implementations of the convolutional perfectly matched layer for the FDTD method on CUDA-enabled graphics processing units are presented. All implementations store additional variables only inside the convolutional perfectly matched layers, and the computational speeds scale according to the thickness of these layers. The merits of the different approaches are discussed, and a comparison of computational performance is made using complex real-life benchmarks.
Exploring parallel capabilities of an innovative numerical method for recovering image velocity vectors field
2010
In this paper, an efficient method for estimating the velocity vector field is investigated. The method is based on a quasi-interpolant operator and involves a large amount of computation. The operations characterizing the computational scheme are ideal for parallel processing because they are local, regular and repetitive. Therefore, the spatial parallelism of the process is studied in order to speed up the computation on distributed multiprocessor systems. The process has been shown to be synchronous, with good task balancing and a small amount of data transfer.
An Scalable matrix computing unit architecture for FPGA and SCUMO user design interface
2019
High-dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations, removing the need for data movement for intermediate results, and also optimizes the performance of the individual matrix operations in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix-by-scalar multiplication.…
High Precision Conservative Surface Mesh Generation for Swept Volumes
2015
We present a novel, efficient, and flexible scheme to generate a high-quality mesh that approximates the outer boundary of a swept volume. Our approach comes with two guarantees. First, the approximation is conservative, i.e., the swept volume is enclosed by the generated mesh. Second, the one-sided Hausdorff distance of the generated mesh to the swept volume is bounded above by a user-defined tolerance. Exploiting this tolerance, the algorithm generates a mesh that is adapted to the local complexity of the swept volume boundary, keeping the overall output complexity remarkably low. The algorithm is two-phased: the actual sweep and the mesh generation. In the sweeping phase, we introduce a g…
The integral‐direct coupled cluster singles and doubles model
1996
An efficient and highly vectorized implementation of the coupled cluster singles and doubles (CCSD) model using a direct atomic integral technique is presented. The minimal number of n^6 processes has been implemented for the most time-consuming terms, and point group symmetry is used to further reduce operation counts and memory requirements. The significantly increased application range of the CCSD method is illustrated with sample calculations on several systems with more than 500 basis functions. Furthermore, we present the basic trends of an open-ended algorithm and discuss the use of integral prescreening. © 1996 American Institute of Physics.
Massively parallel computation of atmospheric neutrino oscillations on CUDA-enabled accelerators
2019
The computation of neutrino flavor transition amplitudes through inhomogeneous matter is a time-consuming step and thus could benefit from optimization and parallelization. Next to reliable parameter estimation of intrinsic physical quantities such as neutrino masses and mixing angles, these transition amplitudes are important in hypothesis testing of potential extensions of the standard model of elementary particle physics, such as additional neutrino flavors. Hence, fast yet precise implementations are of high importance to research. In the recent past, massively parallel accelerators such as CUDA-enabled GPUs featuring thousands of compute units have been widely adopted due to t…