Search results for "Parallel"
showing 10 items of 667 documents
Tacitus on Titus' Visit to the Temple of Venus at Paphos
2020
This article deals with Titus' visit to the temple of Venus at Paphos in the second book of Tacitus' Historiae. I argue that apart from its other literary intentions already mentioned by scholars, this digression implicitly connects Titus not only with Aeneas but also with Julius Caesar. Titus' affair with Berenice that recalls Caesar's affair with Cleopatra, Tacitus' allusions to Lucan's De Bello Civili where Caesar's visit to the tomb of Alexander the Great is described, the ?????Motiv and fortuna's favour that characterise both Roman generals, all contribute to connect Titus with Caesar and allow the reader to view a parallel between the Flavian and the Julio-Claudian dynasty. Furthermor…
Empirical Autotuning of Two-level Parallel Linear Algebra Routines on Large cc-NUMA Systems
2012
In large cc-NUMA systems, making efficient use of the different levels of the memory hierarchy is not an easy task, and the performance of multithreaded implementations of the libraries decreases as the number of cores used increases, resulting in a significant loss of efficiency. To alleviate this problem, routines with multilevel parallelism can be developed by combining OpenMP and BLAS parallelism. In that way, higher performance can be achieved, but it is necessary to develop some autotuning technique for the appropriate selection of the number of threads to use at each level. The selection can be made through theoretical models of the execution time or some installation methodology. This…
3D high definition video coding on a GPU-based heterogeneous system
2013
H.264/MVC is a standard for supporting the sensation of 3D, based on coding from 2 (stereo) to N views. H.264/MVC adopts many coding options inherited from single-view H.264/AVC, and thus its complexity is even higher, mainly because the number of views to process is higher. In this manuscript, we aim at an efficient parallelization of the most computationally intensive video encoding module for stereo sequences: inter prediction, and its collaborative execution on a heterogeneous platform. The proposal is based on an efficient dynamic load balancing algorithm and on breaking encoding dependencies. Experimental results demonstrate the proposed algorithm's ability to reduce the…
A Methodology for Bilingual Lexicon Extraction from Comparable Corpora
2015
Dictionary extraction using parallel corpora is well established. However, for many language pairs parallel corpora are a scarce resource, which is why in the current work we discuss methods for dictionary extraction from comparable corpora. The aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways: 1) Eliminating the need for initial lexicons by using a bootstrapping approach which only requires a few seed translations. 2) Implementing a new approach which first establishes alignments between comparable documents across languages, and then computes cross-lingual alignments between wor…
Parallel Calculation of CCSDT and Mk-MRCCSDT Energies.
2010
A scheme for the parallel calculation of energies at the coupled-cluster singles, doubles, and triples (CCSDT) level of theory, several approximate iterative CCSDT schemes (CCSDT-1a, CCSDT-1b, CCSDT-2, CCSDT-3, and CC3), and for the state-specific multireference coupled-cluster ansatz suggested by Mukherjee with a full treatment of triple excitations (Mk-MRCCSDT) is presented. The proposed scheme is based on the adaptation of a highly efficient serial coupled-cluster code leading to a communication-minimized implementation by parallelizing the time-determining steps. The parallel algorithm is tailored for affordable cluster architectures connected by standard communication networks such as …
Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform
2012
The unprecedented production of short reads by new high-throughput sequencers has made it challenging to align them to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between the input reads and the reference genome, where hash tables are the most frequently used data structure. However, hash tables are memory-consuming, making them poorly suited to memory-constrained many-core architectures such as GPUs, even though they usually have a nearly constant query time com…
Work Partitioning on Parallel and Distributed Agent-Based Simulation
2017
Work partitioning is a key challenge with applications in many scientific and technological fields. The problem is very well studied, with a rich literature on both distributed and parallel computing architectures. In this paper we deal with the work partitioning problem for parallel and distributed agent-based simulations, which aims at (i) balancing the overall load distribution and (ii) minimizing, at the same time, the communication overhead due to agents' inter-dependencies. We introduce a classification taxonomy of work partitioning strategies and present a space-based work partitioning approach, based on a Quad-tree data structure, which makes it possible to: identify a good space partitioning …
Gl-learning
2016
In this paper, we present a new open-source software library, Gl-learning, for grammatical inference. The rise of new application scenarios in recent years has required optimized methods to address knowledge extraction from huge amounts of data and to model highly complex systems. Our library implements the main state-of-the-art algorithms in the grammatical inference field (RPNI, EDSM, L*), redesigned through the OpenMP library for a parallel execution that drastically decreases execution times. To the best of our knowledge, it is also the first comprehensive library including a noise-tolerant learning algorithm, such as Blue*, that significantly broadens the range of the potential application s…
Text Compression Using Antidictionaries
1999
We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as "compilers" that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.
Parallel Collision Queries on the GPU
2013
We present parallel algorithms to accelerate collision tests of rigid body objects for a high number of independent transformations as they occur in sampling-based motion planning and path validation problems. We compare various GPU approaches with different levels of parallelism against each other and against a parallel CPU implementation. Our algorithms require no sophisticated load balancing schemes. They make no assumption on the distribution of the input transformations and require no pre-processing. Yet, we can perform up to 1 million collision tests per second with our best GPU implementation in our benchmarks. This is about 2.5X faster than our reference multi-core CPU implementati…