Search results for "Parallel computing"

showing 10 items of 189 documents

On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method

2018

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. Attained floating point perfor…

Tridiagonal linear systems; Programvaruteknik; Computer Networks and Communications; Computer science; Partial solution technique; reduction; 010103 numerical & computational mathematics; Parallel computing; tietotekniikka; 01 natural sciences; lineaariset mallit; Theoretical Computer Science; Separable space; information technology; Artificial Intelligence; Separable block tridiagonal linear system; Block (telecommunications); Fast direct solver; Radix; 0101 mathematics; ta113; Computer Sciences; ta111; Linear system; Software Engineering; GPU computing; Solver; Computer Science::Numerical Analysis; 010101 applied mathematics; PSCR method; Datavetenskap (datalogi); partial solution technique; Hardware and Architecture; Computer Science::Mathematical Software; pienennys; linear models; Software; Roofline model; Cyclic reduction; Journal of Parallel and Distributed Computing

researchProduct

Fast Poisson solvers for graphics processing units

2013

Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using an Nvidia GTX 580 series GPU and double precision floating-point arithmetic. The numerical results indicate up to 6-fold speed increase in the case of the two-dimensional problems and up to 3-fold speed increase in the case of the three-dimensional problems when compared to equivalent CPU implementations run on an Intel Core i7 quad-core CPU…

Tridiagonal matrix; OpenCL; Computer science; parallel computing; Scalar (mathematics); Linear system; Syklinen reduction; GPGPU; GPU; Double-precision floating-point format; Parallel computing; Solver; Poisson distribution; PSCR; Computational science; fast Poisson solver; symbols.namesake; nopea Poisson-ratkaisija; näytönohjain; symbols; Computer Science::Mathematical Software; Cyclic reduction; Graphics; rinnakkaislaskenta; Cyclic reduction

researchProduct
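The scalar cyclic reduction tridiagonal solver mentioned in the entry above can be illustrated with a minimal sequential sketch. This is not the authors' OpenCL code: the function name `cyclic_reduction` and the storage convention (`a` sub-diagonal with `a[0] == 0`, `b` diagonal, `c` super-diagonal with `c[-1] == 0`) are assumptions made for illustration, and the sketch assumes no zero pivots occur (e.g., a diagonally dominant system).

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    Assumed convention: a[0] == 0 and c[-1] == 0. Each level eliminates the
    even-indexed unknowns, leaving a half-size tridiagonal system on the
    odd-indexed ones, which is solved recursively.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        # Eliminate x[i-1] using equation i-1.
        alpha = a[i] / b[i - 1]
        new_a = -alpha * a[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_d = d[i] - alpha * d[i - 1]
        new_c = 0.0
        if i + 1 < n:
            # Eliminate x[i+1] using equation i+1.
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_c = -gamma * c[i + 1]
            new_d -= gamma * d[i + 1]
        ra.append(new_a)
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# 1D Poisson-type matrix tridiag(-1, 2, -1) with known solution 1..7.
n = 7
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
d = [0.0] * (n - 1) + [8.0]
solution = cyclic_reduction(a, b, c, d)
```

Within one level, every iteration of the elimination loop and of the back-substitution loop is independent of the others, which is the property that makes the method attractive on GPUs.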

Parallelization of adaptive MC integrators

1997

Monte Carlo (MC) methods for numerical integration seem to be embarrassingly parallel at first sight. When adaptive schemes are applied in order to enhance convergence, however, the seemingly most natural way of replicating the whole job on each processor can potentially ruin the adaptive behaviour. Using the popular VEGAS algorithm as an example, an economical method of semi-micro parallelization with variable grain size is presented and contrasted with another straightforward approach of macro-parallelization. A portable implementation of this semi-micro parallelization is used in the xloops project and is made publicly available.

Variable (computer science); Hardware and Architecture; Computer science; Adaptive behaviour; Integrator; Monte Carlo method; Convergence (routing); FOS: Physical sciences; General Physics and Astronomy; Parallel computing; Computational Physics (physics.comp-ph); Physics - Computational Physics; Numerical integration; Computer Physics Communications

researchProduct
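To make the "replicate the whole job on each processor" idea from the entry above concrete, here is a minimal sketch of macro-parallelization for plain, non-adaptive Monte Carlo integration on [0, 1]. It is not VEGAS and not the xloops implementation: the names `mc_chunk` and `parallel_mc`, the per-worker seeding scheme, and the use of threads are assumptions chosen for illustration only.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def mc_chunk(f, n_samples, seed):
    """One replicated job: sum f over n_samples uniform points in [0, 1)."""
    rng = random.Random(seed)  # private generator, so workers do not share state
    total = 0.0
    for _ in range(n_samples):
        total += f(rng.random())
    return total, n_samples


def parallel_mc(f, n_workers=4, samples_per_worker=50_000, base_seed=1234):
    """Run independent replicas and pool their samples into one estimate."""
    seeds = [base_seed + k for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda s: mc_chunk(f, samples_per_worker, s), seeds))
    total = sum(p[0] for p in parts)
    count = sum(p[1] for p in parts)
    return total / count


estimate = parallel_mc(lambda x: x * x)  # exact value of the integral is 1/3
```

Pooling works here because the replicas sample from the same fixed distribution. With an adaptive scheme, each replica would refine its grid from only its own share of the samples, which is the loss of adaptive behaviour the abstract describes. Threads are used only to show the structure; in CPython they will not speed up this pure-Python loop.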

CIPRNG: A VLSI Family of Chaotic Iterations Post-Processings for $\mathbb {F}_{2}$ -Linear Pseudorandom Number Generation Based on Zynq MPSoC

2018

Hardware pseudorandom number generators are continuously improved to satisfy both physical and ubiquitous computing security system challenges. The main contribution of this paper is to propose two post-processing modules in hardware, to improve the randomness of linear PRNGs while succeeding in passing the TestU01 statistical battery of tests. They are based on chaotic iterations and are denoted by CIPRNG-MC and CIPRNG-XOR. They have various interesting properties, encompassing the ability to improve the statistical profile of the generators on which they iterate. These post-processing modules have been implemented on FPGA and ASIC without inferring any blocks (RAM or DSP). A comparison in terms of …

Very-large-scale integration; Pseudorandom number generator; 020208 electrical & electronic engineering; Chaotic; 02 engineering and technology; Parallel computing; MPSoC; TestU01; 020202 computer hardware & architecture; Application-specific integrated circuit; 0202 electrical engineering electronic engineering information engineering; Electrical and Electronic Engineering; Field-programmable gate array; Throughput (business); Mathematics; IEEE Transactions on Circuits and Systems I: Regular Papers

researchProduct

Hierarchical Parallelization of an H.264/AVC Video Encoder

2006

The latest generation of video encoding standards increases computing demands in order to reach the limits of compression efficiency. This is particularly the case for the H.264/AVC specification, which is gaining interest in industry. We are interested in applying parallel processing to H.264 encoders in order to fulfill the computation requirements imposed by demanding applications such as video on demand, videoconferencing, and live broadcast. Given a delivered video quality and bit rate, the main complexity parameters are image resolution, frame rate and latency. These parameters can still be pushed forward to the point where special-purpose hardware solutions are not available. Parallel processing based on…

Videoconferencing; Computer science; Computation; Message passing; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Parallel computing; Latency (engineering); computer.software_genre; Video quality; Frame rate; Encoder; Image resolution; computer; International Symposium on Parallel Computing in Electrical Engineering (PARELEC'06)

researchProduct

Lightweight LCP construction for next-generation sequencing datasets

2012

The advent of "next-generation" DNA sequencing (NGS) technologies has meant that collections of hundreds of millions of DNA sequences are now commonplace in bioinformatics. Knowing the longest common prefix array (LCP) of such a collection would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been described in the literature, but require the presence in RAM of large data structures. This prevents such methods from being feasible for NGS datasets. In this paper we propose the first lightweight method that simultaneously computes, via sequential scans, the LCP and B…

Whole genome sequencing; Genomics (q-bio.GN); FOS: Computer and information sciences; Sequence; BWT; LCP; next-generation sequencing datasets; BWT LCP text indexes next-generation sequencing datasets massive datasets; Settore INF/01 - Informatica; Computer science; Computation; String (computer science); LCP array; Parallel computing; Data structure; DNA sequencing; Substring; BWT; LCP; FOS: Biological sciences; Computer Science - Data Structures and Algorithms; Quantitative Biology - Genomics; Data Structures and Algorithms (cs.DS); next-generation sequencing datasets

researchProduct
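For readers unfamiliar with the longest common prefix (LCP) array named in the entry above, the following sketch builds it naively for a single short string. It is an in-memory illustration only, not the lightweight sequential-scan method the paper proposes, and the function names `suffix_array` and `lcp_array` are assumptions.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] is the length of the longest common prefix of the suffixes
    starting at sa[k-1] and sa[k]; lcp[0] is 0 by convention."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        i, j = sa[k - 1], sa[k]
        h = 0
        while i + h < len(text) and j + h < len(text) and text[i + h] == text[j + h]:
            h += 1
        lcp[k] = h
    return lcp


sa = suffix_array("banana")
lcp = lcp_array("banana", sa)
# Sorted suffixes: a, ana, anana, banana, na, nana
```

The naive sort materializes every suffix while comparing, which illustrates why approaches that hold large structures in RAM do not scale to collections of hundreds of millions of reads.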

Design of a Real-time face detection parallel architecture using High-Level Synthesis

2008

We describe a High-Level Synthesis implementation of a parallel architecture for face detection. The chosen face detection method is the well-known Convolutional Face Finder (CFF) algorithm, which consists of a pipeline of convolution operations. We rely on dataflow modelling of the algorithm and we use a high-level synthesis tool in order to specify the local dataflows of our Processing Element (PE), by describing in the C language inter-PE communication, fine scheduling of the successive convolutions, and memory distribution and bandwidth. Using this approach, we explore several implementation alternatives in order to find a compromise between processing speed and area of the PE. We …

[INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR]; [INFO.INFO-AR] Computer Science [cs]/Hardware Architecture [cs.AR]; General Computer Science; Video Graphics Array; Computer science; Dataflow; lcsh:Electronics; lcsh:TK7800-8360; 020207 software engineering; 02 engineering and technology; Parallel computing; 020202 computer hardware & architecture; Convolution; Scheduling (computing); Control and Systems Engineering; High-level synthesis; 0202 electrical engineering electronic engineering information engineering; Parallel architecture; [ INFO.INFO-AR ] Computer Science [cs]/Hardware Architecture [cs.AR]; Architecture; Face detection; ComputingMilieux_MISCELLANEOUS; Computer Science(all)

researchProduct

On GPU-accelerated fast direct solvers and their applications in image denoising

2015

block cyclic reduction; näytönohjaimet; OpenCL; numeeriset menetelmät; prosessorit; image denoising; parallel computing; mean curvature; GPU computing; kuvankäsittely; image processing; fast Poisson solver; separable block tridiagonal linear system; PSCR method; optimointi; algoritmit; ohjelmointi; augmented Lagrangian method; kohina; fast direct solver; rinnakkaislaskenta; alternating direction methods of multipliers

researchProduct

Multi-Authored Manuscripts and Speedup in Academic Publishing

2014

It is unfair to count an n-authored paper as one paper for each coauthor, i.e., as n papers: this is “feeding the multitude”. Sharing the credit among coauthors by percentages or by simply dividing by n is fairer but somewhat harsh. So, we propose to take into account the productivity gains of parallelization by introducing a team bonus function that multiplies the allocation, thereby increasing the credit allocated to each coauthor. The degree of parallelization cannot be determined exogenously discipline by discipline. So, one may propose that each team of coauthors indicates how the labor was organized to produce the paper. Unfortunately, the coauthors may systematically bias their answers …

business.industry; Computer science; Cheating; media_common.quotation_subject; ComputingMilieux_PERSONALCOMPUTING; Parallel computing; Limiting; N-rule; Publishing; Order (exchange); Bounded function; business; Function (engineering); media_common; SSRN Electronic Journal

researchProduct
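The three counting rules contrasted in the entry above (full credit to every coauthor, equal division by n, and division by n multiplied by a team bonus) can be compared with a small sketch. The abstract does not give the form of the bonus function, so `bonus` is a caller-supplied, hypothetical placeholder, and the bound 1 <= bonus(n) <= n is an assumption that keeps each share between the 1/n rule and full counting.

```python
from fractions import Fraction


def credit_per_author(n_authors, bonus=lambda n: Fraction(1)):
    """Credit given to each coauthor of an n-authored paper.

    bonus(n) is a hypothetical team bonus multiplier:
      bonus(n) == 1  reproduces plain division by n,
      bonus(n) == n  reproduces counting one full paper per coauthor.
    """
    if n_authors < 1:
        raise ValueError("a paper needs at least one author")
    b = Fraction(bonus(n_authors))
    if not (1 <= b <= n_authors):
        raise ValueError("assumed bound violated: 1 <= bonus(n) <= n")
    return b / n_authors


equal_share = credit_per_author(4)                 # plain 1/n rule
full_count = credit_per_author(4, lambda n: n)     # one paper per coauthor
with_bonus = credit_per_author(4, lambda n: 2)     # illustrative constant bonus
```

Exact fractions are used so that the shares of a team can be summed without rounding error when comparing rules.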

PUNAS: A Parallel Ungapped-Alignment-Featured Seed Verification Algorithm for Next-Generation Sequencing Read Alignment

2017

The progress of next-generation sequencing has a major impact on medical and genomic research. This technology can now produce billions of short DNA fragments (reads) in a single run. One of the most demanding computational problems faced by almost every sequencing pipeline is short-read alignment, i.e., determining where in the original genome each fragment originated. Most current solutions are based on a seed-and-extend approach, where promising candidate regions (seeds) are first identified and subsequently extended in order to verify whether a full high-scoring alignment actually exists in the vicinity of each seed. Seed verification is the main bottleneck in many state-of-the-art a…

chemistry.chemical_compound; Speedup; chemistry; Computer science; Genomics; Parallel computing; Computational problem; Genome; Algorithm; DNA sequencing; DNA; 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

researchProduct
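The seed-and-extend idea described in the entry above can be sketched in a few lines: find exact occurrences of a seed, then verify each candidate with an ungapped (mismatch-only) comparison. This is a sequential illustration under simplifying assumptions, not the PUNAS algorithm itself. The function names, the choice of the read's first k bases as the seed, and the mismatch threshold are all hypothetical.

```python
def seed_positions(reference, seed):
    """All start positions where seed occurs exactly in reference."""
    positions = []
    start = reference.find(seed)
    while start != -1:
        positions.append(start)
        start = reference.find(seed, start + 1)
    return positions


def verify_ungapped(reference, read, ref_start, max_mismatches):
    """Count mismatches of read against reference at ref_start, without gaps.

    Returns the mismatch count, or None if the read does not fit in the
    reference or the count exceeds max_mismatches (early exit).
    """
    if ref_start < 0 or ref_start + len(read) > len(reference):
        return None
    mismatches = 0
    window = reference[ref_start:ref_start + len(read)]
    for ref_base, read_base in zip(window, read):
        if ref_base != read_base:
            mismatches += 1
            if mismatches > max_mismatches:
                return None
    return mismatches


def align_read(reference, read, k, max_mismatches):
    """Seed with the first k bases of read, then verify each candidate."""
    hits = []
    for pos in seed_positions(reference, read[:k]):
        score = verify_ungapped(reference, read, pos, max_mismatches)
        if score is not None:
            hits.append((pos, score))
    return hits


strict_hits = align_read("ACGTACGTTAGC", "ACGTTA", k=4, max_mismatches=1)
loose_hits = align_read("ACGTACGTTAGC", "ACGTTA", k=4, max_mismatches=2)
```

Each candidate position is verified independently of the others, so the verification loop is a natural target for the kind of data-parallel execution the paper's title refers to.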