Search results for " computing"
showing 10 items of 2075 documents
CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations
2013
We propose the Sliced Coordinate Format (SCOO) for Sparse Matrix-Vector Multiplication on GPUs. An associated CUDA implementation which takes advantage of atomic operations is presented. We propose partitioning methods to transform a given sparse matrix into SCOO format. An efficient Dual-GPU implementation which overlaps computation and communication is described. Extensive performance comparisons of SCOO compared to other formats on GPUs and CPUs are provided. Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU are outperforming their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA i…
Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations
2012
Summary: In the design process of computer systems or processor architectures, typically many different parameters are exposed to configure, tune, and optimize every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, area, and so on have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in …
CliffoSor: A Parallel Embedded Architecture for Geometric Algebra and Computer Graphics
2006
The representation of geometric objects and their transformations are the two key aspects in computer graphics applications. Traditionally, compute-intensive matrix calculations are involved to model and render 3D scenery. Geometric algebra (a.k.a. Clifford algebra) is gaining growing attention for its natural way to model geometric facts coupled with its being a powerful analytical tool for symbolic calculations. In this paper, the architecture of CliffoSor (Clifford Processor) is introduced. CliffoSor is an embedded parallel coprocessing core that offers direct hardware support to Clifford algebra operators. A prototype implementation on an FPGA board is detailed. Initial test results show more …
cuBool: Bit-Parallel Boolean Matrix Factorization on CUDA-Enabled Accelerators
2018
Boolean Matrix Factorization (BMF) is a commonly used technique in the field of unsupervised data analytics. The goal is to decompose a ground truth matrix C into a product of two matrices A and B being either an exact or approximate rank k factorization of C. Both exact and approximate factorization are time-consuming tasks due to their combinatorial complexity. In this paper, we introduce a massively parallel implementation of BMF - namely cuBool - in order to significantly speed up factorization of huge Boolean matrices. Our approach is based on alternately adjusting rows and columns of A and B using thousands of lightweight CUDA threads. The massively parallel manipulation of entries …
Reconfigurable Accelerator for the Word-Matching Stage of BLASTN
2013
BLAST is one of the most popular sequence analysis tools used by molecular biologists. It is designed to efficiently find similar regions between two sequences that have biological significance. However, because the size of genomic databases is growing rapidly, the computation time of BLAST, when performing a complete genomic database search, is continuously increasing. Thus, there is a clear need to accelerate this process. In this paper, we present a new approach for genomic sequence database scanning utilizing reconfigurable field programmable gate array (FPGA)-based hardware. In order to derive an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate the…
Quantum Machine Learning: A tutorial
2021
This tutorial provides an overview of Quantum Machine Learning (QML), a relatively novel discipline that brings together concepts from Machine Learning (ML), Quantum Computing (QC) and Quantum Information (QI). The great development experienced by QC, partly due to the involvement of giant technological companies, as well as the popularity and success of ML, have been responsible for making QML one of the main streams for researchers working on fuzzy borders between Physics, Mathematics and Computer Science. A possible, although arguably coarse, classification of QML methods may be based on those approaches that make use of ML in a quantum experimentation environment and those others that take…
Alignment-Free Sequence Comparison over Hadoop for Computational Biology
2015
Sequence comparison, i.e., the assessment of how similar two biological sequences are to each other, is a fundamental and routine task in Computational Biology and Bioinformatics. Classically, alignment methods are the de facto standard for such an assessment. In fact, considerable research efforts for the development of efficient algorithms, both on classic and parallel architectures, have been carried out in the past 50 years. Due to the growing amount of sequence data being produced, a new class of methods has emerged: Alignment-free methods. Research in this area has become very intense in the past few years, stimulated by the advent of Next Generation Sequencing technologies, since those…
Fast spiking neural network architecture for low-cost FPGA devices
2012
Spiking Neural Networks (SNN) consist of fully interconnected computation units (neurons) based on spike processing. This type of network resembles those found in biological systems studied by neuroscientists. This paper shows a hardware implementation for SNN. First, SNN require the inputs to be spikes, which makes a conversion system (encoding) from digital values into spikes necessary. For travelling spikes, each neuron interconnection is characterized by weights and delays, requiring an internal neuron processing by a Postsynaptic Potential (PSP) function and membrane potential threshold evaluation for a postsynaptic output spike generation. In order to model a real biological system by arti…
Invariant aspects in M-commerce environments
2005
Mobile phones and other small and powerful portable devices have revolutionized personal communication and affected the lifestyles of the people in the industrialized world. Following credible estimates, in a few years there will be over two billion such portable devices in use. An emerging trend is the electronic commerce performed using mobile terminals over wireless networks, often called mobile commerce or M-commerce. Mobile commerce environments are characterized by high complexity, including myriads of technical and organizational aspects. This property makes it difficult to distinguish the more fundamental issues, structures, and concepts in mobile commerce from the hype. To captu…
A Network Formation Game Approach to Study BitTorrent Tit-for-Tat
2007
The Tit-for-Tat strategy implemented in BitTorrent (BT) clients is generally considered robust to selfish behaviours. The authors of [1] support this belief by studying how Tit-for-Tat can affect selfish peers who are able to set their upload bandwidth. They show that there is a "good" Nash Equilibrium at which each peer uploads at the maximum rate. In this paper, we consider a different game where BT clients can change the number of connections to open in order to improve their performance. We study this game using the analytical framework of network formation games [2]. In particular, we characterize the set of pairwise stable networks the peers can form and how the peers can dynamically reach…