Search results for "Speedup"
Finding near-perfect parameters for hardware and code optimizations with automatic multi-objective design space explorations
2012
In the design process of computer systems or processor architectures, many different parameters are typically exposed for configuring, tuning, and optimizing every component of a system. For evaluations and before production, it is desirable to know the best setting for all parameters. Processing speed is no longer the only objective that needs to be optimized; power consumption, area, and so on have become very important. Thus, the best configurations have to be found with respect to multiple objectives. In this article, we use a multi-objective design space exploration tool called Framework for Automatic Design Space Exploration (FADSE) to automatically find near-optimal configurations in …
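Finding the "best configurations with respect to multiple objectives" rests on Pareto dominance: a configuration is kept only if no other configuration is at least as good in every objective and strictly better in one. The sketch below is a minimal filter over already-evaluated configurations, assuming every objective is minimized; the configuration names and objective values (runtime, power, area) are hypothetical, and this is not FADSE's search algorithm, only the dominance test such tools build on.

```python
from typing import Dict, List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of non-dominated configurations, in insertion order."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]

# Hypothetical evaluated configurations: (runtime_ms, power_w, area_mm2)
configs = {
    "cfg_a": (10.0, 5.0, 2.0),
    "cfg_b": (8.0, 6.0, 2.0),
    "cfg_c": (12.0, 6.0, 3.0),  # dominated by cfg_a
    "cfg_d": (8.0, 6.0, 2.5),   # dominated by cfg_b
}
```

Here `pareto_front(configs)` keeps `cfg_a` (lowest power) and `cfg_b` (lowest runtime): neither beats the other on every objective, so the designer has to choose the trade-off.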
CliffoSor: A Parallel Embedded Architecture for Geometric Algebra and Computer Graphics
2006
The representation of geometric objects and their transformations are the two key aspects of computer graphics applications. Traditionally, 3D scenes are modeled and rendered with compute-intensive matrix calculations. Geometric algebra (a.k.a. Clifford algebra) is gaining growing attention for the natural way it models geometric facts and for its power as an analytical tool for symbolic calculations. In this paper, the architecture of CliffoSor (Clifford Processor) is introduced. CliffoSor is an embedded parallel coprocessing core that offers direct hardware support for Clifford algebra operators. A prototype implementation on an FPGA board is detailed. Initial test results show more …
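To make "Clifford algebra operators" concrete, the sketch below implements the geometric product in the smallest non-trivial case, the 2D Euclidean algebra with basis {1, e1, e2, e12}, and uses it to rotate a vector with a rotor instead of a rotation matrix. The abstract does not state which dimension or representation CliffoSor supports, so the 2D choice and the tuple layout `(scalar, e1, e2, e12)` are assumptions for illustration; this is a software reference, not the hardware design.

```python
import math
from typing import Tuple

# Multivector in 2D Euclidean geometric algebra: (scalar, e1, e2, e12)
MV = Tuple[float, float, float, float]

def geometric_product(a: MV, b: MV) -> MV:
    """Geometric product using e1*e1 = e2*e2 = 1 and e1*e2 = -e2*e1 = e12 (hence e12*e12 = -1)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + a1 * b1 + a2 * b2 - a3 * b3,  # scalar part
        a0 * b1 + a1 * b0 - a2 * b3 + a3 * b2,  # e1 part
        a0 * b2 + a2 * b0 + a1 * b3 - a3 * b1,  # e2 part
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,  # e12 (bivector) part
    )

def reverse(a: MV) -> MV:
    """Reversion: flips the sign of the bivector part."""
    return (a[0], a[1], a[2], -a[3])

def rotate(v: MV, theta: float) -> MV:
    """Rotate a vector counter-clockwise by theta via the sandwich R v ~R, with R = cos(theta/2) - sin(theta/2) e12."""
    rotor = (math.cos(theta / 2), 0.0, 0.0, -math.sin(theta / 2))
    return geometric_product(geometric_product(rotor, v), reverse(rotor))
```

For two vectors the product packs both classic operations into one result: `geometric_product((0, 1, 2, 0), (0, 3, 4, 0))` gives scalar part 11 (the dot product) and e12 part -2 (the signed area, i.e. the wedge product).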
Circuits and excitations to enable Brownian token-based computing with skyrmions
2021
Brownian computing exploits the thermal motion of discrete signal carriers (tokens) to perform computations. In this paper we address two major challenges that hinder competitive realizations of Brownian token-based circuits and their application in actual devices, for instance devices based on magnetic skyrmions. To overcome the problems that wire crossings pose for circuit fabrication, we design a crossing-free layout for a composite half-adder module. This layout greatly simplifies experimental implementations because wire crossings are avoided entirely. Additionally, our design is shorter than conventional designs, which speeds up computations. To address the key issue of slow computation base…
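Why a shorter layout speeds up Brownian computation can be seen in a toy model (an assumption, not the paper's skyrmion dynamics): a token hops randomly left or right along a wire of `length` discrete sites, is reflected at the input end, and is absorbed at the output. For this walk the expected first-passage time is exactly length², so halving the wire length should cut the mean computation time roughly fourfold. The simulation below estimates that mean with a seeded generator.

```python
import random
from statistics import mean

def hitting_time(length: int, rng: random.Random) -> int:
    """Steps until an unbiased token starting at site 0 first reaches site `length`."""
    pos, steps = 0, 0
    while pos < length:
        if pos == 0:
            pos = 1  # reflecting input end: the token can only move forward
        else:
            pos += rng.choice((-1, 1))  # thermal hop, equally likely in either direction
        steps += 1
    return steps

def mean_hitting_time(length: int, trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean first-passage time; the exact value is length**2."""
    rng = random.Random(seed)
    return mean(hitting_time(length, rng) for _ in range(trials))
```

The quadratic scaling is the design argument in miniature: in diffusive transport, shortening the token's path pays off more than linearly.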
First Experiences on an Accurate SPH Method on GPUs
2017
It is well known that the standard formulation of Smoothed Particle Hydrodynamics (SPH) is usually inaccurate when the data sites are scattered or when the approximation is evaluated near the boundary. Moreover, the method is computationally demanding when a large number of data sites and evaluation points are employed. In this paper, an enhanced version of the method is proposed that improves both accuracy and efficiency by using an HPC environment. Our implementation exploits the processing power of GPUs to solve the basic computational kernel. The performance gain demonstrates that the method is accurate and suitable for handling large datasets.
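The boundary problem mentioned above is easy to reproduce with the standard SPH approximation f(x) ≈ Σ_j f_j W(x − x_j, h) V_j. The sketch below assumes a 1D normalized Gaussian kernel and uniformly spaced data sites on [0, 1]; approximating the constant function 1 gives about 1 in the interior but only about 0.56 at the boundary, because half of the kernel's support falls outside the domain. The optional Shepard normalization (dividing by Σ_j W V_j) is a well-known zeroth-order correction shown only for comparison; it is not necessarily the enhancement proposed in the paper.

```python
import math
from typing import Sequence

def gaussian_kernel(r: float, h: float) -> float:
    """1D Gaussian smoothing kernel, normalized so it integrates to 1 over the real line."""
    return math.exp(-(r / h) ** 2) / (h * math.sqrt(math.pi))

def sph_approx(x: float, nodes: Sequence[float], values: Sequence[float],
               volume: float, h: float, shepard: bool = False) -> float:
    """Standard SPH estimate of f(x); with shepard=True the weights are renormalized to sum to 1."""
    weights = [volume * gaussian_kernel(x - xj, h) for xj in nodes]
    total = sum(w * f for w, f in zip(weights, values))
    if shepard:
        return total / sum(weights)
    return total
```

Each evaluation point is computed independently of the others, which is what makes this kernel a natural fit for one-GPU-thread-per-point parallelization.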
Improved SOM Learning using Simulated Annealing
2007
The Self-Organizing Map (SOM) algorithm has been used extensively for analysis and classification problems. For these kinds of problems, datasets are becoming ever larger, so it is necessary to speed up SOM learning. In this paper we present an application of the Simulated Annealing (SA) procedure to the SOM learning algorithm. The goal of the algorithm is to obtain fast learning and better performance in terms of matching the input data and regularity of the resulting map. An advantage of the proposed technique is that it preserves the simplicity of the basic algorithm. Several tests, carried out on different large datasets, demonstrate the effectiveness of the proposed algorithm in comparis…
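The abstract does not say how SA is combined with SOM learning. As a hedged illustration only, the sketch below shows the basic online SOM update for a 1D map (find the best-matching unit, then pull it and its neighbors toward the input) with the learning rate and neighborhood radius decayed by a geometric "cooling" factor, borrowing the temperature-schedule idea from SA. It omits SA's random acceptance step and is not the paper's algorithm; `cooling`, the floor on the radius, and the other defaults are assumptions.

```python
import math
import random
from typing import List

def train_som_1d(data: List[float], n_units: int = 10, epochs: int = 20,
                 lr0: float = 0.5, radius0: float = 3.0, cooling: float = 0.85,
                 seed: int = 0) -> List[float]:
    """Online 1D SOM with a geometric cooling schedule on learning rate and radius (assumed schedule)."""
    rng = random.Random(seed)
    weights = [0.5] * n_units  # deliberately unorganized starting map
    lr, radius = lr0, radius0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            # Best-matching unit: the prototype closest to the input
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            for i in range(n_units):
                # Gaussian neighborhood on the map grid, centered on the BMU
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                weights[i] += lr * h * (x - weights[i])
        # "Cool" both step size and neighborhood after every epoch
        lr *= cooling
        radius = max(radius * cooling, 0.5)
    return weights

def quantization_error(data: List[float], weights: List[float]) -> float:
    """Mean distance from each input to its nearest prototype (lower is a better match)."""
    return sum(min(abs(x - w) for w in weights) for x in data) / len(data)
```

Because each update is a convex step toward the input, the prototypes stay inside the data range while they spread out to cover it.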
Versatile optimization-based speed-up method for autofocusing in digital holographic microscopy
2021
We propose a speed-up method for the in-focus plane detection in digital holographic microscopy that can be applied to a broad class of autofocusing algorithms that involve repetitive propagation of an object wave to various axial locations to decide the in-focus position. The classical autofocusing algorithms apply a uniform search strategy, i.e., they probe multiple, uniformly distributed axial locations, which leads to heavy computational overhead. Our method substantially reduces the computational load, without sacrificing the accuracy, by skillfully selecting the next location to investigate, which results in a decreased total number of probed propagation distances. This is achieved by…
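The contrast between uniform probing and "selecting the next location to investigate" can be sketched with golden-section search, a classic derivative-free method for maximizing a unimodal function. This is a swapped-in technique, not necessarily the paper's optimizer, and `focus_metric` is a hypothetical stand-in: in a real pipeline each call would propagate the object wave to distance z and score the sharpness of the result, which is the expensive step being saved.

```python
import math
from typing import Callable, Tuple

def focus_metric(z: float, z_focus: float = 12.3, width: float = 2.0) -> float:
    """Hypothetical sharpness score that peaks at the in-focus distance z_focus."""
    return 1.0 / (1.0 + ((z - z_focus) / width) ** 2)

def golden_section_max(f: Callable[[float], float], a: float, b: float,
                       tol: float = 1e-3) -> Tuple[float, int]:
    """Maximize a unimodal f on [a, b]; returns (estimate, number of f evaluations)."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    evals = 2
    while b - a > tol:
        if fc > fd:
            # Peak lies in [a, d]; old c becomes the new upper probe
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Peak lies in [c, b]; old d becomes the new lower probe
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
        evals += 1
    return (a + b) / 2, evals
```

On [0, 50] with `tol=0.01`, the search needs about 20 propagations, whereas a uniform grid with 0.01 spacing would need about 5000. The catch is the unimodality assumption: a focus curve with several local maxima may need a coarse initial scan first.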
Automatic multi-objective optimization of parameters for hardware and code optimizations
2011
Recent computer architectures can be configured in many different ways. To explore this huge design space, system simulators are typically used. Because performance is no longer the only decisive factor (power usage and the system's resource usage, for example, also matter), it has become very hard for designers to select optimal configurations. In this article we use a multi-objective design space exploration tool called FADSE to explore the vast design space of the Grid Alu Processor (GAP) and its post-link optimizer, GAPtimize. We improved FADSE with techniques that make it more robust against failures and that speed up evaluations through parallel processing. For the GAP, we present an approximation o…
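The two improvements named above, robustness against failures and parallel evaluation, can be illustrated with a small evaluation harness: simulations run concurrently, each failed run is retried a few times, and a configuration that still fails is recorded as `None` instead of aborting the whole exploration. This is a sketch under assumptions, not FADSE's API: `evaluate_all`, `fake_simulate`, and the `(issue_width, cache_kb)` configuration format are hypothetical, and the fake simulator stands in for a real simulator run that may crash.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, List, Optional, Tuple

def evaluate_all(configs: List[Hashable], simulate: Callable, max_workers: int = 4,
                 retries: int = 2) -> Dict[Hashable, Optional[Tuple[float, ...]]]:
    """Evaluate configurations in parallel; a run that keeps failing is marked None."""
    def run(cfg):
        for _ in range(retries + 1):
            try:
                return simulate(cfg)
            except Exception:
                continue  # treat crashes as possibly transient and retry
        return None  # give up on this point but keep exploring the others
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run, configs))
    return dict(zip(configs, results))

def fake_simulate(cfg: Tuple[int, int]) -> Tuple[float, float]:
    """Hypothetical simulator returning (cycles, power); crashes for issue_width == 3."""
    issue_width, cache_kb = cfg
    if issue_width == 3:
        raise RuntimeError("simulator crashed")
    cycles = 1000 // issue_width + 4096 // cache_kb
    power = issue_width * 1.5 + cache_kb / 64
    return cycles, power
```

Threads are a reasonable choice when each evaluation launches an external simulator process, since the waiting happens outside the Python interpreter; CPU-bound in-process models would call for processes instead.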
cuBool: Bit-Parallel Boolean Matrix Factorization on CUDA-Enabled Accelerators
2018
Boolean Matrix Factorization (BMF) is a commonly used technique in the field of unsupervised data analytics. The goal is to decompose a ground-truth matrix C into a product of two matrices A and B that forms either an exact or an approximate rank-k factorization of C. Both exact and approximate factorization are time-consuming due to their combinatorial complexity. In this paper, we introduce cuBool, a massively parallel implementation of BMF, to significantly speed up the factorization of huge Boolean matrices. Our approach is based on alternately adjusting rows and columns of A and B using thousands of lightweight CUDA threads. The massively parallel manipulation of entries …
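In the Boolean product, C[i][j] = OR over l of (A[i][l] AND B[l][j]), so a row of C is the bitwise OR of the rows of B selected by the set bits in the corresponding row of A. The sketch below packs rows into Python integers to get that bit-parallelism and performs one greedy pass over the rows of A: each bit is flipped only if the flip lowers the Hamming error against C. The greedy acceptance rule is an assumption for illustration, not cuBool's exact procedure; columns of B could be updated the same way by transposing (C^T = B^T A^T).

```python
from typing import List

def bool_product(a_rows: List[int], b_rows: List[int]) -> List[int]:
    """Boolean product with rows packed as bitmasks: bit l of a_rows[i] selects row l of B."""
    out = []
    for a in a_rows:
        row, l = 0, 0
        while a:
            if a & 1:
                row |= b_rows[l]  # OR in the selected row of B
            a >>= 1
            l += 1
        out.append(row)
    return out

def error(c_rows: List[int], approx_rows: List[int]) -> int:
    """Number of mismatched entries (Hamming distance) between C and an approximation."""
    return sum(bin(c ^ r).count("1") for c, r in zip(c_rows, approx_rows))

def update_rows_of_a(c_rows: List[int], a_rows: List[int], b_rows: List[int], k: int) -> List[int]:
    """One greedy pass: flip each of the k bits of each row of A if that reduces the row's error."""
    a = list(a_rows)
    for i in range(len(a)):
        best = bin(c_rows[i] ^ bool_product([a[i]], b_rows)[0]).count("1")
        for l in range(k):
            cand = a[i] ^ (1 << l)
            err = bin(c_rows[i] ^ bool_product([cand], b_rows)[0]).count("1")
            if err < best:
                a[i], best = cand, err
    return a
```

Each row of A is updated independently of the others, which is the property a GPU implementation can exploit by assigning rows (and, in the alternating step, columns of B) to separate threads.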
Reconfigurable Accelerator for the Word-Matching Stage of BLASTN
2013
BLAST is one of the most popular sequence analysis tools used by molecular biologists. It is designed to efficiently find similar regions between two sequences that have biological significance. However, because the size of genomic databases is growing rapidly, the computation time of BLAST, when performing a complete genomic database search, is continuously increasing. Thus, there is a clear need to accelerate this process. In this paper, we present a new approach for genomic sequence database scanning utilizing reconfigurable field programmable gate array (FPGA)-based hardware. In order to derive an efficient structure for BLASTN, we propose a reconfigurable architecture to accelerate the…
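The word-matching stage that the accelerator targets can be stated in a few lines of software: index every length-w word of the query, then slide over the database and report each position whose word occurs in the query. The sketch below is a plain sequential reference under that simplified definition (exact matches only, no two-hit or extension logic); the function names and the small word length in the example are illustrative, and the FPGA design would perform these lookups in parallel hardware rather than with a hash table.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_word_table(query: str, w: int) -> Dict[str, List[int]]:
    """Map every length-w word of the query to the positions where it occurs."""
    table: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        table[query[i:i + w]].append(i)
    return table

def word_hits(query: str, database: str, w: int) -> List[Tuple[int, int]]:
    """Return (query_pos, db_pos) seed hits where a length-w word matches exactly."""
    table = build_word_table(query, w)
    hits = []
    for j in range(len(database) - w + 1):
        for i in table.get(database[j:j + w], ()):
            hits.append((i, j))
    return hits
```

For example, `word_hits("ACGTACGT", "TTACGTAA", 4)` returns `[(3, 1), (0, 2), (4, 2), (1, 3)]`; the repeated word `ACGT` in the query produces two hits at database position 2. Because the database scan touches every position, its cost grows with database size, which is the bottleneck the abstract describes.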
Quantum Machine Learning: A tutorial
2021
This tutorial provides an overview of Quantum Machine Learning (QML), a relatively new discipline that brings together concepts from Machine Learning (ML), Quantum Computing (QC) and Quantum Information (QI). The rapid development of QC, partly due to the involvement of giant technology companies, together with the popularity and success of ML, has made QML one of the main streams for researchers working on the fuzzy borders between Physics, Mathematics and Computer Science. A possible, although arguably coarse, classification of QML methods may be based on those approaches that make use of ML in a quantum experimentation environment and those others that take…