Search results for "Speedup"
showing 10 items of 97 documents
The Gravity Lagrangian According to Solar System Experiments
2005
In this work we show that, at relatively low curvatures, the gravity Lagrangian f(R) in both the metric and Palatini formalisms is a bounded function that can depart from linearity only within the limits defined by well-known functions. We obtain those functions by analysing a set of inequalities that any f(R) theory must satisfy in order to be compatible with laboratory and solar system observational constraints. This result implies that the recently suggested f(R) gravity theories with nonlinear terms that dominate at low curvatures are incompatible with observations and, therefore, cannot be a valid mechanism to account for the cosmic speed-up.
LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs
2015
Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribut…
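For readers new to CSR, a minimal sequential sketch of CSR-based SpMV may help. This is the textbook row-by-row loop in plain Python, not LightSpMV's GPU algorithm. The function name `csr_spmv` and the array names `row_ptr`, `col_idx` and `values` are illustrative assumptions.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1
    of col_idx (their column indices) and values (their values).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Dot product of row i with x, visiting only the stored nonzeros.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    # A = [[10, 0, 0],
    #      [ 0, 0, 2],
    #      [ 3, 4, 0]]
    row_ptr = [0, 1, 2, 4]
    col_idx = [0, 2, 0, 1]
    values = [10.0, 2.0, 3.0, 4.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [10.0, 6.0, 11.0]
```

The outer loop over rows is the work a GPU implementation must distribute across threads or warps. Because CSR rows can have very different lengths, a naive one-row-per-thread mapping can leave threads unevenly loaded.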
Bit-Parallel Approximate Pattern Matching on the Xeon Phi Coprocessor
2014
Bit-parallel pattern matching encodes calculated values in bit arrays. This approach gains its efficiency by performing multiple updates within a single machine word. The machine word size (e.g. 32 or 64 bits) is therefore an important parameter. With the increasing length of vector registers, efficiently mapping bit-parallel pattern matching algorithms onto modern high-performance computing architectures is becoming increasingly important. In this paper, we investigate an efficient implementation of the Wu-Manber approximate pattern matching algorithm on the Intel Xeon Phi coprocessor. This architecture features a 512-bit vector processing unit (VPU) as well as a large number of process…
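To make "multiple updates within a machine word" concrete, here is a single-word Python sketch of the Shift-And form of Wu and Manber's bit-parallel approximate matching with at most k edit errors. Python integers stand in for the 32-, 64- or 512-bit registers discussed above. The function name `wu_manber_search` is hypothetical, and this is not the paper's Xeon Phi implementation.

```python
from typing import Dict, List


def wu_manber_search(text: str, pattern: str, k: int) -> List[int]:
    """Return end positions in `text` where `pattern` occurs with at most
    `k` edits (substitutions, insertions, deletions).

    Bit j of state[d] is set when pattern[0..j] matches a suffix of the text
    read so far with <= d errors. One shift/AND/OR sequence updates all
    pattern positions of an error level at once.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    word = (1 << m) - 1        # emulate an m-bit machine word
    accept = 1 << (m - 1)      # set when the whole pattern has matched

    # masks[c] has bit j set iff pattern[j] == c
    masks: Dict[str, int] = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)

    # Before any text is read, the first d pattern chars can be deleted.
    state = [((1 << d) - 1) & word for d in range(k + 1)]
    hits: List[int] = []
    for pos, ch in enumerate(text):
        mc = masks.get(ch, 0)
        prev_old = state[0]                            # level d-1, before this char
        state[0] = ((state[0] << 1) | 1) & mc & word   # exact matching
        for d in range(1, k + 1):
            old = state[d]
            state[d] = ((((old << 1) | 1) & mc)        # match
                        | ((prev_old << 1) | 1)        # substitution
                        | prev_old                     # insertion (extra text char)
                        | ((state[d - 1] << 1) | 1)    # deletion (skipped pattern char)
                        ) & word
            prev_old = old
        if state[k] & accept:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(wu_manber_search("abcxef", "abcdef", 1))  # [5]
```

Each additional error level costs one more word-wide update per text character. This is why the word width, and on the Xeon Phi the 512-bit VPU, matters for throughput.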
Exploiting selective instruction reuse and value prediction in a superscalar architecture
2009
In our previously published research we discovered some branches that are very difficult to predict, called unbiased branches. Since misprediction recovery seriously affects the overall performance of modern processors, these difficult branches are a source of significant performance penalties. Our statistics show that about 28% of branches depend on critical Load instructions, and 5.61% of branches are both unbiased and dependent on critical Loads. Similarly, about 21% of branches depend on MUL/DIV instructions, whereas 3.76% are both unbiased and dependent on MUL/DIV instructions. These dependences involve high-penalty mispredictions becoming serious performance obstac…
XLCS: A New Bit-Parallel Longest Common Subsequence Algorithm on Xeon Phi Clusters
2019
Finding the longest common subsequence (LCS) of two strings is a classical problem in bioinformatics. A basic approach to solving this problem is based on dynamic programming. As biological sequence databases continue to grow, bit-parallel sequence comparison algorithms are becoming increasingly important. In this paper, we present XLCS, a new parallel implementation that accelerates the LCS algorithm on Xeon Phi clusters by performing bit-wise operations. We have designed an asynchronous IO framework to improve data transfer efficiency. To make full use of the computing resources of Xeon Phi clusters, we use three levels of parallelism: node-level, thread-level and vector-level.…
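As an illustration of how bitwise operations can replace the dynamic-programming table, the sketch below computes the LCS length with a standard bit-vector recurrence. One integer holds a whole DP column, and each character of the second string updates it with one AND, one addition, one subtraction and one OR. XLCS's exact formulation may differ, and the name `lcs_length_bitparallel` is hypothetical.

```python
from typing import Dict


def lcs_length_bitparallel(a: str, b: str) -> int:
    """Length of the longest common subsequence of `a` and `b`.

    Bit i of `v` corresponds to position i of `a`; after all of `b` has
    been processed, the number of zero bits in `v` equals the LCS length.
    """
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1            # keep v within an m-bit "word"

    # match[c] has bit i set iff a[i] == c
    match: Dict[str, int] = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = mask                       # all ones: nothing matched yet
    for ch in b:
        u = v & match.get(ch, 0)   # candidate positions matching ch
        v = ((v + u) | (v - u)) & mask
    return m - bin(v).count("1")   # count zero bits in the m-bit word


if __name__ == "__main__":
    print(lcs_length_bitparallel("abcbdab", "bdcaba"))  # 4
```

The loop body is independent of the length of `a` in terms of instruction count, so a single update covers a whole column when `a` fits in a word. Longer strings need multi-word arithmetic with carry propagation between words.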
An Automatic Sleep Scoring Toolbox: Multi-modality of Polysomnography Signals’ Processing
2019
Sleep scoring is a fundamental but time-consuming process in any sleep laboratory. To speed up sleep scoring without compromising accuracy, this paper develops an automatic sleep scoring toolbox capable of multi-signal processing. It allows the user to choose the signal types and the number of target classes. The toolbox then automatically performs signal pre-processing, feature extraction, classifier training (or prediction) and result correction. Finally, the application interface displays the predicted sleep structure, related sleep parameters and a sleep quality index for reference. To improve the identification accuracy of minority stages, a layer-w…
Deep-Learning-Enabled Fast Optical Identification and Characterization of 2D Materials
2020
© 2020 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Advanced microscopy and/or spectroscopy tools play indispensable roles in nanoscience and nanotechnology research, as they provide rich information about material processes and properties. However, the interpretation of imaging data relies heavily on the “intuition” of experienced researchers. As a result, many of the deep graphical features obtained through these tools often go unused because of difficulties in processing the data and finding the correlations. Such challenges can be well addressed by deep learning. In this work, the optical characterization of 2D materials is used as a case study, and a neural-network-based algorithm is de…
Applying the approximation method PAINT and the interactive method NIMBUS to the multiobjective optimization of operating a wastewater treatment plant
2014
Using an interactive multiobjective optimization method called NIMBUS and an approximation method called PAINT, preferable solutions to a five-objective problem of operating a wastewater treatment plant are found. The decision maker giving preference information is an expert in wastewater treatment plant design at the engineering company Pöyry Finland Ltd. The wastewater treatment problem is computationally expensive and requires running a simulator to evaluate the values of the objective functions. This often leads to problems with interactive methods as the decision maker may get frustrated while waiting for new solutions to be computed. Thus, a newly developed PAINT method is used to spe…
Genetic algorithms for 3D reconstruction with supershapes
2009
The supershape model is a recent primitive that represents numerous 3D shapes with several symmetry axes. Its main interest is its capability to reconstruct more complex shapes than the superquadric model with only one implicit equation. In this paper we propose a genetic algorithm to reconstruct a point cloud using these primitives. We used the pseudo-Euclidean distance to introduce a threshold that handles imperfections in real data and speeds up the process. Simulations using our proposed fitness functions and a fitness function based on the inside-outside function show that our fitness function based on the pseudo-Euclidean distance performs better.
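For reference, the supershape primitive is usually built from Gielis' superformula. The 2D radius function and its common spherical-product extension to 3D are given below, where a and b scale the shape, m controls the rotational symmetry and n_1, n_2, n_3 control its form. The abstract does not state the paper's exact implicit (inside-outside) equation or its pseudo-Euclidean distance, so neither is reproduced here.

```latex
% 2D superformula: radius as a function of the polar angle \varphi
r(\varphi) = \left( \left| \frac{\cos(m\varphi/4)}{a} \right|^{n_2}
           + \left| \frac{\sin(m\varphi/4)}{b} \right|^{n_3} \right)^{-1/n_1}

% 3D supershape: spherical product of two superformulas r_1 and r_2
x = r_1(\theta)\cos\theta \; r_2(\varphi)\cos\varphi, \qquad
y = r_1(\theta)\sin\theta \; r_2(\varphi)\cos\varphi, \qquad
z = r_2(\varphi)\sin\varphi,
\qquad \theta \in [-\pi, \pi], \; \varphi \in [-\pi/2, \pi/2]
```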
Parallel Simulated Annealing: Getting Super Linear Speedups
2005
The study described in this paper aims to improve and combine different approaches that can speed up applications of the Simulated Annealing model. It separately investigates two main aspects concerning the degree of parallelism an implementation can effectively exploit in the initial and final periods of an execution. As case studies, it considers two implementations: the job shop scheduling problem and the portfolio selection problem. The paper reports the results of a large number of experiments carried out on a transputer network and a hypercube system. They give useful suggestions about selecting the most suitable values of the intervention parameters to achieve su…
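As a reminder of the terminology in the title: with T_1 the sequential run time and T_p the run time on p processors, speedup and efficiency are conventionally defined as below, and a speedup is called superlinear when it exceeds the processor count. For example, a run that takes 100 s sequentially and 10 s on 8 processors gives S_8 = 10 > 8 and E_8 = 1.25. Whether T_1 is the best sequential program or the parallel program on one processor varies between authors; the abstract does not say which convention the paper uses.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p}, \qquad
\text{superlinear speedup:} \quad S_p > p \iff E_p > 1
```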