Search results for "Speedup"

showing 10 items of 97 documents

The Gravity Lagrangian According to Solar System Experiments

2005

In this work we show that the gravity Lagrangian f(R) at relatively low curvatures in both metric and Palatini formalisms is a bounded function that can only depart from linearity within the limits defined by well-known functions. We obtain those functions by analysing a set of inequalities that any f(R) theory must satisfy in order to be compatible with laboratory and solar system observational constraints. This result implies that the recently suggested f(R) gravity theories with nonlinear terms that dominate at low curvatures are incompatible with observations and, therefore, cannot represent a valid mechanism to justify the cosmic speed-up.

High Energy Physics - Theory | Physics | Gravity (chemistry) | Speedup | Astrophysics (astro-ph) | FOS: Physical sciences | General Physics and Astronomy | General Relativity and Quantum Cosmology (gr-qc) | Astrophysics | General Relativity and Quantum Cosmology | Theoretical physics | Nonlinear system | High Energy Physics - Theory (hep-th) | Observational cosmology | Bounded function | Metric (mathematics) | f(R) gravity | Perturbation theory (quantum mechanics) | Physical Review Letters
researchProduct

LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs

2015

Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribut…

Instruction set | CUDA | Speedup | Computer science | Sparse matrix-vector multiplication | Double-precision floating-point format | Parallel computing | General-purpose computing on graphics processing units | Row | Sparse matrix | 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
researchProduct
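
For readers unfamiliar with the CSR layout named in the abstract above, the following is a minimal sequential sketch of CSR-based SpMV. It is not LightSpMV and does not model its GPU scheduling; the function name `csr_spmv` and the array names `row_ptr`, `col_idx` and `values` are illustrative assumptions, chosen to match the common description of the format.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, values[k] its value.
    (Names are illustrative, not taken from any particular library.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
result = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Each row's dot product is independent of the others, which is what makes the outer loop a natural target for parallel execution; how rows are distributed over GPU threads is exactly the part this sketch leaves out.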

Bit-Parallel Approximate Pattern Matching on the Xeon Phi Coprocessor

2014

Bit-parallel pattern matching encodes calculated values in bit arrays. This approach gains its efficiency by performing multiple updates within a machine word. An important parameter is therefore the machine word size (e.g. 32 or 64 bits). With the increasing length of vector registers, the efficient mapping of bit-parallel pattern matching algorithms onto modern high performance computing architectures is becoming increasingly important. In this paper, we investigate an efficient implementation of the Wu-Manber approximate pattern matching algorithm on the Intel Xeon Phi coprocessor. This architecture features a 512-bit long vector processing unit (VPU) as well as a large number of process…

Instruction set | Coprocessor | Speedup | Computer science | Parallel computing | Pattern matching | Intrinsics | Word (computer architecture) | Xeon Phi | Vector processor | 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing
researchProduct

Exploiting selective instruction reuse and value prediction in a superscalar architecture

2009

In our previously published research we identified some branches that are very difficult to predict, called unbiased branches. Since the overall performance of modern processors is seriously affected by misprediction recovery, these difficult branches in particular are a significant source of performance penalties. Our statistics show that about 28% of branches are dependent on critical Load instructions. Moreover, 5.61% of branches are both unbiased and dependent on critical Loads. In the same way, about 21% of branches depend on MUL/DIV instructions whereas 3.76% are unbiased and depend on MUL/DIV instructions. These dependences involve high-penalty mispredictions becoming serious performance obstac…

Instructions per cycle | Speedup | Computer science | Speculative execution | Spec# | Thread (computing) | Parallel computing | Reuse | Hardware and Architecture | Superscalar | Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING | computer | Data cache | Software | computer.programming_language | Journal of Systems Architecture
researchProduct

XLCS: A New Bit-Parallel Longest Common Subsequence Algorithm on Xeon Phi Clusters

2019

Finding the longest common subsequence (LCS) of two strings is a classical problem in bioinformatics. A basic approach to solve this problem is based on dynamic programming. As the biological sequence databases are growing continuously, bit-parallel sequence comparison algorithms are becoming increasingly important. In this paper, we present XLCS, a new parallel implementation to accelerate the LCS algorithm on Xeon Phi clusters by performing bit-wise operations. We have designed an asynchronous IO framework to improve the data transfer efficiency. To make full use of the computing resources of Xeon Phi clusters, we use three levels of parallelism: node-level, thread-level and vector-level.…

Longest common subsequence problem | Dynamic programming | Speedup | Computer science | Computer cluster | Asynchronous I/O | Cache | Supercomputer | Algorithm | Xeon Phi | 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
researchProduct
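
To illustrate the contrast the abstract above draws between dynamic programming and bit-wise operations, here is a small sketch of a bit-parallel LCS length computation next to the textbook DP. It is not the XLCS implementation: it uses the well-known update V = (V + (V & M)) | (V & ~M) from the bit-vector LCS literature, and it relies on Python's arbitrary-precision integers as a single long "machine word" in place of 512-bit vector registers. The function names are illustrative assumptions.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """LCS length using one bit per character of `a` (sketch, not XLCS)."""
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1
    # match[ch] has bit i set where a[i] == ch.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: no matches recorded yet
    for ch in b:
        mc = match.get(ch, 0)
        u = v & mc
        # One addition and a few logical operations update a whole DP row.
        v = ((v + u) | (v & ~mc)) & mask
    # Each zero bit in v corresponds to one unit of LCS length.
    return m - bin(v).count("1")


def lcs_length_dp(a: str, b: str) -> int:
    """Textbook O(len(a) * len(b)) dynamic programming, for comparison."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

The bit-parallel version processes one character of `b` per iteration regardless of the length of `a`, which is where the speedup over the cell-by-cell DP comes from when `a` fits in a few machine words or vector registers.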

An Automatic Sleep Scoring Toolbox : Multi-modality of Polysomnography Signals’ Processing

2019

Sleep scoring is a fundamental but time-consuming process in any sleep laboratory. To speed up the process of sleep scoring without compromising accuracy, this paper develops an automatic sleep scoring toolbox with the capability of multi-signal processing. It allows the user to choose signal types and the number of target classes. Then, an automatic process containing signal pre-processing, feature extraction, classifier training (or prediction) and result correction will be performed. Finally, the application interface displays predicted sleep structure, related sleep parameters and the sleep quality index for reference. To improve the identification accuracy of minority stages, a layer-w…

MATLAB | Speedup | Computer science | Feature extraction | 02 engineering and technology | Polysomnography | Machine learning | computer.software_genre | uni (lepotila) | polysomnography | 0202 electrical engineering electronic engineering information engineering | medicine | Hidden Markov model | Signal processing | Sleep Stages | medicine.diagnostic_test | business.industry | signaalianalyysi | 020206 networking & telecommunications | automatic sleep scoring | Toolbox | multi-modality analysis | 020201 artificial intelligence & image processing | Artificial intelligence | business | computer | Classifier (UML) | MATLAB toolbox
researchProduct

Deep-Learning-Enabled Fast Optical Identification and Characterization of 2D Materials.

2020

© 2020 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. Advanced microscopy and/or spectroscopy tools play indispensable roles in nanoscience and nanotechnology research, as they provide rich information about material processes and properties. However, the interpretation of imaging data heavily relies on the “intuition” of experienced researchers. As a result, many of the deep graphical features obtained through these tools are often unused because of difficulties in processing the data and finding the correlations. Such challenges can be well addressed by deep learning. In this work, the optical characterization of 2D materials is used as a case study, and a neural-network-based algorithm is de…

Materials science | Speedup | business.industry | Mechanical Engineering | Deep learning | Probability and statistics | 02 engineering and technology | 010402 general chemistry | 021001 nanoscience & nanotechnology | Machine learning | computer.software_genre | 01 natural sciences | Imaging data | 0104 chemical sciences | Mechanics of Materials | General Materials Science | Optical identification | Artificial intelligence | 0210 nano-technology | business | Transfer of learning | computer | Intuition | Advanced materials (Deerfield Beach, Fla.)
researchProduct

Applying the approximation method PAINT and the interactive method NIMBUS to the multiobjective optimization of operating a wastewater treatment plant

2014

Using an interactive multiobjective optimization method called NIMBUS and an approximation method called PAINT, preferable solutions to a five-objective problem of operating a wastewater treatment plant are found. The decision maker giving preference information is an expert in wastewater treatment plant design at the engineering company Pöyry Finland Ltd. The wastewater treatment problem is computationally expensive and requires running a simulator to evaluate the values of the objective functions. This often leads to problems with interactive methods as the decision maker may get frustrated while waiting for new solutions to be computed. Thus, a newly developed PAINT method is used to spe…

Mathematical optimization | Engineering | OR in natural resources | Control and Optimization | Speedup | business.industry | Applied Mathematics | productivity and competitiveness | Management Science and Operations Research | simulation | Decision maker | Multi-objective optimization | Industrial and Manufacturing Engineering | Computer Science Applications | Set (abstract data type) | Pareto optimal | multiple objective programming | Sewage treatment | Plant design | business | ta218 | Integer (computer science)
researchProduct

Genetic algorithms for 3d reconstruction with supershapes

2009

The supershape model is a recent primitive that represents numerous 3D shapes with several symmetry axes. The main interest of this model is its capability to reconstruct more complex shapes than the superquadric model with only one implicit equation. In this paper we propose a genetic algorithm to reconstruct a point cloud using those primitives. We use the pseudo-Euclidean distance to introduce a threshold that handles imperfections in real data and speeds up the process. Simulations using our proposed fitness functions and a fitness function based on the inside-outside function show that our fitness function based on the pseudo-Euclidean distance performs better.

Mathematical optimization | Fitness function | Speedup | Implicit function | Fitness approximation | 3D reconstruction | Point cloud | Function (mathematics) | Iterative reconstruction | Algorithm | Mathematics | 2009 16th IEEE International Conference on Image Processing (ICIP)
researchProduct

Parallel Simulated Annealing: Getting Super Linear Speedups

2005

The study described in this paper tries to improve and combine different approaches that are able to speed up applications of the Simulated Annealing model. It investigates separately two main aspects concerning the degree of parallelism an implementation can effectively exploit at the initial and final periods of an execution. As for case studies, it deals with two implementations: the job shop scheduling problem and the portfolio selection problem. The paper reports the results of a large number of experiments, carried out by means of a transputer network and a hypercube system. They give useful suggestions about selecting the most suitable values of the intervention parameters to achieve su…

Mathematical optimization | Speedup | Computational complexity theory | Job shop scheduling | Parallel processing (DSP implementation) | Computer science | Simulated annealing | Degree of parallelism | Flow shop scheduling | Parallel computing | Hypercube | Proceedings. Second Euromicro Workshop on Parallel and Distributed Processing
researchProduct