Search results for "scala"
showing 10 items of 1416 documents
Massively Parallel ANS Decoding on GPUs
2019
In recent years, graphics processors have enabled significant advances in the fields of big data and streamed deep learning. In order to keep control of rapidly growing amounts of data and to achieve sufficient throughput rates, compression features are a key part of many applications including popular deep learning pipelines. However, as most of the respective APIs rely on CPU-based preprocessing for decoding, data decompression frequently becomes a bottleneck in accelerated compute systems. This establishes the need for efficient GPU-based solutions for decompression. Asymmetric numeral systems (ANS) represent a modern approach to entropy coding, combining superior compression results wit…
Fault-Tolerant Network-on-Chip Design for Mesh-of-Tree Topology Using Particle Swarm Optimization
2018
As the size of the chip is scaling down the density of Intellectual Property (IP) cores integrated on a chip has been increased rapidly. The communication between these IP cores on a chip is highly challenging. To overcome this issue, Network-on-Chip (NoC) has been proposed to provide an efficient and a scalable communication architecture. In the deep sub-micron level NoCs are prone to faults which can occur in any component of NoC. To build a reliable and robust systems, it is necessary to apply efficient fault-tolerant techniques. In this paper, we present a flexible spare core placement in Mesh-of-Tree (MoT) topology using Particle Swarm Optimization (PSO) by considering IP core failures…
WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes
2018
Hash maps are among the most versatile data structures in computer science because of their compact data layout and expected constant time complexity for insertion and querying. However, associated memory access patterns during the probing phase are highly irregular resulting in strongly memory-bound implementations. Massively parallel accelerators such as CUDA-enabled GPUs may overcome this limitation by virtue of their fast video memory featuring almost one TB/s bandwidth in comparison to main memory modules of state-of-the-art CPUs with less than 100 GB/s. Unfortunately, the size of hash maps supported by existing single-GPU hashing implementations is restricted by the limited amount of …
SWMapper: Scalable Read Mapper on SunWay TaihuLight
2020
With the rapid development of next-generation sequencing (NGS) technologies, high throughput sequencing platforms continuously produce large amounts of short read DNA data at low cost. Read mapping is a performance-critical task, being one of the first stages required for many different types of NGS analysis pipelines. We present SWMapper — a scalable and efficient read mapper for the Sunway TaihuLight supercomputer. A number of optimization techniques are proposed to achieve high performance on its heterogeneous architecture which are centered around a memory-efficient succinct hash index data structure including seed filtration, duplicate removal, dynamic scheduling, asynchronous data tra…
A Distributed Multi-Authority Attribute Based Encryption Scheme for Secure Sharing of Personal Health Records
2017
Personal health records (PHR) are an emerging health information exchange model, which facilitates PHR owners to efficiently manage their health data. Typically, PHRs are outsourced and stored in third-party cloud platforms. Although, outsourcing private health data to third-party platforms is an appealing solution for PHR owners, it may lead to significant privacy concerns, because there is a higher risk of leaking private data to unauthorized parties. As a way of ensuring PHR owners' control of their outsourced PHR data, attribute based encryption (ABE) mechanisms have been considered due to the fact that such schemes facilitate a mechanism of sharing encrypted data among a set of intende…
Scalable implementation of measuring distances in a Riemannian manifold based on the Fisher Information metric
2019
This paper focuses on the scalability of the Fisher Information manifold by applying techniques of distributed computing. The main objective is to investigate methodologies to improve two bottlenecks associated with the measurement of distances in a Riemannian manifold formed by the Fisher Information metric. The first bottleneck is the quadratic increase in the number of pairwise distances. The second is the computation of global distances, approximated through a fully connected network of the observed pairwise distances, where the challenge is the computation of the all sources shortest path (ASSP). The scalable implementation for the pairwise distances is performed in Spark. The scalable…
Scalability of GPU-Processed 3D Distance Maps for Industrial Environments
2018
This paper contains a benchmark analysis of the open source library GPU-Voxels together with the Robot Operating System (ROS) in large-scale industrial robotics environment. Six sensor nodes with embedded computing generate real-time point cloud data as ROS topics. The overall data from all sensor nodes is processed by a combination of CPU and GPU on a central ROS node. Experimental results demonstrate that the system is able to handle frame rates of 10 and 20 Hz with voxel sizes of 4, 6, 8 and 12 cm without saturation of the CPU or the GPU used by the GPU-Voxels library. The results in this paper show that ROS, in combination with GPU-Voxels, can be used as a viable solution for real-time …
Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?
2020
Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of an ordinary least-squares problem. Several different solvers can be applied to the resulting linear problem. In this paper, a thorough comparison of possible and recently proposed, especially randomized, algorithms is carried out for this problem with a representative set of regression datasets. In addition, we compare MLM with shallow and deep feedforward neural network models and study the effects of the number of observations and the number of features with a special dat…
Secure and efficient verification for data aggregation in wireless sensor networks
2017
Summary The Internet of Things (IoT) concept is, and will be, one of the most interesting topics in the field of Information and Communications Technology. Covering a wide range of applications, wireless sensor networks (WSNs) can play an important role in IoT by seamless integration among thousands of sensors. The benefits of using WSN in IoT include the integrity, scalability, robustness, and easiness in deployment. In WSNs, data aggregation is a famous technique, which, on one hand, plays an essential role in energy preservation and, on the other hand, makes the network prone to different kinds of attacks. The detection of false data injection and impersonation attacks is one of the majo…
Deduplication Potential of HPC Applications’ Checkpoints
2016
HPC systems contain an increasing number of components, decreasing the mean time between failures. Checkpoint mechanisms help to overcome such failures for long-running applications. A viable solution to remove the resulting pressure from the I/O backends is to deduplicate the checkpoints. However, there is little knowledge about the potential to save I/Os for HPC applications by using deduplication within the checkpointing process. In this paper, we perform a broad study about the deduplication behavior of HPC application checkpointing and its impact on system design.