Search results for " computing"

showing 10 items of 2075 documents

Adaptive Low Priority Packet Marking for Better TCP Performance

2003

This paper proposes a packet marking scheme for TCP traffic. Unlike previous work in the literature, our scheme transmits the majority of TCP packets as high priority. A low priority packet acts as a probe whose goal is to detect network congestion early. Low priority packets are marked according to an adaptive marking algorithm. Numerical results show that our scheme provides improved throughput/delay performance.

CUBIC TCP; TCP Vegas; TCP acceleration; business.industry; Computer science; ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS; Real-time computing; TCP tuning; TCP global synchronization; TCP Westwood plus; TCP Friendly Rate Control; Zeta-TCP; business; Computer network

researchProduct

A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC

2012

H.264/AVC is the most recent predictive video compression standard to outperform other existing video coding standards by means of higher computational complexity. In recent years, heterogeneous computing has emerged as a cost-efficient solution for high-performance computing. In the literature, several algorithms have been proposed to accelerate video compression, but so far there have not been many solutions that deal with video codecs using heterogeneous systems. This paper proposes an algorithm to perform H.264/AVC inter prediction. The proposed algorithm performs the motion estimation, both with full-pixel and sub-pixel accuracy, using CUDA to assist the CPU, obtaining remarkable time …

CUDA; Computational complexity theory; Computer science; Motion estimation; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Codec; Symmetric multiprocessor system; Image processing; Data_CODINGANDINFORMATIONTHEORY; Central processing unit; Parallel computing; Data compression

researchProduct

Gossip

2019

Nowadays, a growing number of servers and workstations feature an increasing number of GPUs. However, slow communication among GPUs can lead to poor application performance. Thus, there is a latent demand for efficient multi-GPU communication primitives on such systems. This paper focuses on the gather, scatter and all-to-all collectives, which are important operations for various algorithms including parallel sorting and distributed hashing. We present two distinct communication strategies (ring-based and flow-oriented) to generate transfer plans for their topology-aware implementation on NVLink-connected multi-GPU systems. We achieve a throughput of up to 526 GB/s for all-to-all and 148 G…

CUDA; Computer science; Gossip; Distributed computing; Transfer (computing); Server; Hash function; Overhead (computing); Throughput (business); Proceedings of the 48th International Conference on Parallel Processing

researchProduct

CRiSPy-CUDA: Computing Species Richness in 16S rRNA Pyrosequencing Datasets with CUDA

2011

Pyrosequencing technologies are frequently used for sequencing the 16S rRNA marker gene for metagenomic studies of microbial communities. Computing a pairwise genetic distance matrix from the produced reads is an important but highly time consuming task. In this paper, we present a parallelized tool (called CRiSPy) for scalable pairwise genetic distance matrix computation and clustering that is based on the processing pipeline of the popular ESPRIT software package. To achieve high computational efficiency, we have designed massively parallel CUDA algorithms for pairwise k-mer distance and pairwise genetic distance computation. We have also implemented a memory-efficient sparse matrix clust…
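As a rough illustration of the pairwise k-mer distance mentioned in this abstract, the following is a minimal CPU-only sketch. The formula used here (one minus the fraction of shared k-mers, normalized by the number of k-mers in the shorter read) is an assumption chosen for illustration; it is not confirmed to be the exact definition used by CRiSPy or ESPRIT. The function names `kmer_counts` and `kmer_distance` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x, y, k=3):
    """Illustrative k-mer distance (assumed formula, not the tool's own).

    Shared k-mers are counted with multiplicity (multiset intersection)
    and normalized by the number of k-mers in the shorter sequence.
    """
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k characters")
    shared = sum((kmer_counts(x, k) & kmer_counts(y, k)).values())
    return 1.0 - shared / denom


def distance_matrix(reads, k=3):
    """Symmetric pairwise distance matrix over a list of reads."""
    n = len(reads)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = kmer_distance(reads[i], reads[j], k)
    return d
```

Each entry of the matrix is independent of the others, which is presumably what makes this kind of computation a natural fit for massively parallel hardware.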

CUDA; Distance matrix; Computer science; Metagenomics; Pipeline (computing); Pairwise comparison; Parallel computing; Cluster analysis; Quantitative Biology::Genomics; Massively parallel; Sparse matrix

researchProduct

COMPARISON OF CPML IMPLEMENTATIONS FOR THE GPU-ACCELERATED FDTD SOLVER

2011

Three distinctively different implementations of convolutional perfectly matched layer for the FDTD method on CUDA-enabled graphics processing units are presented. All implementations store additional variables only inside the convolutional perfectly matched layers, and the computational speeds scale according to the thickness of these layers. The merits of the different approaches are discussed, and a comparison of computational performance is made using complex real-life benchmarks.

CUDA; Perfectly matched layer; Scale (ratio); Computer science; Finite-difference time-domain method; Parallel computing; Graphics; Solver; Condensed Matter Physics; Implementation; Electronic Optical and Magnetic Materials; Computational science; Progress In Electromagnetics Research M

researchProduct

CUSHAW Suite: Parallel and Efficient Algorithms for NGS Read Alignment

2017

Next generation sequencing (NGS) technologies have enabled cheap, large-scale, and high-throughput production of short DNA sequence reads and thereby have promoted the explosive growth of data volume. Unfortunately, the produced reads are short and prone to contain errors that are incurred during sequencing cycles. Both large data volume and sequencing errors have complicated the mapping of NGS reads onto the reference genome and have motivated the development of various aligners for very short reads, typically less than 100 base pairs (bps) in length. As read length continues to increase, propelled by advances in NGS technologies, these longer reads tend to have higher sequencing error rat…

CUDA; Software suite; Computer science; Suite; Volume (computing); Human genome; Parallel computing; Bioinformatics; Genome; DNA sequencing; Reference genome

researchProduct

Parallelized Clustering of Protein Structures on CUDA-Enabled GPUs

2014

Estimation of the pose in which two given molecules might bind together to form a potential complex is a crucial task in structural biology. To solve this so-called "docking problem", most algorithms initially generate large numbers of candidate poses (or decoys) which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates ranges from thousands to millions, performing the clustering on standard CPUs is highly time consuming. In this paper we analyze and evaluate different approaches to parallelize the nearest neighbor chain algorithm to perform hierarchical Ward clustering of protein structures usin…

CUDA; Speedup; Computer science; Nearest-neighbor chain algorithm; Parallel computing; Cluster analysis; Root-mean-square deviation; Pose; Ward's method; Hierarchical clustering; 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing

researchProduct

Approximate Algorithm for Fast Capacity Provisioning in WANs with Trade-Off between Performance and Cost under Budget Constraint

2014

Due to the emergence of Software Defined Networking (SDN) with the idea of centralized control over computer networks, the Capacity and Flow Assignment Problem (CFA) may be approached in a classical non-distributed fashion in real-life scenarios. The question arises whether a heuristic approach to this NP-complete problem is of any use in practice.

Capacity provisioning; Flow (mathematics); Computer science; Distributed computing; Control (management); Routing (electronic design automation); Trade-off; Software-defined networking; Assignment problem; Budget constraint

researchProduct

Kif3a interacts with Dynactin subunit p150 Glued to organize centriole subdistal appendages.

2013

Formation of cilia, microtubule-based structures that function in propulsion and sensation, requires Kif3a, a subunit of Kinesin II essential for intraflagellar transport (IFT). We have found that Kif3a is also required to organize centrioles. In the absence of Kif3a, the subdistal appendages of centrioles are disorganized and lack p150(Glued) and Ninein. Consequently, microtubule anchoring, centriole cohesion and basal foot formation are abrogated by loss of Kif3a. Kif3a localizes to the mother centriole and interacts with the Dynactin subunit p150(Glued). Depletion of p150(Glued) phenocopies the effects of loss of Kif3a, indicating that Kif3a recruitment of p150(Glued) is critical for s…

Centriole; Knockout; Kinesins; Biology; centriole cohesion; Kif3a; Medical and Health Sciences; Article; General Biochemistry Genetics and Molecular Biology; Mice; Microtubule; Intraflagellar transport; Information and Computing Sciences; Animals; Humans; KIF3A; Microtubule anchoring; Molecular Biology; Centrioles; Mice Knockout; General Immunology and Microbiology; General Neuroscience; Cilium; Tumor Suppressor Proteins; Nuclear Proteins; Kinesin; Dynactin Complex; Biological Sciences; Cell biology; Cytoskeletal Proteins; centrosome; Centrosome; Hela Cells; Dynactin; Generic health relevance; Microtubule-Associated Proteins; p150(Glued); HeLa Cells; subdistal appendage; Developmental Biology

researchProduct

Generalized centro-invertible matrices with applications

2014

Centro-invertible matrices were introduced by R.S. Wikramaratna in 2008. For an involutory matrix R, we define the generalized centro-invertible matrices with respect to R to be those matrices A such that RAR = A^−1. We apply these matrices to a problem in modular arithmetic. Specifically, algorithms for image blurring/deblurring are designed by means of generalized centro-invertible matrices. In addition, if R1 and R2 are n × n involutory matrices, then there is a simple bijection between the set of all centro-invertible matrices with respect to R1 and the set with respect to R2.
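To make the defining relation RAR = A^−1 concrete, here is a small sketch in modular arithmetic. It only illustrates a consequence of the definition (the inverse of A is obtained by two multiplications with R, so a transform X → AX mod m can be undone without computing an inverse); it is not claimed to be the blurring/deblurring algorithm of the paper. The modulus 256, the matrices `R`, `A`, and `X`, and all function names are assumptions chosen for illustration.

```python
def matmul_mod(a, b, m):
    """Multiply two matrices (lists of rows) and reduce entries modulo m."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) % m
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def is_generalized_centro_invertible(a, r, m):
    """Check that R is involutory and that RAR is the inverse of A, modulo m."""
    eye = identity(len(a))
    if matmul_mod(r, r, m) != eye:
        return False
    rar = matmul_mod(matmul_mod(r, a, m), r, m)
    return matmul_mod(rar, a, m) == eye and matmul_mod(a, rar, m) == eye


M = 256                       # assumed modulus (8-bit pixel values)
R = [[0, 1], [1, 0]]          # exchange matrix, involutory
A = [[5, 255], [1, 0]]        # illustrative matrix with R A R = A^-1 (mod 256)
X = [[10, 200], [30, 40]]     # toy 2 x 2 "image" block

A_inv = matmul_mod(matmul_mod(R, A, M), R, M)   # inverse obtained as R A R
scrambled = matmul_mod(A, X, M)                 # forward transform
restored = matmul_mod(A_inv, scrambled, M)      # undo it using R A R
```

Note that the condition RAR = A^−1 is equivalent to RA being involutory, which is how the example matrix `A` above was constructed.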

Centro-symmetric matrix; Square root of a 2 by 2 matrix; Applied Mathematics; Involutory matrix; INGENIERIA TELEMATICA; Matrius (Matemàtica); Matrix ring; Matrix multiplication; Combinatorics; Matrix (mathematics); Integer matrix; 2 × 2 real matrices; Centro-invertible matrix; Matrix analysis; Involutory matrix; MATEMATICA APLICADA; Computer Science::Distributed Parallel and Cluster Computing; Mathematics

researchProduct