Author: José L. Sánchez

ORCID: 0000-0002-3498-9174

showing 14 related works from this author

Evaluation of an Alternative for Increasing Switch Radix

2011

In large switch-based interconnection networks, increasing the switch radix results in a decrease in the total number of network components. In this paper we evaluate an interesting strategy for building high-radix switches that go beyond the integration scale bounds. This approach is independent of the evolution of single-chip switches and will remain valid as the integration scale keeps evolving. Simulation results show that, with a correct internal switch design, this kind of switch achieves almost the same performance as a single-chip switch of the same radix, which would be infeasible at the current integration scale.

Keywords: Interconnection; Scale (ratio); Computer science; business.industry; Embedded system; Electronic engineering; Radix; Topology (electrical circuits); Routing (electronic design automation); business; Network topology; Throughput (business)

Venue: 2011 IEEE 10th International Symposium on Network Computing and Applications

NoC Reconfiguration for CMP Virtualization

2011

At the NoC level, traffic interference can be drastically reduced by using virtualization mechanisms. An effective strategy for virtualizing a NoC consists of dividing the network into different partitions, each one serving different applications and traffic flows. In this paper, we propose a NoC reconfiguration mechanism to support NoC virtualization under real scenarios. Network resources can be dynamically reassigned to different partitions so that the NoC adapts to application needs. Evaluation results show good behavior of CMP virtualization.

Keywords: Computer science; business.industry; Control reconfiguration; Dynamic priority scheduling; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Virtualization; computer.software_genre; Network on a chip; System on a chip; Resource management; Routing (electronic design automation); business; computer; Computer network

Venue: 2011 IEEE 10th International Symposium on Network Computing and Applications

A Fast GPU-Based Motion Estimation Algorithm for H.264/AVC

2012

H.264/AVC is the most recent predictive video compression standard; it outperforms other existing video coding standards at the cost of higher computational complexity. In recent years, heterogeneous computing has emerged as a cost-efficient solution for high-performance computing. Several algorithms have been proposed in the literature to accelerate video compression, but so far few solutions deal with video codecs on heterogeneous systems. This paper proposes an algorithm to perform H.264/AVC inter prediction. The proposed algorithm performs the motion estimation, both with full-pixel and sub-pixel accuracy, using CUDA to assist the CPU, obtaining remarkable time …

Keywords: CUDA; Computational complexity theory; Computer science; Motion estimation; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Codec; Symmetric multiprocessor system; Image processing; Data_CODINGANDINFORMATIONTHEORY; Central processing unit; Parallel computing; Data compression

Optimal Configuration for N-Dimensional Twin Torus Networks

2014

Torus topology is one of the most common topologies used in the current largest supercomputers. Although 3D torus is widely used, recently some supercomputers in the Top500 list have been built using networks with topologies of five or six dimensions. To obtain an nD torus, 2n ports per node are needed. These ports can be offered by a single or several cards per node. In the second case, there are multiple ways of assigning the dimension and direction of the card ports. In a previous work we proposed the 3D Twin (3DT) torus which uses two 4-port cards per node, and obtained the optimal port configuration. This paper extends and generalizes that work in order to obtain the optimal port confi…
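The "2n ports per node" statement in the abstract above can be checked with a minimal sketch. It is not the paper's port-assignment method: the function names `torus_neighbors` and `ports_per_node` are hypothetical, and the sketch only enumerates the two wrap-around neighbors (one per direction) that a node has in each dimension of a plain nD torus.

```python
def torus_neighbors(node, dims):
    """Return the neighbors of `node` in a torus whose size per dimension is `dims`.

    Each dimension contributes two links: one in the negative direction and
    one in the positive direction, both with wrap-around. If a dimension has
    size 2, both links reach the same neighbor but still use two ports.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size  # wrap around the ring
            neighbors.append(tuple(coord))
    return neighbors


def ports_per_node(n):
    """Ports needed per node for an n-dimensional torus (two per dimension)."""
    return 2 * n


if __name__ == "__main__":
    # Illustrative 4x4x4 torus (sizes chosen arbitrarily for the example).
    for neighbor in torus_neighbors((0, 0, 0), (4, 4, 4)):
        print(neighbor)
    print("ports for a 3D torus:", ports_per_node(3))
```

For a 3D torus this gives six links per node, and for the five- and six-dimensional networks mentioned in the abstract it gives ten and twelve.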

Keywords: TOP500; ComputerSystemsOrganization_COMPUTERSYSTEMIMPLEMENTATION; Computer science; Dimension (graph theory); Node (circuits); Topology (electrical circuits); Algorithm design; Torus; Parallel computing; Routing (electronic design automation); Network topology; Topology; Computer Science::Operating Systems

Venue: 2014 IEEE 13th International Symposium on Network Computing and Applications

3D high definition video coding on a GPU-based heterogeneous system

2013

H.264/MVC is a standard for supporting the sensation of 3D, based on coding from 2 (stereo) to N views. H.264/MVC adopts many coding options inherited from single-view H.264/AVC, and thus its complexity is even higher, mainly because the number of views to process is higher. In this manuscript, we aim at an efficient parallelization of the most computationally intensive video encoding module for stereo sequences, namely inter prediction, and at its collaborative execution on a heterogeneous platform. The proposal is based on an efficient dynamic load balancing algorithm and on breaking encoding dependencies. Experimental results demonstrate the proposed algorithm's ability to reduce the…

Keywords: Technology and Engineering; Theoretical computer science; General Computer Science; Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 02 engineering and technology; Video encoding; Parallelizations; Collaborative execution; 0202 electrical engineering electronic engineering information engineering; Reference algorithm; Electrical and Electronic Engineering; High definition video coding; MODE DECISION; 020206 networking & telecommunications; Inter prediction; Heterogeneous systems; Heterogeneous platforms; High-definition video; Computer engineering; Single view; Control and Systems Engineering; High definition; 020201 artificial intelligence & image processing; Dynamic load balancing algorithms; Encoder; Dynamic load balancing algorithm; Coding (social sciences)

Reducing complexity in H.264/AVC motion estimation by using a GPU

2011

H.264/AVC applies a complex mode decision technique that has high computational complexity in order to reduce the temporal redundancies of video sequences. Several algorithms have been proposed in the literature in recent years with the aim of accelerating this part of the encoding process. Recently, with the emergence of many-core processors or accelerators, a new approach can be adopted for reducing the complexity of the H.264/AVC encoding algorithm. This paper focuses on reducing the inter prediction complexity adopted in H.264/AVC and proposes a GPU-based implementation using CUDA. Experimental results show that the proposed approach reduces the complexity by as much as 99% (100x of spe…
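The 99% and 100x figures quoted in the abstract above describe the same measurement, assuming the reduction refers to encoding time and the truncated "100x" is a speedup. A short worked relation, where the symbols $T_{\mathrm{CPU}}$, $T_{\mathrm{GPU}}$, $r$ and $S$ are introduced here for illustration and do not come from the paper:

```latex
r = 1 - \frac{T_{\mathrm{GPU}}}{T_{\mathrm{CPU}}}, \qquad
S = \frac{T_{\mathrm{CPU}}}{T_{\mathrm{GPU}}} = \frac{1}{1 - r}, \qquad
r = 0.99 \;\Rightarrow\; S = \frac{1}{0.01} = 100 .
```

Because $S$ grows as $1/(1-r)$, small changes in $r$ near 1 correspond to large changes in speedup.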

Keywords: Speedup; Computational complexity theory; Computer science; 020206 networking & telecommunications; Data_CODINGANDINFORMATIONTHEORY; 02 engineering and technology; Parallel computing; CUDA; Algorithmic efficiency; 0202 electrical engineering electronic engineering information engineering; Worst-case complexity; 020201 artificial intelligence & image processing; Context-adaptive binary arithmetic coding; Data compression; Context-adaptive variable-length coding

VEF Traces: A Framework for Modelling MPI Traffic in Interconnection Network Simulators

2015

Simulation is often used to evaluate the behaviour and measure the performance of computing systems. Specifically, in high-performance interconnection networks, simulation has been used extensively to verify the behaviour of the network itself and to evaluate its performance. In this context, network simulation must be fed with network traffic, also referred to as network workload, whose nature has traditionally been synthetic. Synthetic workloads can drive studies on network performance, but they are often not accurate enough when a realistic evaluation is pursued. For this reason, other non-synthetic workloads have gained popularity over last dec…

Keywords: Interconnection; Network architecture; Computer science; Distributed computing; Message passing; Message Passing Interface; Traffic model; Network performance; Context (language use); Network traffic control; Network simulation; Network traffic simulation

Venue: 2015 IEEE International Conference on Cluster Computing

Optimizing H.264/AVC interprediction on a GPU-based framework

2011

H.264/MPEG-4 part 10 is the latest standard for video compression and promises a significant advance in terms of quality and distortion compared with the commercial standards currently most in use such as MPEG-2 or MPEG-4. To achieve this better performance, H.264 adopts a large number of new/improved compression techniques compared with previous standards, albeit at the expense of higher computational complexity. In addition, in recent years new hardware accelerators have emerged, such as graphics processing units (GPUs), which provide a new opportunity to reduce complexity for a large variety of algorithms. However, current GPUs suffer from higher power consumption requirements because of…

Keywords: Reduction (complexity); Computational Theory and Mathematics; Computer Networks and Communications; Computer science; Distortion; Motion estimation; Symmetric multiprocessor system; Energy consumption; Parallel computing; Software; Computer Science Applications; Theoretical Computer Science; Data compression

Venue: Concurrency and Computation: Practice and Experience

Adapting hierarchical bidirectional inter prediction on a GPU-based platform for 2D and 3D H.264 video coding

2013

The H.264/AVC video coding standard introduces some improved tools in order to increase compression efficiency. Moreover, the multi-view extension of H.264/AVC, called H.264/MVC, adopts many of them. Among the new features, variable block-size motion estimation is one which contributes to high coding efficiency. Furthermore, it defines a different prediction structure that includes hierarchical bidirectional pictures, outperforming traditional Group of Pictures patterns in both scenarios: single-view and multi-view. However, these video coding techniques have high computational complexity. Several techniques have been proposed in the literature over the last few years which are aimed at acc…

Keywords: Computer science; Real-time computing; Graphics processing unit; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 020206 networking & telecommunications; 020207 software engineering; 02 engineering and technology; Data_CODINGANDINFORMATIONTHEORY; Hierarchical bidirectional pictures; Scalable Video Coding; Computer engineering; Motion estimation; 0202 electrical engineering electronic engineering information engineering; H.264/AVC; Video coding; Encoder; Context-adaptive binary arithmetic coding; Group of pictures; Context-adaptive variable-length coding

Venue: EURASIP Journal on Advances in Signal Processing

Deadline-based QoS Algorithms for High-performance Networks

2007

Quality of service (QoS) is becoming an attractive feature for high-performance networks and parallel machines because it could allow a more efficient use of resources. Deadline-based algorithms can provide powerful QoS provision. However, the cost associated with keeping ordered lists of packets makes them impractical for high-performance networks. In this paper, we explore how to efficiently adapt the earliest deadline first family of algorithms to high-speed network environments. The results show excellent performance using just two virtual channels and FIFO queues, at a cost that is feasible with today's technology.

Keywords: Earliest deadline first scheduling; Packet switching; business.industry; Network packet; Computer science; Quality of service; Distributed computing; Feature (machine learning); business; Algorithm; Computer network; Scheduling (computing)

Venue: 2007 IEEE International Parallel and Distributed Processing Symposium

C-switches: Increasing switch radix with current integration scale

2011

In large switch-based interconnection networks, increasing the switch radix results in a decrease in the total number of network components, and consequently the overall cost of the network can be significantly reduced. Moreover, high-radix switches are an attractive option to improve network performance in terms of latency, since the hop count is also reduced. However, the integration scale poses problems for designing such single-chip switches. In this paper we discuss key issues and evaluate an interesting alternative for building high-radix switches that go beyond the integration scale bounds. The idea basically consists in combining several current smaller single-chip swi…

Keywords: 010302 applied physics; Interconnection; Computer science; business.industry; 02 engineering and technology; Key issues; 01 natural sciences; Port (computer networking); 020202 computer hardware & architecture; Hop (networking); 0103 physical sciences; 0202 electrical engineering electronic engineering information engineering; Electronic engineering; Network performance; Crossbar switch; business; Computer network

Accelerating H.264 inter prediction in a GPU by using CUDA

2010

H.264/AVC defines a very efficient algorithm for inter prediction, but it is very time-consuming. The emergence of General Purpose Graphics Processing Units (GPGPU) has opened a new door to running this video algorithm on these small processing units. In this paper, a step forward is taken towards an implementation of the H.264/AVC inter prediction algorithm on a GPU using Compute Unified Device Architecture (CUDA). The results show a negligible rate distortion drop with an average time reduction of up to 93.6%.

Keywords: Reduction (complexity); CUDA; Coprocessor; Computer science; Image processing; Parallel computing; General-purpose computing on graphics processing units; Graphics; Data compression

Venue: 2010 Digest of Technical Papers International Conference on Consumer Electronics (ICCE)

Efficient Switches with QoS Support for Clusters

2007

Current interconnect standards providing hardware support for quality of service (QoS) consider up to 16 virtual channels (VCs) for this purpose. However, most implementations do not offer so many VCs because they increase the complexity of the switch and the scheduling delays. We have shown that this number of VCs can be significantly reduced, because it is enough to use two VCs for QoS purposes at each switch port. In this paper, we address the weaknesses of that proposal: we not only reduce the number of VCs but also improve performance thanks to the flexibility in assigning buffer memory.

Keywords: Interconnection; Web server; Job shop scheduling; business.industry; Computer science; TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS; Quality of service; Distributed computing; business; computer.software_genre; computer; Computer network; Scheduling (computing)

Venue: 2007 IEEE International Parallel and Distributed Processing Symposium

A GPU-Based DVC to H.264/AVC Transcoder

2010

Mobile-to-mobile video conferencing is one of the services that the newest mobile network operators can offer to users. With the emergence of the distributed video coding paradigm, which moves the majority of the complexity from the encoder to the decoder, this offering can be achieved by introducing a transcoder. This device has to convert from the distributed video coding paradigm to traditional video coding such as H.264/AVC, which is formed by simpler decoders and more complex encoders, and allows users to execute only the low-complexity algorithms. In order to deal with this highly complex video transcoder, this paper introduces a graphics processing unit based transcoder as base station. The…

Keywords: Computer architecture; Computer science; Video tracking; Real-time computing; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Data_CODINGANDINFORMATIONTHEORY; Video processing; Multiview Video Coding; Coding tree unit; Encoder; Context-adaptive binary arithmetic coding; Scalable Video Coding; Video compression picture types
