A parallel and sensitive software tool for methylation analysis on multicore platforms.
Abstract Motivation: DNA methylation analysis suffers from very long processing times, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. As sequencers are expected to provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. Results: We present a new software tool, called HPG-Methyl, which efficiently maps bis…
Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing
The high-performance computing domain is being enriched by the inclusion of Networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism or power-aware techniques may break topology regularity; thus, efficient routing becomes a challenge. In this paper, uLBDR (Universal Logic-Based Distributed Routing) is proposed as an efficient logic-based mechanism that adapts to any irregular topology derived from 2D meshes, being an alter…
Providing Full Awareness to Distributed Virtual Environments Based on Peer-to-Peer Architectures
In recent years, large-scale distributed virtual environments (DVEs) have become a major trend in distributed applications, mainly due to the enormous popularity of multiplayer online games in the entertainment industry. Since architectures based on networked servers seem not to be scalable enough to support massively multiplayer applications, peer-to-peer (P2P) architectures have been proposed as an efficient and truly scalable solution for this kind of system. However, the main challenge of P2P architectures consists of providing each avatar with updated information about which other avatars are its neighbors. We have denoted this problem as the awareness problem. Although some proposal…
Network Reconfiguration Suitability for Scientific Applications
This paper analyzes the communication patterns of several scientific applications and how they can benefit from network reconfiguration, which adapts the network topology to the communication needs so that total execution time is reduced. By using an analysis methodology based on real application executions, we study the variation of the required communication bandwidth with time and also the global interprocedural communication patterns. Results show that the required bandwidth between each pair of processes does not fluctuate significantly, leading to a constant use of the links and therefore discouraging dynamic reconfigurations of the network during execution time. Nevertheless, the group…
Cost-Effective Congestion Management for Interconnection Networks Using Distributed Deterministic Routing
Interconnection networks are essential elements in current computing systems. For this reason, achieving the best network performance, even in congestion situations, has been a primary goal in recent years. In that sense, there exist several techniques focused on eliminating the main negative effect of congestion: Head-of-Line (HOL) blocking. One of the most successful HOL blocking elimination techniques is RECN, which can be applied in source routing networks. FBICM follows the same approach as RECN, but it has been developed for distributed deterministic routing networks. Although FBICM effectively eliminates HOL blocking, it requires too many resources to be implemented. In this …
On the impact of within-die process variation in GALS-Based NoC Performance
Current integration scales allow designing chip multiprocessors (CMP), where cores are interconnected by means of a network-on-chip (NoC). Unfortunately, the small feature size of current integration scales causes some unpredictability in manufactured devices because of process variation. In NoCs, variability may affect links and routers causing them not to match the parameters established at design time. In this paper, we first analyze the way that manufacturing deviations affect the components of a NoC by applying a new comprehensive and detailed within-die variability model to 200 instances of an 8×8 mesh NoC synthesized using 45 nm technology. Later, we show that GALS-based NoCs pr…
Evaluation of an Alternative for Increasing Switch Radix
In large switch-based interconnection networks, increasing the switch radix results in a decrease in the total number of network components. In this paper we evaluate an interesting strategy for building high-radix switches that goes beyond the integration scale bounds. This approach is independent of the evolution of single-chip switches and will remain valid as the integration scale keeps evolving. Simulation results show that, with a correct internal switch design, switches of this kind achieve almost the same performance as single-chip switches with the same radix, which would be unfeasible with the current integration scale.
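The relation between switch radix and component count can be illustrated with a small calculation. The sketch below assumes a k-ary n-tree (fat-tree) topology, which the abstract does not name; in that topology, switches of radix 2k arranged in n stages connect k^n endpoints using n·k^(n-1) switches. The function name `kary_ntree_size` is hypothetical and chosen only for this illustration.

```python
def kary_ntree_size(k: int, n: int) -> dict:
    """Size of a k-ary n-tree built from switches of radix 2*k.

    Assumption for illustration only: the abstract above does not say
    which topology the authors evaluate.
    """
    return {
        "radix": 2 * k,                # k down links plus k up links per switch
        "endpoints": k ** n,           # processing nodes attached to the tree
        "switches": n * k ** (n - 1),  # n stages of k**(n-1) switches each
    }


if __name__ == "__main__":
    # Four ways of connecting 4096 endpoints: 4**6 = 8**4 = 16**3 = 64**2.
    for k, n in [(4, 6), (8, 4), (16, 3), (64, 2)]:
        size = kary_ntree_size(k, n)
        print(f"radix {size['radix']:>3}: {size['switches']:>5} switches "
              f"in {n} stages for {size['endpoints']} endpoints")
```

Under this assumption, moving from radix 8 to radix 128 reduces both the number of switches and the number of stages a packet crosses.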
Combining congested-flow isolation and injection throttling in HPC interconnection networks
Existing congestion control mechanisms in interconnects can be divided into two general approaches. One is to throttle traffic injection at the sources that contribute to congestion, and the other is to isolate the congested traffic in specially designated resources. These two approaches have different, but non-overlapping weaknesses. In this paper we present in detail a method that combines injection throttling and congested-flow isolation. Through simulation studies we first demonstrate the respective flaws of the injection throttling and of flow isolation. Thereafter we show that our combined method extracts the best of both approaches in the sense that it gives fast reaction to congesti…
LSOM: A Link State protocol Over MAC addresses for metropolitan backbones using Optical Ethernet switches
This paper presents a new protocol named "Link State Over MAC" (LSOM) for Optical Ethernet switches to allow the use of active loop topologies, like meshes, in Metropolitan Area Network (MAN) or even Wide Area Network (WAN) backbones. In this respect, LSOM is an alternative to a ring topology as proposed in draft IEEE 802.17 Resilient Packet Ring (RPR) or a tree topology using IEEE 802.1D Rapid Spanning Tree Protocol (RSTP). LSOM provides higher scalability and is able to achieve better bandwidth utilization and lower latency than RSTP and RPR. Simulation results for 4-node and 9-node topologies show that LSOM can improve throughput over RPR by a factor of up to 1.7. Furthermore, full free…
On the Characterization of Distributed Virtual Environment Systems
Distributed Virtual Environment (DVE) systems have experienced spectacular growth in recent years. One of the key issues in the design of scalable and cost-effective DVE systems is the partitioning problem. This problem consists of efficiently assigning clients (3-D avatars) to the servers in the system, and some techniques have already been proposed for solving it.
Cost-Efficient On-Chip Routing Implementations for CMP and MPSoC Systems
The high-performance computing domain is being enriched by the inclusion of networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area, and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism, or power-aware techniques may break topology regularity; thus, efficient routing becomes a challenge. This paper presents universal logic-based distributed routing (uLBDR), an efficient logic-based mechanism that adapts to any irregular topology derived from 2-D meshes, instead of usi…
A genetic approach for adding QoS to distributed virtual environments
Distributed virtual environment (DVE) systems have been designed in recent years as a set of distributed servers. These systems allow a large number of remote users to share a single 3D virtual scene. In order to provide quality of service in a DVE system, clients should be properly assigned to servers taking into account system throughput and system latency. The latter is composed of both network and computational delays. This highly complex problem is known as the quality of service (QoS) problem. In this paper, we study the implementation of a genetic algorithm (GA) for solving the QoS problem in DVE systems. Performance evaluation results show that, due to its ability of both finding goo…
Optimal Configuration for N-Dimensional Twin Torus Networks
Torus topology is one of the most common topologies used in the current largest supercomputers. Although 3D torus is widely used, recently some supercomputers in the Top500 list have been built using networks with topologies of five or six dimensions. To obtain an nD torus, 2n ports per node are needed. These ports can be offered by a single or several cards per node. In the second case, there are multiple ways of assigning the dimension and direction of the card ports. In a previous work we proposed the 3D Twin (3DT) torus which uses two 4-port cards per node, and obtained the optimal port configuration. This paper extends and generalizes that work in order to obtain the optimal port confi…
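The statement that an nD torus needs 2n ports per node can be checked with a short sketch that enumerates a node's neighbors. The function name `torus_neighbors` and the coordinate representation are assumptions made for this illustration; the sketch does not model the Twin torus card assignment discussed in the abstract.

```python
from typing import List, Tuple


def torus_neighbors(node: Tuple[int, ...], dims: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Return the neighbors of `node` in a torus with `dims[i]` nodes along axis i.

    Each axis contributes two neighbors (one per direction), with wrap-around
    at the edges, so an n-dimensional torus uses 2*n ports per node.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size  # wrap-around link
            neighbors.append(tuple(coords))
    return neighbors


if __name__ == "__main__":
    for dims in [(4, 4, 4), (4, 4, 4, 4, 4)]:
        origin = tuple(0 for _ in dims)
        print(f"{len(dims)}D torus: {len(torus_neighbors(origin, dims))} ports per node")
```

The neighbors are all distinct as long as every dimension has at least three nodes; with only two nodes along an axis, both directions reach the same neighbor.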
A Comparison Study of Metaheuristic Techniques for Providing QoS to Avatars in DVE Systems
Network-server architecture has become a de facto standard for Distributed Virtual Environment (DVE) systems. In these systems, a large set of remote users share a 3D virtual scene. In order to design scalable DVE systems, different approaches have been proposed to maintain the DVE system working under its saturation point, maximizing system throughput. Also, in order to provide quality of service to avatars in a DVE system, avatars should be assigned to servers taking into account, among other factors, system throughput and system latency. This highly complex problem is called the quality of service (QoS) problem in DVE systems. This paper proposes two different approaches for solving the QoS…
M-GRASP: A GRASP With Memory for Latency-Aware Partitioning Methods in DVE Systems
A necessary condition for providing quality of service to distributed virtual environments (DVEs) is to provide a system response below a maximum threshold to the client computers. In this sense, latency-aware partitioning methods try to provide response times below the threshold to as many client computers as possible. These partitioning methods should find an assignment of clients to servers that optimizes system throughput, system latency, and partitioning efficiency. In this paper, we present a new algorithm based on a greedy randomized adaptive search procedure with memory for finding the best possible solutions to this problem. We take into account several different alt…
Ensuring the performance and scalability of peer-to-peer distributed virtual environments
Large-scale distributed virtual environments (DVEs) have become a major trend in distributed applications. Peer-to-peer (P2P) architectures have been proposed as an efficient and truly scalable solution for these kinds of systems. However, in order to design efficient P2P DVEs these systems must be characterized, measuring the impact of different client behavior on system performance. This paper presents the experimental characterization of P2P DVEs. The results show that the saturation of a given client affects only the surrounding clients in the virtual world and has no noticeable effect on the rest of the clients. Nevertheless, the interactions among clients that can tak…
On the development of a communication-aware task mapping technique
Clusters have become a very cost-effective platform for high-performance computing. In these systems, although currently existing networks actually provide enough bandwidth for the existing applications and workstations, the trend is towards the interconnection network becoming the system bottleneck. Therefore, in the future, scheduling strategies will have to take into account the communication requirements of the applications and the communication bandwidth that the network can offer. One of the key issues in these strategies is the task mapping technique used when the network becomes the system bottleneck. In this paper, we propose a communication-aware mapping technique that tries to mat…
Deadline-based QoS Algorithms for High-performance Networks
Quality of service (QoS) is becoming an attractive feature for high-performance networks and parallel machines because it could allow a more efficient use of resources. Deadline-based algorithms can provide powerful QoS provision. However, the cost associated with keeping ordered lists of packets makes them impractical for high-performance networks. In this paper, we explore how to efficiently adapt the earliest-deadline-first family of algorithms to high-speed network environments. The results show excellent performance using just two virtual channels, FIFO queues, and a cost feasible with today's technology.
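To make the cost of keeping ordered lists of packets concrete, here is a minimal software sketch of a plain earliest-deadline-first (EDF) queue. It is a generic textbook EDF structure built on a binary heap, not the FIFO-based adaptation the paper proposes; the class name `EdfQueue` and the packet representation are assumptions for illustration.

```python
import heapq
import itertools
from typing import List, Tuple


class EdfQueue:
    """Plain EDF queue: always releases the packet with the smallest deadline."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        # Monotonic counter used as a tie-breaker so that packets with equal
        # deadlines leave in arrival order.
        self._arrivals = itertools.count()

    def push(self, deadline: float, packet: str) -> None:
        # Each insertion costs O(log n) comparisons to keep the heap ordered.
        heapq.heappush(self._heap, (deadline, next(self._arrivals), packet))

    def pop(self) -> str:
        # Each removal also costs O(log n); a FIFO would need O(1).
        _deadline, _order, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = EdfQueue()
    for deadline, name in [(30, "c"), (10, "a"), (20, "b"), (10, "a2")]:
        queue.push(deadline, name)
    while queue:
        print(queue.pop())
```

Every enqueue and dequeue reorders the structure, which is the kind of per-packet work that is hard to afford at link speed and that FIFO queues avoid.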
Design of an ICT Tool for Decision Making in Social and Health Policies
Governance requires technical support because of the complexity of deciding health policies to assist people who require long-term care. Long-term care policies require the use of ICT simulation tools that can provide policy makers with the option of going into a decision theatre and virtually knowing the consequences of different policies prior to finally determining the real policy to be adopted. In this sense, there is an absence of simulation tools for decision making about long-term care policies. In this chapter, the authors propose the foundations and guidelines of SSIMSOWELL, a new scalable, multiagent simulation tool that increases the prediction capacity of governance in the lo…
A New Scalable and Cost-Effective Congestion Management Strategy for Lossless Multistage Interconnection Networks
In this paper, we propose a new congestion management strategy for lossless multistage interconnection networks that scales as network size and/or link bandwidth increase. Instead of eliminating congestion, our strategy avoids performance degradation beyond the saturation point by eliminating the HOL blocking produced by congestion trees. This is achieved in a scalable manner by using separate queues for congested flows. These are dynamically allocated only when congestion arises, and deallocated when congestion subsides. Performance evaluation results show that our strategy responds to congestion immediately and completely eliminates the performance degradation produced by HOL blocking whi…
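The effect of HOL blocking, and of moving congested flows into separate queues, can be seen in a toy model. The sketch below is an assumption-laden illustration, not the strategy the paper describes: packets are represented only by their destination label, a destination is simply declared blocked, and the function names are hypothetical. Congestion detection and queue deallocation are not modeled.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set, Tuple


def drain_single_fifo(packets: Iterable[str], blocked: Set[str]) -> List[str]:
    """Forward packets from one FIFO; stop as soon as the head is blocked.

    Everything queued behind the blocked head packet is stuck as well,
    which is the head-of-line (HOL) blocking effect.
    """
    queue: Deque[str] = deque(packets)
    delivered: List[str] = []
    while queue and queue[0] not in blocked:
        delivered.append(queue.popleft())
    return delivered


def drain_with_isolation(
    packets: Iterable[str], blocked: Set[str]
) -> Tuple[List[str], Dict[str, Deque[str]]]:
    """Move packets for blocked destinations into queues allocated on demand."""
    queue: Deque[str] = deque(packets)
    set_aside: Dict[str, Deque[str]] = {}
    delivered: List[str] = []
    while queue:
        dest = queue.popleft()
        if dest in blocked:
            # Allocate a separate queue only when a congested flow shows up.
            set_aside.setdefault(dest, deque()).append(dest)
        else:
            delivered.append(dest)
    return delivered, set_aside


if __name__ == "__main__":
    traffic = ["A", "hot", "B", "hot", "C"]
    print(drain_single_fifo(traffic, {"hot"}))
    print(drain_with_isolation(traffic, {"hot"})[0])
```

In the single FIFO, packets for B and C wait behind the first packet for the blocked destination even though their own paths are free; with isolation they are forwarded, and only the congested flow waits.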
A Communication-Aware Topological Mapping Technique for NoCs
Networks-on-Chip (NoCs) have been proposed as a promising solution to the complex on-chip communication problems derived from the increasing number of processor cores. The design of NoCs involves several key issues, one of which is the topological mapping (the mapping of the Intellectual Properties (IPs) to network nodes). Several proposals have focused on topological mapping in recent years, but they require the experimental validation of each mapping considered. In this paper, we propose a communication-aware topological mapping technique for NoCs. This technique is based on the experimental correlation of the network model with the actual network performance, thus avoiding the need…
SUBOPTIMAL-OPTIMAL ROUTING FOR LAN INTERNETWORKING USING TRANSPARENT BRIDGES
The current standard transparent bridge protocol IEEE-802.1D is based on the Spanning Tree (ST) algorithm. It has a very important restriction: it cannot work when the topology has active loops. Therefore, a tree is the only possible interconnection topology that can be used. The ST algorithm guarantees that the active topology is a tree discarding lines that form loops. However, because of this, network bandwidth cannot be fully utilized. Moreover, trees have a very serious bottleneck near the root. This paper proposes a new transparent bridge protocol for LAN interconnection that allows active loops. Therefore, strongly connected regular topologies like tori, hypercubes, meshes, etc., as…
C-switches: Increasing switch radix with current integration scale
In large switch-based interconnection networks, increasing the switch radix results in a decrease in the total number of network components, and consequently the overall cost of the network can be significantly reduced. Moreover, high-radix switches are an attractive option to improve the network performance in terms of latency, since hop count is also reduced. However, there are some problems related to the integration scale to design such single-chip switches. In this paper we discuss key issues and evaluate an interesting alternative for building high-radix switches going beyond the integration scale bounds. The idea basically consists in combining several current smaller single-chip swi…
On the potential of NoC virtualization for multicore chips
As the end of Moore's law is on the horizon, power becomes a limiting factor to continuous increases in performance gains for single-core processors. Processor engineers have shifted to the multicore paradigm and many-core processors are a reality. Within the context of these multicore chips, three key metrics stand out as being of major importance: performance, fault-tolerance (including yield), and power consumption. Finding a solution that optimizes all three of these metrics is challenging. As the number of cores increases, the importance of the interconnection network-on-chip (NoC) grows as well, and chip designers should aim to optimize these three key metrics in the NoC context as …
Efficient Switches with QoS Support for Clusters
Current interconnect standards providing hardware support for quality of service (QoS) consider up to 16 virtual channels (VCs) for this purpose. However, most implementations do not offer so many VCs because they increase the complexity of the switch and the scheduling delays. We have shown that this number of VCs can be significantly reduced, because it is enough to use two VCs for QoS purposes at each switch port. In this paper, we address the weaknesses of that proposal: we not only reduce the number of VCs but also improve performance thanks to the flexibility in assigning buffer memory.
Logic-Based Distributed Routing for NoCs
The design of scalable and reliable interconnection networks for multicore chips (NoCs) introduces new design constraints like power consumption, area, and ultra low latencies. Although 2D meshes are usually proposed for NoCs, heterogeneous cores, manufacturing defects, hard failures, and chip virtualization may lead to irregular topologies. In this context, efficient routing becomes a challenge. Although switches can be easily configured to support most routing algorithms and topologies by using routing tables, this solution does not scale in terms of latency and area. We propose a new circuit that removes the need for using routing tables. The new mechanism, referred to as logic-based dis…
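As a point of reference for table-free routing, the sketch below computes the output port of a switch from coordinate comparisons alone, using plain XY (dimension-order) routing on a regular 2D mesh. This is a simpler, well-known technique and not the LBDR mechanism itself, whose configuration for irregular topologies is not described in the excerpt. The coordinate convention (y grows toward the south) and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

# Assumed convention: x grows eastward, y grows southward.
_STEP = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}


def xy_route(cur: Coord, dst: Coord) -> str:
    """Choose the output port with comparators only, without a routing table."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    # The X offset is exhausted; now correct the Y offset.
    if dy > cy:
        return "S"
    if dy < cy:
        return "N"
    return "LOCAL"


def xy_path(src: Coord, dst: Coord) -> List[str]:
    """List the ports taken hop by hop from `src` to `dst`."""
    ports: List[str] = []
    cur = src
    while (port := xy_route(cur, dst)) != "LOCAL":
        ports.append(port)
        step_x, step_y = _STEP[port]
        cur = (cur[0] + step_x, cur[1] + step_y)
    return ports


if __name__ == "__main__":
    print(xy_path((0, 0), (2, 1)))
```

The per-switch decision needs only the local and destination coordinates, so its cost does not grow with network size the way a routing table does; the limitation is that this simple form assumes a complete, regular mesh.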
Accurate reliability and availability models for direct interconnection networks
Fault tolerance in multicomputer interconnection networks has been traditionally studied by determining the worst possible combination of faulty components that causes its failure and then assuming that this will occur. However, the probability of the worst possible combination is usually low, and the routing algorithm may be able to find a route between source and destination nodes. The network dependability parameters computed according to this approach will therefore be underestimated. In this paper we propose a methodology for accurately evaluating interconnection network dependability. In addition, we apply it to obtain an accurate estimation of the reliability and availability parameters in a 2-D m…
An Efficient Implementation of Distributed Routing Algorithms for NoCs
The design of NoCs for multi-core chips introduces new design constraints like power consumption, area, and ultra low latencies. Although 2D meshes are preferred, heterogeneous blocks, fabrication faults, reliability issues, and chip virtualization may lead to the need for irregular topologies or regions. In this situation, efficient routing becomes a challenge. Although the use of routing tables at switches is flexible, it does not scale in terms of latency and area due to its memory requirements. LBDR (logic-based distributed routing) is proposed as a new routing method that removes the need for routing tables altogether. LBDR enables the implementation of many routing algorithms on most …