Search results for "Computation"
showing 10 items of 7362 documents
Big Data in metagenomics: Apache Spark vs MPI.
2020
The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine. This has sparked significant interest in using modern Big Data technologies to process this large amount of information in distributed-memory clusters of commodity hardware. Several approaches based on solutions such as Apache Hadoop or Apache Spark have been proposed. These solutions allow developers to focus on the problem while the need to deal with low-level details, such as data distribution schemes or communication patterns among processing nodes, can be ignored. However, performance and scalability are also of high importance when…
A full conformational space analysis of bilirubin
2009
Ab initio methods were utilized in a gas-phase systematic conformational search of bilirubin conformers. The whole molecule was divided into four fragments. The most stable conformers of these fragments were employed to build 196 conformers of the complete bilirubin molecule. Initial geometries were optimized at the HF/3-21G level of theory and the minimum energy conformers were then reoptimized at the B3LYP/6-31G(d) level. The ridge-tile conformer was the most stable one, in perfect agreement with X-ray data. We found that while the tetrapyrrole backbone shows some flexibility, the propionic acid side chains have a greater influence on bilirubin conformation because they can interact through different hydrogen bond patte…
Computation of the area in the discrete plane: Green’s theorem revisited
2017
The detection of the contour of a binary object is a common problem; however, the area of a region, and its moments, can be a significant parameter. In several metrology applications, the area of planar objects must be measured. The area is obtained by counting the pixels inside the contour or using a discrete version of Green's formula. Unfortunately, we obtain the area enclosed by the polygonal line passing through the centers of the pixels along the contour. We present a modified version of Green's theorem in the discrete plane, which allows for the computation of the exact area of a two-dimensional region in the class of polyominoes. Penalties are introduced and …
Fast Algorithms for Pseudoarboricity
2015
The densest subgraph problem, which asks for a subgraph with the maximum edges-to-vertices ratio d∗, is solvable in polynomial time. We discuss algorithms for this problem and the computation of a graph orientation with the lowest maximum indegree, which is equal to ⌈d∗⌉. This value also equals the pseudoarboricity of the graph. We show that it can be computed in O(|E| √ log log d∗) time, and that better estimates can be given for graph classes where d∗ satisfies certain asymptotic bounds. These runtimes are achieved by accelerating a binary search with an approximation scheme, and a runtime analysis of Dinitz’s algorithm on flow networks where all arcs, except the source and sink arcs, hav…
A new compact formulation for the discrete p-dispersion problem
2017
This paper addresses the discrete p-dispersion problem (PDP) which is about selecting p facilities from a given set of candidates in such a way that the minimum distance between selected facilities is maximized. We propose a new compact formulation for this problem. In addition, we discuss two simple enhancements of the new formulation: simple bounds on the optimal distance can be exploited to reduce the size and to increase the tightness of the model at a relatively low cost of additional computation time. Moreover, the new formulation can be further strengthened by adding valid inequalities. We present a computational study carried out over a set of large-scale test instances i…
On the Non-uniform Redundancy in Grammatical Evolution
2016
This paper investigates the redundancy of representation in grammatical evolution (GE) for binary trees. We analyze the entire GE solution space by creating all binary genotypes of predefined length and mapping them to phenotype trees, which are then characterized by their size, depth and shape. We find that the GE representation is strongly non-uniformly redundant. There are huge differences in the number of genotypes that encode one particular phenotype. Thus, it is difficult for GE to solve problems where the optimal tree solutions are underrepresented. In general, the GE mapping process is biased towards short tree structures, which implies high GE performance if the optimal solution requir…
Efficient lower and upper bounds of the diagonal-flip distance between triangulations
2006
It remains an open problem whether the rotation distance between binary trees, or equivalently the diagonal-flip distance between triangulations, can be computed in polynomial time. We present an efficient algorithm for computing lower and upper bounds of this distance between a pair of triangulations.
An efficient upper bound of the rotation distance of binary trees
2000
A polynomial-time algorithm is developed for computing an upper bound for the rotation distance of binary trees and, equivalently, for the diagonal-flip distance of convex polygon triangulations. Ordinal tools are used.
The Myriad Virtues of Wavelet Trees
2009
Wavelet Trees have been introduced in [Grossi, Gupta and Vitter, SODA '03] and have been rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (like,…
Bayesian hierarchical models for analysing the spatial distribution of bioclimatic indices
2017
A methodological approach for modelling the spatial distribution of bioclimatic indices is proposed in this paper. The value of the bioclimatic index is modelled with a hierarchical Bayesian model that incorporates both structured and unstructured random effects. Selection of prior distributions is also discussed in order to better incorporate any possible prior knowledge about the parameters that could refer to the particular characteristics of bioclimatic indices. MCMC methods and distributed programming are used to obtain an approximation of the posterior distribution of the parameters and also the posterior predictive distribution of the indices. One main outcome of the proposal is the …