Search results for "algorithm."
showing 10 items of 4617 documents
FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
2021
Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. Because of the same abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as the leader. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop, and deploying them there is not straightforward. This state of the art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …
On Big Data: How should we make sense of them?
2020
The topic of Big Data is today extensively discussed, not only on the technical ground. This also depends on the fact that Big Data are frequently presented as allowing an epistemological paradigm shift in scientific research, which would be able to supersede the traditional hypothesis-driven method. In this piece, I critically scrutinize two key claims that are usually associated with this approach, namely, the fact that data speak for themselves, deflating the role of theories and models, and the primacy of correlation over causation. In so doing, I will also refer to a recent case history of data mining projects in the field of biomedicine, i.e. EXPOsOMICS. My intention is both to acknow…
2013
Currently, a growing number of programs are becoming available in statistical software for multiple imputation of missing values. Among others, two algorithms are mainly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of data (3% - 63%) was set as missing completely at random and subsequent…
Fast Algorithms for Pseudoarboricity
2015
The densest subgraph problem, which asks for a subgraph with the maximum edges-to-vertices ratio d∗, is solvable in polynomial time. We discuss algorithms for this problem and the computation of a graph orientation with the lowest maximum indegree, which is equal to ⌈d∗⌉. This value also equals the pseudoarboricity of the graph. We show that it can be computed in O(|E| √(log log d∗)) time, and that better estimates can be given for graph classes where d∗ satisfies certain asymptotic bounds. These runtimes are achieved by accelerating a binary search with an approximation scheme, and a runtime analysis of Dinitz’s algorithm on flow networks where all arcs, except the source and sink arcs, hav…
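To make the quantities in this abstract concrete, here is a minimal brute-force sketch that computes the maximum edges-to-vertices ratio d∗ of a tiny graph and takes its ceiling, which the abstract states equals the pseudoarboricity. This is not the paper's algorithm: it enumerates every vertex subset and is exponential in the number of vertices. The function name `max_density` and the example graph are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def max_density(vertices, edges):
    """Return d*, the maximum of |E(S)| / |S| over all non-empty vertex subsets S.

    Brute force for illustration only (hypothetical helper, exponential time).
    """
    best = Fraction(0)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            # Count the edges with both endpoints inside the chosen subset.
            induced = sum(1 for u, v in edges if u in chosen and v in chosen)
            best = max(best, Fraction(induced, k))
    return best


# Example graph (an assumption): the complete graph K4 on vertices 0..3,
# plus a pendant vertex 4 attached to vertex 3.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]

d_star = max_density(vertices, edges)   # K4 alone gives 6 / 4 = 3/2
pseudoarboricity = ceil(d_star)         # ceiling of d*, here 2
print(d_star, pseudoarboricity)
```

Exact fractions avoid floating-point ties when comparing densities of different subsets.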
A new compact formulation for the discrete p-dispersion problem
2017
This paper addresses the discrete p-dispersion problem (PDP) which is about selecting p facilities from a given set of candidates in such a way that the minimum distance between selected facilities is maximized. We propose a new compact formulation for this problem. In addition, we discuss two simple enhancements of the new formulation: simple bounds on the optimal distance can be exploited to reduce the size and to increase the tightness of the model at a relatively low cost of additional computation time. Moreover, the new formulation can be further strengthened by adding valid inequalities. We present a computational study carried out over a set of large-scale test instances i…
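The problem definition in this abstract can be illustrated with a small exhaustive search. The sketch below is only a reference implementation of the objective (maximize the minimum pairwise distance among p selected candidates), not the compact formulation the paper proposes. It assumes Euclidean distances between points in the plane, and the name `p_dispersion_brute_force` and the sample coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def p_dispersion_brute_force(points, p):
    """Return (best_min_distance, chosen_indices) by enumerating all p-subsets.

    Illustrative only: the number of subsets grows combinatorially, so this
    is practical for tiny instances, not for large-scale test sets.
    """
    if p < 2 or p > len(points):
        raise ValueError("p must satisfy 2 <= p <= len(points)")
    best_value, best_subset = -1.0, None
    for subset in combinations(range(len(points)), p):
        # Objective of a subset: the smallest distance between any two members.
        value = min(dist(points[i], points[j]) for i, j in combinations(subset, 2))
        if value > best_value:
            best_value, best_subset = value, subset
    return best_value, best_subset


# Five candidate sites (assumed coordinates), from which p = 3 are selected.
candidates = [(0, 0), (1, 0), (4, 0), (4, 3), (0, 3)]
value, chosen = p_dispersion_brute_force(candidates, 3)
print(value, chosen)
```

A search like this can serve as a correctness check for an exact model on very small instances.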
Optimal standalone data center renewable power supply using an offline optimization approach
2022
Because of the increasing energy consumption of data centers and their CO2 emissions, the ANR DATAZERO2 project aims to design autonomous data centers running solely on local renewable energy coupled with storage devices to overcome the intermittency issue. In order to optimize the use of renewable energy and storage devices, a MILP solver is usually in charge of assigning the power to be supplied to the data center. However, in order to reduce the computation time and make the approach scalable, it would be more appropriate to use a polynomial time algorithm. This paper aims at showing and proving that it is possible to provide an optimal power profile via a deterministic algori…
Structural difficulty in grammatical evolution versus genetic programming
2013
Genetic programming (GP) has problems with structural difficulty as it is unable to search effectively for solutions requiring very full or very narrow trees. As a result of structural difficulty, GP has a bias towards narrow trees which means it searches effectively for solutions requiring narrow trees. This paper focuses on the structural difficulty of grammatical evolution (GE). In contrast to GP, GE works on variable-length binary strings and uses a grammar in Backus-Naur Form (BNF) to map linear genotypes to phenotype trees. The paper studies whether and how GE is affected by structural difficulty. For the analysis, we perform random walks through the search space and compare the struc…
Efficient lower and upper bounds of the diagonal-flip distance between triangulations
2006
It remains an open problem whether the rotation distance between binary trees, or equivalently the diagonal-flip distance between triangulations, can be computed in polynomial time. We present an efficient algorithm for computing lower and upper bounds of this distance between a pair of triangulations.
An efficient upper bound of the rotation distance of binary trees
2000
A polynomial time algorithm is developed for computing an upper bound for the rotation distance of binary trees and, equivalently, for the diagonal-flip distance of convex polygon triangulations. Ordinal tools are used.
On the Locality of Standard Search Operators in Grammatical Evolution
2014
Offspring should be similar to their parents and inherit their relevant properties. This general design principle of search operators in evolutionary algorithms is known as either locality or geometry of search operators. It takes a geometric perspective on search operators and suggests that the distance between an offspring and its parents should be less than or equal to the distance between both parents. This paper examines the locality of standard search operators used in grammatical evolution (GE) and genetic programming (GP) for binary tree problems. Both standard GE and GP search operators suffer from low locality since a substantial number of search steps result in an o…