Search results for "algorithm."

Showing 10 of 4,617 documents

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Background: Storage of genomic data is a major cost for the Life Sciences, and it is effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, Big Data technologies are seen as the future of genomic data storage and processing, with MapReduce-Hadoop in the lead. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, deploying them there is not straightforward. Such a state of the art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …

Subjects: Big Data; FASTQ format; Computer science; 02 engineering and technology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; 03 medical and health sciences; Software; Structural Biology; Spark (mathematics); 0202 electrical engineering electronic engineering information engineering; MapReduce; Hadoop; Sequence analysis; Data compression; Molecular Biology; lcsh:QH301-705.5; 030304 developmental biology; File system; 0303 health sciences; Settore INF/01 - Informatica; Database; Methodology Article; Applied Mathematics; Genomics; Algorithms; Computer Science Applications; lcsh:Biology (General); Software deployment; lcsh:R858-859.7; 020201 artificial intelligence & image processing; State (computer science)
Published in: BMC Bioinformatics

On Big Data: How should we make sense of them?

2020

The topic of Big Data is extensively discussed today, and not only on technical grounds. This is partly because Big Data are frequently presented as enabling an epistemological paradigm shift in scientific research, one that could supersede the traditional hypothesis-driven method. In this piece, I critically scrutinize two key claims usually associated with this approach: that data speak for themselves, deflating the role of theories and models, and that correlation has primacy over causation. In doing so, I also refer to a recent case history of data mining projects in biomedicine, namely EXPOsOMICS. My intention is both to acknow…

Subjects: Big Data; Value (ethics); causality; Multidisciplinary; data-driven science; Computer science; epistemology; opacity of algorithm; Data science; end of theory; History and Philosophy of Science; Paradigm shift; Key (cryptography); Causation; Heuristics
Published in: Mètode Revista de difusió de la investigació

2013

A growing number of programs for the multiple imputation of missing values are becoming available in statistical software. Two algorithms are implemented most often: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of the data (3% to 63%) was set as missing completely at random and subsequent…

Subjects: Binary response; Sample size determination; Statistics; Expectation–maximization algorithm; Econometrics; Main effect; Imputation (statistics); Missing data; Interaction; Logistic regression; Mathematics
Published in: Open Journal of Statistics
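
For readers unfamiliar with the chained-equations idea behind MICE named in this entry, here is a simplified sketch; it is not the procedure or software used in the study. It assumes continuous variables and NumPy, imputes each incomplete column by least-squares regression on all other columns plus Gaussian residual noise, cycles over the columns for a fixed number of iterations, and repeats the run with different seeds to obtain multiple imputed datasets. The names `mice_sketch` and `multiple_imputation` are hypothetical. Full MICE implementations also draw regression parameters from their posterior and use logistic or other models for binary and categorical variables, which this sketch omits.

```python
import numpy as np


def mice_sketch(data, n_iter=10, seed=0):
    """Hypothetical, simplified chained-equations imputation of one dataset.

    data: 2-D float array with np.nan marking missing entries.
    Returns a copy in which every missing entry has been filled in.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    # Initialize missing entries with the observed column means.
    col_means = np.nanmean(x, axis=0)
    x[missing] = np.take(col_means, np.nonzero(missing)[1])

    n_rows, n_cols = x.shape
    for _ in range(n_iter):
        for j in range(n_cols):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs_j = ~miss_j
            # Design matrix: intercept plus all other (currently imputed) columns.
            design = np.column_stack([np.ones(n_rows), np.delete(x, j, axis=1)])
            coef, *_ = np.linalg.lstsq(design[obs_j], x[obs_j, j], rcond=None)
            sigma = (x[obs_j, j] - design[obs_j] @ coef).std()
            # Regression prediction plus residual noise, so imputations differ per seed.
            pred = design[miss_j] @ coef
            x[miss_j, j] = pred + rng.normal(0.0, sigma, size=pred.shape)
    return x


def multiple_imputation(data, m=5, n_iter=10):
    """Return m imputed copies of `data`, one per random seed."""
    return [mice_sketch(data, n_iter=n_iter, seed=s) for s in range(m)]
```

Each imputed dataset would then be analysed separately and the results pooled; observed values are never modified.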

Fast Algorithms for Pseudoarboricity

2015

The densest subgraph problem, which asks for a subgraph with the maximum edges-to-vertices ratio d∗, is solvable in polynomial time. We discuss algorithms for this problem and the computation of a graph orientation with the lowest maximum indegree, which is equal to ⌈d∗⌉. This value also equals the pseudoarboricity of the graph. We show that it can be computed in $O(|E|^{3/2}\sqrt{\log\log d^*})$ time, and that better estimates can be given for graph classes where d∗ satisfies certain asymptotic bounds. These runtimes are achieved by accelerating a binary search with an approximation scheme, and a runtime analysis of Dinitz’s algorithm on flow networks where all arcs, except the source and sink arcs, hav…

Subjects: Binary search algorithm; Computation; 0102 computer and information sciences; 02 engineering and technology; Orientation (graph theory); 01 natural sciences; Flow (mathematics); 010201 computation theory & mathematics; Log-log plot; 0202 electrical engineering electronic engineering information engineering; Graph (abstract data type); 020201 artificial intelligence & image processing; Unit (ring theory); Algorithm; Time complexity; Mathematics
Published in: 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX)
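
The fact stated in this entry, that the lowest achievable maximum indegree over all orientations equals ⌈d∗⌉ (the pseudoarboricity), can be explored with a textbook reduction: an orientation with every indegree at most k exists exactly when a network that sends one unit of flow per edge to one of its two endpoints, with capacity k from each vertex to the sink, carries a flow of |E|. The sketch below binary-searches k with that test, starting from the lower bound ⌈|E|/|V|⌉. It is an illustration under assumptions, not the paper's method: it uses plain Edmonds-Karp augmenting paths instead of Dinitz's algorithm and omits the approximation scheme that accelerates the search. All names are hypothetical; the graph is assumed simple, with vertices numbered 0 to n-1.

```python
from collections import deque


class FlowNetwork:
    """Minimal Edmonds-Karp max-flow (shortest augmenting paths via BFS)."""

    def __init__(self, n):
        self.graph = [[] for _ in range(n)]  # edge indices leaving each node
        self.to, self.cap = [], []

    def add_edge(self, u, v, c):
        # Forward edge at an even index, its residual reverse edge right after.
        self.graph[u].append(len(self.to)); self.to.append(v); self.cap.append(c)
        self.graph[v].append(len(self.to)); self.to.append(u); self.cap.append(0)

    def max_flow(self, s, t):
        flow = 0
        while True:
            parent_edge = [-1] * len(self.graph)
            parent_edge[s] = -2  # mark the source as visited
            queue = deque([s])
            while queue and parent_edge[t] == -1:
                u = queue.popleft()
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and parent_edge[v] == -1:
                        parent_edge[v] = e
                        queue.append(v)
            if parent_edge[t] == -1:
                return flow  # no augmenting path left
            # Walk back from t to find the bottleneck, then push flow along the path.
            bottleneck, v = float("inf"), t
            while v != s:
                e = parent_edge[v]
                bottleneck = min(bottleneck, self.cap[e])
                v = self.to[e ^ 1]
            v = t
            while v != s:
                e = parent_edge[v]
                self.cap[e] -= bottleneck
                self.cap[e ^ 1] += bottleneck
                v = self.to[e ^ 1]
            flow += bottleneck


def orientable(n_vertices, edges, k):
    """True if the edges can be oriented so that every indegree is <= k."""
    m = len(edges)
    source, sink = 0, 1 + m + n_vertices  # nodes: source, edges, vertices, sink
    net = FlowNetwork(sink + 1)
    for i, (u, v) in enumerate(edges):
        net.add_edge(source, 1 + i, 1)
        # The unit of flow chooses the head of edge i (the endpoint it points into).
        net.add_edge(1 + i, 1 + m + u, 1)
        net.add_edge(1 + i, 1 + m + v, 1)
    for x in range(n_vertices):
        net.add_edge(1 + m + x, sink, k)
    return net.max_flow(source, sink) == m


def pseudoarboricity(n_vertices, edges):
    """Smallest k admitting an orientation with maximum indegree k."""
    if not edges:
        return 0
    lo = max(1, -(-len(edges) // n_vertices))  # ceil(|E|/|V|) <= ceil(d*)
    hi = len(edges)  # k = |E| is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if orientable(n_vertices, edges, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For K4, d∗ = 6/4 = 1.5, so `pseudoarboricity` should return ⌈1.5⌉ = 2.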

A new compact formulation for the discrete p-dispersion problem

2017

This paper addresses the discrete p-dispersion problem (PDP), which consists of selecting p facilities from a given set of candidates such that the minimum distance between selected facilities is maximized. We propose a new compact formulation for this problem. In addition, we discuss two simple enhancements of the new formulation: simple bounds on the optimal distance can be exploited to reduce the size and increase the tightness of the model at a relatively low cost in additional computation time. Moreover, the new formulation can be further strengthened by adding valid inequalities. We present a computational study carried out over a set of large-scale test instances i…

Subjects: Binary search algorithm; Mathematical optimization; 021103 operations research; Information Systems and Management; Line search; General Computer Science; 0211 other engineering and technologies; 0102 computer and information sciences; 02 engineering and technology; Management Science and Operations Research; Solver; 01 natural sciences; Industrial and Manufacturing Engineering; Facility location problem; Set (abstract data type); 010201 computation theory & mathematics; Modeling and Simulation; Programming paradigm; Integer programming; Algorithm; Standard model (cryptography); Mathematics
Published in: European Journal of Operational Research
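
The threshold structure of the PDP described in this entry can be illustrated on toy instances: the optimal value is always one of the pairwise distances, and the question "can p facilities be chosen at least D apart?" is monotone in D, so a binary search over the sorted distances finds the optimum. This sketch answers each feasibility question by brute-force enumeration, so it only suits tiny inputs; it is not the compact integer-programming formulation proposed in the paper. The function names are hypothetical, points are assumed to be coordinate tuples with Euclidean distance, and p is assumed to be at least 2.

```python
from itertools import combinations
from math import dist


def can_disperse(points, p, threshold):
    """True if some p points are pairwise at least `threshold` apart (brute force)."""
    return any(
        all(dist(a, b) >= threshold for a, b in combinations(subset, 2))
        for subset in combinations(points, p)
    )


def p_dispersion(points, p):
    """Largest achievable minimum pairwise distance among p chosen points (p >= 2)."""
    # The optimum equals one of the pairwise distances, so search only over those.
    candidates = sorted({dist(a, b) for a, b in combinations(points, 2)})
    lo, hi = 0, len(candidates) - 1  # candidates[0] is always feasible
    # Feasibility is monotone: if threshold D works, every smaller D works too.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if can_disperse(points, p, candidates[mid]):
            lo = mid
        else:
            hi = mid - 1
    return candidates[lo]
```

On the line points 0, 1, 3 and 7, choosing p = 3 gives {0, 3, 7} with minimum distance 3.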

Optimal standalone data center renewable power supply using an offline optimization approach

2022

Because of the increasing energy consumption and CO₂ emissions of data centers, the ANR DATAZERO2 project aims to design autonomous data centers that run solely on local renewable energy, coupled with storage devices to overcome the intermittency issue. To optimize the use of renewable energy and storage devices, a mixed-integer linear programming (MILP) solver is usually in charge of assigning the power to be supplied to the data center. However, to reduce computation time and make the approach scalable, a polynomial-time algorithm would be more appropriate. This paper aims to show and prove that it is possible to provide an optimal power profile via a deterministic algori…

Subjects: Binary search algorithm; Mathematical optimization; General Computer Science; Deterministic algorithm; Computer science; Energy consumption; Solver; Renewable energy; Scalability; Data center; Electrical and Electronic Engineering; Time complexity
Published in: Sustainable Computing: Informatics and Systems

Structural difficulty in grammatical evolution versus genetic programming

2013

Genetic programming (GP) suffers from structural difficulty: it is unable to search effectively for solutions requiring very full or very narrow trees. As a result of structural difficulty, GP has a bias towards narrow trees, which means that it searches effectively for solutions requiring narrow trees. This paper focuses on the structural difficulty of grammatical evolution (GE). In contrast to GP, GE works on variable-length binary strings and uses a grammar in Backus-Naur Form (BNF) to map linear genotypes to phenotype trees. The paper studies whether and how GE is affected by structural difficulty. For the analysis, we perform random walks through the search space and compare the struc…

Subjects: Binary tree; Grammar; Grammatical evolution; Structure (category theory); Contrast (statistics); Genetic programming; Representation (mathematics); Random walk; Algorithm; Mathematics
Published in: Proceedings of the 15th annual conference on Genetic and evolutionary computation
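
The GE genotype-to-phenotype mapping described in this entry can be sketched as follows. The conventions used are common GE practice and are assumptions here, not details taken from the paper: the binary string is cut into 8-bit codons, each codon selects a production for the leftmost nonterminal via `codon % number_of_choices`, rules with a single production consume no codon, and the genotype may be reused ("wrapped") a limited number of times before the individual is declared invalid. The function `ge_map` and the toy grammar are hypothetical, and the phenotype is returned as a flat string rather than as an explicit derivation tree.

```python
def ge_map(bits, grammar, start, codon_size=8, max_wraps=2):
    """Map a binary-string genotype to a phenotype string via a BNF-style grammar.

    grammar: dict mapping each nonterminal to a list of productions; each
    production is a list of symbols, and nonterminals are the dict's keys.
    Returns None if the derivation is incomplete after `max_wraps` wraps.
    """
    codons = [int(bits[i:i + codon_size], 2)
              for i in range(0, len(bits) - codon_size + 1, codon_size)]
    if not codons:
        return None
    derivation = [start]
    used, limit = 0, len(codons) * (max_wraps + 1)
    while True:
        # Locate the leftmost nonterminal in the current sentential form.
        pos = next((i for i, s in enumerate(derivation) if s in grammar), None)
        if pos is None:
            return "".join(derivation)  # only terminals left: mapping complete
        choices = grammar[derivation[pos]]
        if len(choices) == 1:
            choice = choices[0]  # no decision to make, so no codon is consumed
        else:
            if used >= limit:
                return None  # codons exhausted even after wrapping
            choice = choices[codons[used % len(codons)] % len(choices)]
            used += 1
        derivation[pos:pos + 1] = choice


# Hypothetical toy grammar: <e> ::= <e>+<e> | <e>*<e> | <v>    <v> ::= x | y
GRAMMAR = {
    "<e>": [["<e>", "+", "<e>"], ["<e>", "*", "<e>"], ["<v>"]],
    "<v>": [["x"], ["y"]],
}
```

With codons 0, 2, 0, 2, 1 the derivation runs `<e>` → `<e>+<e>` → `<v>+<e>` → `x+<e>` → `x+<v>` → `x+y`. Because neighbouring codons act on whatever nonterminal happens to be leftmost, a small change to the bit string can reshape the whole tree, which is why GE's structural behaviour differs from GP's.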

Efficient lower and upper bounds of the diagonal-flip distance between triangulations

2006

It remains an open problem whether the rotation distance between binary trees, or equivalently the diagonal-flip distance between triangulations, can be computed in polynomial time. We present an efficient algorithm for computing lower and upper bounds of this distance between a pair of triangulations.

Subjects: Binary tree; Open problem; 010102 general mathematics; Diagonal; Approximation algorithm; Triangulation (social science); 0102 computer and information sciences; 01 natural sciences; Upper and lower bounds; Computer Science Applications; Theoretical Computer Science; Combinatorics; 010201 computation theory & mathematics; Signal Processing; [MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]; 0101 mathematics; Rotation (mathematics); Time complexity; Information Systems; Mathematics
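
To make the two quantities in this entry concrete, the sketch below computes two classical, easily derived bounds on the diagonal-flip distance. It is not the algorithm presented in the paper. The lower bound counts the diagonals of the first triangulation that are absent from the second, since each flip replaces exactly one diagonal. The upper bound routes both triangulations through the fan at a common vertex v, using the standard fact that a triangulation with deg(v) diagonals at v can be turned into the fan at v in (n - 3) - deg(v) flips. The function name and the representation of a triangulation as a list of n - 3 diagonals of a convex polygon with vertices 0 to n-1 are assumptions.

```python
def flip_distance_bounds(n, diagonals1, diagonals2):
    """Simple (lower, upper) bounds on the flip distance of two triangulations.

    n: number of vertices of the convex polygon, labelled 0..n-1.
    diagonals1, diagonals2: the n - 3 diagonals of each triangulation as (i, j) pairs.
    """
    d1 = {frozenset(d) for d in diagonals1}
    d2 = {frozenset(d) for d in diagonals2}
    if not len(d1) == len(d2) == n - 3:
        raise ValueError("a triangulation of an n-gon has exactly n - 3 diagonals")

    # Each flip removes one diagonal, so every diagonal of T1 that is
    # missing from T2 must be flipped away at least once.
    lower = len(d1 - d2)

    def degree(diags, v):
        return sum(1 for d in diags if v in d)

    # T -> fan(v) takes (n - 3) - deg_T(v) flips; going T1 -> fan(v) -> T2
    # is a valid flip sequence for every v, so take the cheapest v.
    upper = min(2 * (n - 3) - degree(d1, v) - degree(d2, v) for v in range(n))
    return lower, upper
```

For the hexagon fans at vertices 0 and 3, both bounds equal 2, so the flip distance is exactly 2.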

An efficient upper bound of the rotation distance of binary trees

2000

A polynomial-time algorithm is developed for computing an upper bound for the rotation distance of binary trees and, equivalently, for the diagonal-flip distance between triangulations of convex polygons. Ordinal tools are used.

Subjects: Binary tree; Regular polygon; Computer Science::Computational Geometry; Upper and lower bounds; Computer Science Applications; Theoretical Computer Science; Combinatorics; Lattice (order); Signal Processing; Time complexity; Information Systems; Mathematics
Published in: Information Processing Letters
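
A classical way to get a polynomial-time upper bound of this kind, not necessarily the ordinal method of the paper, is to rotate both trees into the right-leaning chain (the "right vine"). Each right rotation at a node on the right spine moves exactly one more node onto that spine, so a tree with n nodes and r(T) nodes on its right spine needs exactly n - r(T) rotations, and since rotations are reversible, d(T1, T2) ≤ 2n - r(T1) - r(T2). The `Node` class and all function names are hypothetical; only tree shapes matter, because rotations preserve the in-order sequence of keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(t):
    return 0 if t is None else 1 + size(t.left) + size(t.right)


def right_spine_length(t):
    """Number of nodes on the path from the root that follows right children."""
    count = 0
    while t is not None:
        count += 1
        t = t.right
    return count


def to_right_vine(t):
    """Rotate right until no node has a left child (mutates t).

    Returns (new_root, number_of_rotations).
    """
    dummy = Node(right=t)
    parent, rotations = dummy, 0
    while parent.right is not None:
        node = parent.right
        if node.left is not None:
            # Right rotation at `node`: its left child moves up onto the spine.
            pivot = node.left
            node.left, pivot.right = pivot.right, node
            parent.right = pivot
            rotations += 1
        else:
            parent = node  # this spine node is final; move down the spine
    return dummy.right, rotations


def rotation_distance_upper_bound(t1, t2):
    """2n - r(t1) - r(t2): the cost of routing both trees through the right vine."""
    n = size(t1)
    if n != size(t2):
        raise ValueError("trees must have the same number of nodes")
    return 2 * n - right_spine_length(t1) - right_spine_length(t2)
```

The bound is tight for a left chain versus a right chain of three nodes (2 rotations), but can be loose; for two identical balanced trees it returns 2 although the true distance is 0.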

On the Locality of Standard Search Operators in Grammatical Evolution

2014

Offspring should be similar to their parents and inherit their relevant properties. This general design principle for search operators in evolutionary algorithms is known as either the locality or the geometry of search operators. It takes a geometric perspective on search operators and suggests that the distance between an offspring and each of its parents should be less than or equal to the distance between the two parents. This paper examines the locality of standard search operators used in grammatical evolution (GE) and genetic programming (GP) for binary tree problems. Both standard GE and GP search operators suffer from low locality, since a substantial number of search steps result in an o…

Subjects: Binary tree; Theoretical computer science; Perspective (graphical); Locality; Evolutionary algorithm; Genetic programming; Random walk; Grammatical evolution; Artificial intelligence; Natural language processing; Mathematics
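
The locality criterion stated in this entry can be written as a predicate over any distance function. The sketch is illustrative only and makes assumptions not found in the abstract: it uses bit strings with Hamming distance (the paper studies GE and GP on tree-shaped phenotypes, which would need a tree distance instead) and checks one-point crossover. On bit strings, one-point crossover always satisfies the criterion, because wherever the parents differ the offspring agrees with exactly one of them, so its distances to the two parents sum to the distance between the parents. All names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def one_point_crossover(p1, p2, cut):
    """Offspring copies p1 before position `cut` and p2 from `cut` on."""
    return p1[:cut] + p2[cut:]


def is_geometric(offspring, p1, p2, distance):
    """Locality criterion: the offspring is no farther from either parent
    than the two parents are from each other."""
    d_parents = distance(p1, p2)
    return distance(offspring, p1) <= d_parents and distance(offspring, p2) <= d_parents
```

An operator that produces, say, `"11111111"` from parents `"00000000"` and `"11110000"` violates the criterion, since the offspring is 8 away from the first parent while the parents are only 4 apart.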