Search results for "Theoretical Computer Science"
showing 10 items of 1151 documents
Discovering Differential Equations from Earth Observation Data
2020
Modeling and understanding the Earth system is a constant and challenging scientific endeavour. When a clear mechanistic model is unavailable, complex or uncertain, learning from data can be an alternative. While machine learning has provided excellent methods for detection and retrieval, understanding the governing equations of the system from observational data seems an elusive problem. In this paper we introduce sparse regression to uncover a set of governing equations in the form of a system of ordinary differential equations (ODEs). The presented method is used to explicitly describe variable relations by identifying the most expressive and simplest ODEs explaining data to model releva…
The colored longest common prefix array computed via sequential scans
2018
Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…
Alignment-free sequence comparison using absent words
2018
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as $q$-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {\em absent word} of some sequence if it does not oc…
Measuring the clustering effect of BWT via RLE
2017
Abstract The Burrows–Wheeler Transform (BWT) is a reversible transformation on which are based several text compressors and many other tools used in Bioinformatics and Computational Biology. The BWT is not actually a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text that often create runs of equal letters (clusters) longer than the ones in the original text, usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest number of clusters (cf. [5] , [16] , [21] , [23] ). In this paper we are concerned about t…
Linear-time sequence comparison using minimal absent words & applications
2016
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in…
Centrality in Complex Networks with Overlapping Community Structure
2019
AbstractIdentifying influential spreaders in networks is an essential issue in order to prevent epidemic spreading, or to accelerate information diffusion. Several centrality measures take advantage of various network topological properties to quantify the notion of influence. However, the vast majority of works ignore its community structure while it is one of the main features of many real-world networks. In a recent study, we show that the centrality of a node in a network with non-overlapping communities depends on two features: Its local influence on the nodes belonging to its community, and its global influence on the nodes belonging to the other communities. Using global and local co…
NESSie.jl – Efficient and intuitive finite element and boundary element methods for nonlocal protein electrostatics in the Julia language
2018
Abstract The development of scientific software can be generally characterized by an initial phase of rapid prototyping and the subsequent transition to computationally efficient production code. Unfortunately, most programming languages are not well-suited for both tasks at the same time, commonly resulting in a considerable extension of the development time. The cross-platform and open-source Julia language aims at closing the gap between prototype and production code by providing a usability comparable to Python or MATLAB alongside high-performance capabilities known from C and C++ in a single programming language. In this paper, we present efficient protein electrostatics computations a…
parSRA: A framework for the parallel execution of short read aligners on compute clusters
2018
The growth of next generation sequencing datasets poses as a challenge to the alignment of reads to reference genomes in terms of both accuracy and speed. In this work we present parSRA, a parallel framework to accelerate the execution of existing short read aligners on distributed-memory systems. parSRA can be used to parallelize a variety of short read alignment tools installed in the system without any modification to their source code. We show that our framework provides good scalability on a compute cluster for accelerating the popular BWA-MEM and Bowtie2 aligners. On average, it is able to accelerate sequence alignments on 16 64-core nodes (in total, 1024 cores) with speedup of 10.48 …
CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm
2015
Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…
Simulation-based estimation of branching models for LTR retrotransposons
2017
Abstract Motivation LTR retrotransposons are mobile elements that are able, like retroviruses, to copy and move inside eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model allows us to take into account both the positions and the degradation level of LTR retrotransposons copies. In our model, the duplication rate is also allowed to vary with the degradation level. Results Various functions have been implemented in order to simulate their spread and visualization tools are proposed. Based on these simulation tools, we have developed a first method to evaluate the parameters of this propagation …