Search results for "data structures"
showing 10 items of 258 documents
Stability-Based Model Selection for High Throughput Genomic Data: An Algorithmic Paradigm
2012
Clustering is one of the most well known activities in scien- tific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is the model selection problem, i.e., the identifi- cation of the correct number of clusters in a dataset. In the last decade, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained promi- nence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of predic- tion, but the slowest in terms of time. Unfortunately…
The expressive power of the shuffle product
2010
International audience; There is an increasing interest in the shuffle product on formal languages, mainly because it is a standard tool for modeling process algebras. It still remains a mysterious operation on regular languages.Antonio Restivo proposed as a challenge to characterize the smallest class of languages containing the singletons and closed under Boolean operations, product and shuffle. This problem is still widely open, but we present some partial results on it. We also study some other smaller classes, including the smallest class containing the languages composed of a single word of length 2 which is closed under Boolean operations and shuffle by a letter (resp. shuffle by a l…
Linear-size suffix tries
2016
Suffix trees are highly regarded data structures for text indexing and string algorithms [MCreight 76, Weiner 73]. For any given string w of length n = | w | , a suffix tree for w takes O ( n ) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). On the contrary, the arcs of the suffix trie are labeled by single symbols but there can be Θ ( n 2 ) nodes and lin…
Table Compression
2016
Data Compression Techniques for massive tables are described. Related methodological results are also presented.
Fully Automatic Trunk Packing with Free Placements
2010
We present a new algorithm to compute the volume of a trunk according to the SAE J1100 standard. Our new algorithm uses state-of-the-art methods from computational geometry and from combinatorial optimization. It finds better solutions than previous approaches for small trunks.
Continuous reformulations and heuristics for the Euclidean travelling salesperson problem
2008
We consider continuous reformulations of the Euclidean travelling salesperson problem (TSP), based on certain clustering problem formulations. These reformulations allow us to apply a generalisation with perturbations of the Weiszfeld algorithm in an attempt to find local approximate solutions to the Euclidean TSP.
Indexing a sequence for mapping reads with a single mismatch
2014
Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…
FRIPON: a worldwide network to track incoming meteoroids
2020
Context. Until recently, camera networks designed for monitoring fireballs worldwide were not fully automated, implying that in case of a meteorite fall, the recovery campaign was rarely immediate. This was an important limiting factor as the most fragile - hence precious - meteorites must be recovered rapidly to avoid their alteration. Aims. The Fireball Recovery and InterPlanetary Observation Network (FRIPON) scientific project was designed to overcome this limitation. This network comprises a fully automated camera and radio network deployed over a significant fraction of western Europe and a small fraction of Canada. As of today, it consists of 150 cameras and 25 European radio receiver…
Reverse-Safe Text Indexing
2021
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z - reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D . The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z , we propose an algorithm that constructs a z -reverse-safe data structure ( z -RSDS) that has size O(n) and answers decision and counting pattern matc…
Longest Motifs with a Functionally Equivalent Central Block
2004
International audience; This paper presents a generalization of the notion of longest repeats with a block of k don't care symbols introduced by [Crochemore et al., LATIN 2004] (for k fixed) to longest motifs composed of three parts: a first and last that parameterize match (that is, match via some symbol renaming, initially unknown), and a functionally equivalent central block. Such three-part motifs are called longest block motifs. Different types of functional equivalence, and thus of matching criteria for the central block are considered, which include as a subcase the one treated in [Crochemore et al., LATIN 2004] and extend to the case of regular expressions with no Kleene closure or …