Search results for "indexing"
Showing 10 of 88 documents
Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
2012
Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often produced by DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human-genome-scale data to be computed on modest hardware, enabling us to investigate the BWT as a tool for compressing such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…
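To illustrate why the BWT helps compression, here is a minimal, naive sketch (sorting all rotations, quadratic or worse) for a single string. It is not the large-scale collection algorithm the abstract refers to; the function names and the `$` sentinel (assumed to be absent from the text and to sort before its symbols) are illustrative assumptions.

```python
from itertools import groupby
from typing import List, Tuple

def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort every rotation of text+sentinel and keep the last column."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Rebuild the sorted rotation table column by column (naive, for illustration)."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    # The original text is the unique rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]

def run_lengths(s: str) -> List[Tuple[str, int]]:
    """Run-length encode a string: the BWT tends to group equal symbols into runs."""
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]
```

For example, `bwt("banana")` gives `"annb$aa"`, whose runs (`nn`, `aa`) are what a run-length or entropy coder exploits; repetitive read collections would be expected to produce much longer runs.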
Languages with mismatches and an application to approximate indexing
2005
In this paper we describe a factorial language, denoted by L(S, k, r), that contains all words that occur in a string S up to k mismatches every r symbols. We then give some combinatorial properties of a parameter, called the repetition index and denoted by R(S, k, r), defined as the smallest integer h ≥ 1 such that all strings of this length occur in at most one position of the text S up to k mismatches every r symbols. We prove that R(S, k, r) is a non-increasing function of r and a non-decreasing function of k, and that the equation r = R(S, k, r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that rep…
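The definition of the repetition index can be made concrete with a brute-force sketch. It assumes that a word w "occurs at position i up to k mismatches every r symbols" when every window of r consecutive symbols of w has at most k mismatches against S (or the whole word does, if it is shorter than r). The enumeration is exponential in h, so this is only suitable for toy strings and is not the construction used in the paper; the helper names are hypothetical.

```python
from itertools import product

def kr_occurs(w: str, s: str, i: int, k: int, r: int) -> bool:
    """Assumed definition: w occurs in s at position i with at most k mismatches
    in every window of r consecutive symbols (the whole word if len(w) < r)."""
    if i + len(w) > len(s):
        return False
    mismatches = [a != b for a, b in zip(w, s[i:i + len(w)])]
    if len(w) < r:
        return sum(mismatches) <= k
    return all(sum(mismatches[t:t + r]) <= k for t in range(len(w) - r + 1))

def repetition_index(s: str, k: int, r: int) -> int:
    """Brute-force R(S, k, r): smallest h >= 1 such that every word of length h
    occurs (in the sense above) in at most one position of s. Toy inputs only."""
    if not s:
        raise ValueError("s must be non-empty")
    # Enumerating only symbols of s suffices: replacing a symbol absent from s
    # by any symbol of s never removes an occurrence.
    alphabet = sorted(set(s))
    for h in range(1, len(s) + 1):
        positions = range(len(s) - h + 1)
        if all(sum(kr_occurs("".join(w), s, i, k, r) for i in positions) <= 1
               for w in product(alphabet, repeat=h)):
            return h
    return len(s)  # not reached: at h = len(s) only one position exists
```

On `"abab"` with r = 2, allowing one mismatch per window (k = 1) raises the index compared with exact matching (k = 0), in line with R being non-decreasing in k.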
Fauna Europaea: Hymenoptera - Apocrita (excl. Ichneumonoidea)
2015
Fauna Europaea provides a public web-service with an index of scientific names (including important synonyms) of all living European land and freshwater animals, their geographical distribution at country level (up to the Urals, excluding the Caucasus region), and some additional information. The Fauna Europaea project covers about 230,000 taxonomic names, including 130,000 accepted species and 14,000 accepted subspecies. This represents a huge effort by more than 400 contributing specialists throughout Europe and is a unique (standard) reference suitable for many users in science, government, industry, nature conservation and education. Hymenoptera is one of the four largest orders of inse…
PESI - a taxonomic backbone for Europe
2015
Reliable taxonomy underpins communication in all of biology, not least nature conservation and the sustainable use of ecosystem resources. The flexibility of taxonomic interpretations, however, presents a serious challenge for end-users of taxonomic concepts. Users need standardised and continuously harmonised taxonomic reference systems, as well as high-quality and complete taxonomic data sets, but these are generally lacking for non-specialists. The solution lies in dynamic, expertly curated web-based taxonomic tools. The Pan-European Species-directories Infrastructure (PESI) worked to solve this key issue by providing a taxonomic e-infrastructure for Europe. It strengthened the relevant social (…
Fauna Europaea: Helminths (Animal Parasitic)
2014
The Laotian Rock Rat Laonastes aenigmamus Jenkins, Kilpatrick, Robinson & Timmins, 2005 was originally discovered in the Lao People's Democratic Republic in 2005. This species has been recognized as the sole surviving member of the otherwise extinct rodent family Diatomyidae. Laonastes aenigmamus was initially reported only in the limestone forests of Khammouane Province, central Laos. A second population was discovered in 2011 in Phong Nha Ke Bang National Park (PNKB NP), Quang Binh Province, central Vietnam. The confirmed distribution range of L. aenigmamus in Vietnam is very small, approximately 150 km², covering low karst mountains in five communes of Minh Hoa District, Quang Binh Provi…
Fauna Europaea: Diptera – Brachycera
2015
Fauna Europaea provides a public web-service with an index of scientific names (including important synonyms) of all extant multicellular European terrestrial and freshwater animals and their geographical distribution at the level of countries and major islands (east of the Urals and excluding the Caucasus region). The Fauna Europaea project comprises about 230,000 taxonomic names, including 130,000 accepted species and 14,000 accepted subspecies, which is much more than the originally projected number of 100,000 species. Fauna Europaea represents a huge effort by more than 400 contributing taxonomic specialists throughout Europe and is a unique (standard) reference suitable for many user c…
Sorted deduplication: How to process thousands of backup streams
2016
The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time, while today's systems must process hundreds to thousands of them. Traditional approaches rely on stream locality, which supports parallelism but easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…
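The abstract is truncated, so the paper's actual design is not shown here. As a rough illustration of the general idea suggested by the title, the sketch below pools chunk fingerprints from many streams into one batch, sorts them (discarding per-stream temporal order), and checks them against a sorted fingerprint index in a single sequential pass. Fixed-size chunking, SHA-1 fingerprints, and the function names are all assumptions for illustration.

```python
import hashlib
import heapq
from typing import Dict, List, Tuple

def fingerprints(data: bytes, chunk_size: int) -> List[str]:
    # Fixed-size chunking with SHA-1 fingerprints (simplifying assumption).
    return [hashlib.sha1(data[o:o + chunk_size]).hexdigest()
            for o in range(0, len(data), chunk_size)]

def sorted_dedup(streams: Dict[str, bytes], index: List[str],
                 chunk_size: int = 4) -> Tuple[List[str], List[str]]:
    """Deduplicate one batch of streams against a sorted fingerprint index.
    Returns (fingerprints not yet in the index, updated sorted index)."""
    # 1. Pool the fingerprints of all streams and sort them, deliberately
    #    discarding the per-stream (temporal) order of the chunks.
    batch = sorted({fp for data in streams.values()
                    for fp in fingerprints(data, chunk_size)})
    # 2. Walk the sorted batch and the sorted index together in one sequential pass.
    new, i = [], 0
    for fp in batch:
        while i < len(index) and index[i] < fp:
            i += 1
        if i == len(index) or index[i] != fp:
            new.append(fp)
    # 3. Both lists are sorted, so the updated index is a linear merge.
    return new, list(heapq.merge(index, new))
```

The point of sorting is that index lookups become one sequential scan regardless of how many streams contributed to the batch, instead of many random accesses interleaved across streams.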
A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection
2014
Published version of a chapter in the book Transactions on Computational Collective Intelligence XIV, also available from the publisher at http://dx.doi.org/10.1007/978-3-662-44509-9_1. In this paper we address the above problem by posing frequent itemset mining as a collection of interrelated two-armed bandit problems. We seek to find itemsets that frequently appear as subsets in a stream of itemsets, with the frequency being constrained to support granularity requirements. Starting from a randomly or manually selected exemplar itemset, a collective of Tsetlin-automata-based two-armed bandit players (one automaton for each item in the exemplar) learns which items should be included in …
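Each player in the collective described above is a Tsetlin automaton solving a two-armed bandit. The sketch below shows only a single classic two-action Tsetlin automaton with 2n memory states; how the paper derives rewards and penalties from the itemset stream is in the truncated part of the abstract and is not reproduced. The class and function names, and the "include"/"exclude" interpretation of the actions, are illustrative assumptions.

```python
import random
from typing import Optional, Sequence

class TsetlinAutomaton:
    """Two-action Tsetlin automaton with 2*n states.
    States 1..n select action 0 (e.g. "exclude item"),
    states n+1..2n select action 1 (e.g. "include item")."""

    def __init__(self, n: int, state: Optional[int] = None) -> None:
        self.n = n
        self.state = n if state is None else state  # default: boundary state, action 0

    def action(self) -> int:
        return 0 if self.state <= self.n else 1

    def reward(self) -> None:
        # Reinforce the current action by moving deeper into its half.
        if self.action() == 0:
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.n, self.state + 1)

    def penalize(self) -> None:
        # Weaken the current action by moving toward (and possibly across) the boundary.
        self.state += 1 if self.action() == 0 else -1

def play(automaton: TsetlinAutomaton, reward_prob: Sequence[float],
         steps: int, rng: random.Random) -> int:
    """Two-armed bandit: action a is rewarded with probability reward_prob[a]."""
    for _ in range(steps):
        if rng.random() < reward_prob[automaton.action()]:
            automaton.reward()
        else:
            automaton.penalize()
    return automaton.action()
```

With, say, `play(TsetlinAutomaton(5), (0.2, 0.8), 1000, random.Random(1))`, the automaton should spend most of its time on the better-rewarded action; a larger `n` makes it more resistant to noisy feedback at the cost of slower switching.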
Hardware implementation of content based video indexing algorithms
2005
This paper focuses on the hardware implementation of content-based video indexing techniques using FPGA technology. We aim to propose hardware modules that can satisfy the requirements of constrained applications, such as real-time applications and complex applications that combine a large number of techniques in the same indexing system. We present two examples of micro-architectures, for the dominant colors descriptor and the compact color descriptor.
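For readers unfamiliar with what a dominant-color module computes, here is a simplified software sketch: quantize each RGB channel by bit-shifting, histogram the resulting bins, and report the most frequent bins with their pixel fractions. This is an assumed, histogram-based simplification, not the MPEG-7 dominant color extraction procedure and not the paper's micro-architecture; the function name and parameters are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]

def dominant_colors(pixels: List[Pixel], bits: int = 2,
                    top: int = 3) -> List[Tuple[Pixel, float]]:
    """Quantize each 8-bit RGB channel to `bits` bits, histogram the bins, and
    return the `top` most frequent bins with their fraction of all pixels.
    Ties keep first-seen order (Counter.most_common behaviour)."""
    shift = 8 - bits  # quantization by shifting: no division or multiplication
    bins = Counter((r >> shift, g >> shift, b >> shift) for r, g, b in pixels)
    total = len(pixels)
    return [(color, count / total) for color, count in bins.most_common(top)]
```

Using shifts and counters rather than iterative clustering keeps each step to simple integer operations, which is the kind of computation that maps cheaply onto FPGA logic.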
Quality of Service Management on Multimedia Data Transformation into Serial Stories Using Movement Oriented Method
2011
Transforming multimedia data into serial stories or storyboards will help reduce storage consumption and the load on indexing, sorting and searching systems. The Movement Oriented Method being developed converts multimedia data into serial stories. The method depends on the knowledge of each actor who uses it. Differences in knowledge between actors in the transformation process raise complex issues, such as the sequence of the story and which resulting story object should become the standard. The most serious risk is that the resulting stories may not match the original multimedia data. To solve it, the Standard Level Knowledge (SLK) in maintaining the quality of the story c…