Search results for "Search engine indexing"
showing 10 items of 56 documents
The indexing of persons in news sequences using audio-visual data
2004
We describe a video indexing system that automatically searches for a specific person in a news sequence. The proposed approach combines audio and video confidence values extracted from speaker and face recognition analysis. The system also incorporates a shot selection module that seeks for anchors, where the person on the scene is likely speaking. The system has been extensively tested on several news sequences with very good recognition rates.
Reverse-Safe Text Indexing
2021
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z - reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D . The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z , we propose an algorithm that constructs a z -reverse-safe data structure ( z -RSDS) that has size O(n) and answers decision and counting pattern matc…
Suitability of a content-based retrieval method in astronomical image databases
1996
Abstract Indexing and retrieval methods based on the image content are required to effectively use information from large repositories of digital images. Usually, the way to search for data and images in astronomical archives is via textual queries expressed in terms of constraints on observation parameters. In this paper we present a method for automatic extraction of images by using shape descriptions based on local symmetry. The proposed indexing methodology has been developed and tested inside JACOB, a prototypal system for content-based video database querying.
Novel Results on the Number of Runs of the Burrows-Wheeler-Transform
2021
The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…
Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
2012
Motivation The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…
Sorted deduplication: How to process thousands of backup streams
2016
The requirements of deduplication systems have changed in the last years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time while today they are able to process hundreds to thousands of them. Traditional approaches rely on stream-locality, which supports parallelism, but which easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…
A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection
2014
Published version of a chapter in the book: Transactions on Computational Collective Intelligence XIV. Also available from the publisher at: http://dx.doi.org/10.1007/978-3-662-44509-9_1 In this paper we address the above problem by posing frequent item-set mining as a collection of interrelated two-armed bandit problems. We seek to find itemsets that frequently appear as subsets in a stream of itemsets, with the frequency being constrained to support granularity requirements. Starting from a randomly or manually selected examplar itemset, a collective of Tsetlin automata based two-armed bandit players - one automaton for each item in the examplar - learns which items should be included in …
Hardware implementation of content based video indexing algorithms
2005
This paper focus on hardware implementation of content based video indexing techniques by using the FPGA technology. We aim to propose hardware modules that can satisfy requirements of constrained applications, such as real time applications and complex applications that can combine a large number of techniques in the same indexing system. We represent tow examples of micro-architectures related to the dominant colors descriptor and the compact color descriptor.
Quality of Service Management on Multimedia Data Transformation into Serial Stories Using Movement Oriented Method
2011
Multimedia data transformation into serial stories or story board will help to reduce the consumption of storage media, indexing, sorting and searching system. Movement Oriented Method that is being developed changes the form of multimedia data into serial stories. Movement Oriented Method depends on the knowledge each actor who uses it. Different knowledge of each actor in the transformation process raises complex issues, such as the sequence, and the resulted story object that could become the standard. And the most fatal could be, the resulted stories does not same with the original multimedia data. To solve it, the Standard Level Knowledge (SLK) in maintaining the quality of the story c…
Comparing DNA sequence collections by direct comparison of compressed text indexes
2012
Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have h…