Search results for "Search engine indexing"

Showing 10 of 56 documents

The indexing of persons in news sequences using audio-visual data

2004

We describe a video indexing system that automatically searches for a specific person in a news sequence. The proposed approach combines audio and video confidence values extracted from speaker and face recognition analysis. The system also incorporates a shot selection module that searches for anchor shots, in which the person on screen is likely to be speaking. The system has been tested extensively on several news sequences and achieves very good recognition rates.

Keywords: Contextual image classification; Computer science; Speech recognition; Search engine indexing; Selection (linguistics); Speaker recognition; Audio signal processing; Facial recognition system; Electronic mail
Published in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
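The abstract does not say how the audio and video confidence values are combined. One common option is weighted late fusion, sketched below under stated assumptions: per-shot confidences in [0, 1], a hypothetical weight `w_audio`, a hypothetical `threshold`, and a hypothetical `is_anchor` flag standing in for the output of the shot selection module. This illustrates the general idea and is not the authors' fusion rule.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    shot_id: int
    face_conf: float     # face recognition confidence for the target person, in [0, 1]
    speaker_conf: float  # speaker recognition confidence for the target person, in [0, 1]
    is_anchor: bool      # hypothetical shot-selection output: person on screen likely speaking


def fused_score(shot: Shot, w_audio: float = 0.5) -> float:
    """Weighted late fusion of audio and video confidences (an assumed scheme).

    Outside anchor shots the voice may belong to someone else, so only the
    face score is used there.
    """
    if not shot.is_anchor:
        return shot.face_conf
    return w_audio * shot.speaker_conf + (1.0 - w_audio) * shot.face_conf


def rank_shots(shots: list[Shot], threshold: float = 0.6, w_audio: float = 0.5) -> list[int]:
    """Return the ids of shots whose fused score reaches the threshold, best first."""
    scored = sorted(((fused_score(s, w_audio), s.shot_id) for s in shots), reverse=True)
    return [sid for score, sid in scored if score >= threshold]


shots = [
    Shot(1, face_conf=0.9, speaker_conf=0.8, is_anchor=True),   # fused: 0.85
    Shot(2, face_conf=0.7, speaker_conf=0.1, is_anchor=False),  # face only: 0.7
    Shot(3, face_conf=0.4, speaker_conf=0.5, is_anchor=True),   # fused: 0.45
]
print(rank_shots(shots))  # [1, 2]
```

Gating the speaker score on anchor shots mirrors the abstract's point that the voice is most informative when the person on screen is likely speaking.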

Reverse-Safe Text Indexing

2021

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure $D$ is called $z$-reverse-safe when there exist at least $z$ datasets with the same set of answers as the ones stored by $D$. The main challenge is to ensure that $D$ stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length $n$ and an integer $z$, we propose an algorithm that constructs a $z$-reverse-safe data structure ($z$-RSDS) that has size $O(n)$ and answers decision and counting pattern matc…

Keywords: Data structures; Computer science; Suffix tree; Text indexing; Theoretical Computer Science; Set (abstract data type); Pattern matching; Data privacy; Search engine indexing; Matrix multiplication; Algorithm; Adversary model; Integer (computer science)
Published in: ACM Journal of Experimental Algorithmics
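The $z$-reverse-safe definition can be checked by brute force on toy inputs. The sketch below assumes that the stored answers are occurrence counts for every pattern up to a hypothetical length `k` over a small alphabet; it enumerates every text of the same length and counts those giving identical answers. A structure storing exactly these answers is $z$-reverse-safe for any $z$ up to that count. The enumeration is exponential in $n$ and is unrelated to the paper's $O(n)$-size construction.

```python
from itertools import product


def count_answers(text: str, patterns: list[str]) -> tuple[int, ...]:
    """Answer a counting query (overlapping occurrences) for each pattern."""
    def occ(p: str) -> int:
        return sum(1 for i in range(len(text) - len(p) + 1) if text.startswith(p, i))
    return tuple(occ(p) for p in patterns)


def reverse_safety(text: str, alphabet: str, k: int) -> int:
    """Number of texts of length len(text) over `alphabet` that give the same
    answers as `text` to all counting queries for patterns of length 1..k.
    """
    patterns = ["".join(p) for m in range(1, k + 1) for p in product(alphabet, repeat=m)]
    target = count_answers(text, patterns)
    return sum(
        1
        for cand in product(alphabet, repeat=len(text))
        if count_answers("".join(cand), patterns) == target
    )


print(reverse_safety("aab", "ab", 1))  # 3: aab, aba, baa share the letter counts
print(reverse_safety("aab", "ab", 2))  # 1: length-2 pattern counts pin down the text
```

Answering more queries (larger `k`) makes the stored data more useful but shrinks the set of consistent texts, which is the trade-off the abstract describes.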

Suitability of a content-based retrieval method in astronomical image databases

1996

Indexing and retrieval methods based on image content are required to make effective use of the information in large repositories of digital images. Usually, data and images in astronomical archives are searched via textual queries expressed as constraints on observation parameters. In this paper we present a method for the automatic extraction of images using shape descriptions based on local symmetry. The proposed indexing methodology has been developed and tested inside JACOB, a prototype system for content-based video database querying.

Keywords: Digital image; Information retrieval; Automatic image annotation; Computer science; Content (measure theory); Search engine indexing; Astronomy and Astrophysics; Visual Word; Image retrieval; Image (mathematics); Content based retrieval
Published in: Vistas in Astronomy

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

2021

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as a measure of repetitiveness, especially to evalua…

Keywords: Burrows–Wheeler transform; Combinatorics on words; Computer science; String (computer science); Search engine indexing; Compressed data structures; Computer Science - Formal Languages and Automata Theory; String indexing; Data structure; Measure (mathematics); Repetitiveness; Transformation (function); Computer Science - Data Structures and Algorithms; Algorithm; Data compression
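To make the $r$ parameter concrete, the sketch below computes the BWT by sorting all rotations of the string with an appended sentinel `$` (quadratic space, for illustration only) and counts the equal-letter runs of the result. The sentinel convention is an assumption; some definitions of the BWT omit it.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """BWT via sorted rotations. Assumes the sentinel is absent from s and
    sorts before every other symbol (true for '$' with letters)."""
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal equal-letter runs in s."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


for word in ["banana", "abababab"]:
    b = bwt(word)
    print(word, b, "r =", count_runs(b))
# banana annb$aa r = 5
# abababab bbbb$aaaa r = 3
```

The periodic string `abababab` has 8 runs, while its BWT has only 3, which is why repetitive inputs compress well after the transform.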

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

2012

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…

Keywords: Statistics and Probability; Burrows–Wheeler transform; Computer science; Biochemistry; Data compression; Next-generation sequencing; Computer Science - Data Structures and Algorithms; Escherichia coli; Code (cryptography); Humans; Overhead (computing); Computer Simulation; Quantitative Biology - Genomics; Molecular Biology; Genome, Human; String (computer science); Search engine indexing; Sorting; Genomics; Sequence Analysis, DNA; Construct (python library); Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Data mining; Databases, Nucleic Acid; Algorithms
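A small illustration of why the BWT of a read collection compresses well: the sketch below builds the BWT of a few reads, giving each read its own end marker (printed as `$`), and run-length encodes the result. It sorts all suffixes in memory, so it shows only what is computed, not the paper's scalable construction; the marker ordering and the use of run-length encoding as the compressor are assumptions.

```python
from itertools import groupby


def collection_bwt(reads: list[str]) -> str:
    """BWT of a read collection, one end marker per read (printed as '$').

    End markers sort before every letter, and the marker of read i sorts
    before that of read k when i < k. Naive suffix sorting: toy sizes only.
    """
    entries = []
    for i, read in enumerate(reads):
        for j in range(len(read) + 1):
            # Sort key: letters tagged (1, c); this read's end marker tagged (0, i).
            key = tuple((1, c) for c in read[j:]) + ((0, i),)
            # BWT symbol: the character cyclically preceding this suffix in its read.
            prev = read[j - 1] if j > 0 else "$"
            entries.append((key, prev))
    entries.sort()  # keys are unique, so the symbols themselves are never compared
    return "".join(prev for _, prev in entries)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse each maximal run of equal symbols into a (symbol, length) pair."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


reads = ["ACGT", "ACGT", "ACGT"]
b = collection_bwt(reads)
print(b)                     # TTT$$$AAACCCGGG
print(run_length_encode(b))  # [('T', 3), ('$', 3), ('A', 3), ('C', 3), ('G', 3)]
```

The three identical reads concatenated as `ACGTACGTACGT` have 12 runs, while their collection BWT has 5, so redundancy across reads turns into long runs.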

Sorted deduplication: How to process thousands of backup streams

2016

The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time, while today's systems have to process hundreds to thousands of them. Traditional approaches rely on stream locality, which supports parallelism but easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…

Keywords: File system; Computer science; Data domain; Fingerprint (computing); Search engine indexing; Sorting; Parallel computing; Backup; Server; Data deduplication
Published in: 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST)
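The general idea of matching many streams against one fingerprint index without random lookups can be sketched as a sort-merge join: collect chunk fingerprints from all streams, sort them, and scan the sorted index once. The chunking (already done), the SHA-1 fingerprint, and the in-memory lists are assumptions; this is not the MSST paper's design, whose details are cut off in the abstract.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    fingerprint: bytes
    stream_id: int
    offset: int  # chunk position within its stream


def chunk_fingerprint(chunk: bytes) -> bytes:
    """Content fingerprint of a chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).digest()


def sorted_dedup(
    streams: dict[int, list[bytes]], index: list[bytes]
) -> tuple[list[ChunkRef], list[bytes]]:
    """Exact deduplication of a batch of streams against a sorted fingerprint index.

    Returns (duplicate chunk references, updated sorted index).
    """
    # Sort all chunk references by fingerprint so the index is read sequentially.
    refs = sorted(
        (ChunkRef(chunk_fingerprint(c), sid, off)
         for sid, chunks in streams.items()
         for off, c in enumerate(chunks)),
        key=lambda r: r.fingerprint,
    )
    duplicates: list[ChunkRef] = []
    new_index: list[bytes] = []
    i = 0
    prev = None
    for r in refs:
        # Copy index entries that sort before this fingerprint.
        while i < len(index) and index[i] < r.fingerprint:
            new_index.append(index[i])
            i += 1
        in_index = i < len(index) and index[i] == r.fingerprint
        if in_index or r.fingerprint == prev:
            duplicates.append(r)  # already stored, or seen earlier in this batch
        else:
            new_index.append(r.fingerprint)  # first occurrence: store it
        prev = r.fingerprint
    new_index.extend(index[i:])  # remaining index entries
    return duplicates, new_index


streams = {0: [b"alpha", b"beta"], 1: [b"beta", b"gamma"]}
dups, new_index = sorted_dedup(streams, [chunk_fingerprint(b"alpha")])
print(sorted((d.stream_id, d.offset) for d in dups))  # [(0, 0), (1, 0)]
```

Sorting trades per-chunk index lookups for one sequential pass, which is the kind of access pattern that avoids the non-contiguous disk accesses the abstract mentions.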

A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection

2014

Published version of a chapter in the book Transactions on Computational Collective Intelligence XIV, also available from the publisher at http://dx.doi.org/10.1007/978-3-662-44509-9_1. In this paper, we address the problem of frequent itemset mining by posing it as a collection of interrelated two-armed bandit problems. We seek to find itemsets that frequently appear as subsets in a stream of itemsets, with the frequency being constrained to support granularity requirements. Starting from a randomly or manually selected examplar itemset, a collective of Tsetlin-automata-based two-armed bandit players (one automaton for each item in the examplar) learns which items should be included in …

Keywords: Finite-state machine; Computational complexity theory; Data stream mining; Computer science; Nearest neighbor search; Search engine indexing; Intrusion detection system; Cardinality; Anomaly detection; Data mining
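The building block named in the abstract, a Tsetlin automaton acting as a two-armed bandit player, can be sketched as follows. It uses the standard two-action automaton with 2n states: rewards push it deeper into its current action, penalties push it toward the other action. The include/exclude labels, the state count, and the stationary Bernoulli reward probabilities are hypothetical; the paper's collective feedback scheme is not reproduced.

```python
import random


class TsetlinAutomaton:
    """Two-action Tsetlin automaton with 2 * n memory states.

    States 1..n select action 0 (exclude the item); states n+1..2n select
    action 1 (include the item). The labels are illustrative.
    """

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n = n
        self.state = rng.choice([n, n + 1])  # start next to the decision boundary

    def action(self) -> int:
        return 1 if self.state > self.n else 0

    def reward(self) -> None:
        # Move deeper into the half of the current action, saturating at the ends.
        if self.action() == 1:
            self.state = min(self.state + 1, 2 * self.n)
        else:
            self.state = max(self.state - 1, 1)

    def penalize(self) -> None:
        # Move toward the other action; from a boundary state this switches action.
        self.state += -1 if self.action() == 1 else 1


def include_rate(p_reward: tuple[float, float], steps: int = 1000, seed: int = 0) -> float:
    """Play a stationary two-armed Bernoulli bandit; return the fraction of steps choosing action 1."""
    rng = random.Random(seed)
    ta = TsetlinAutomaton(n=5, rng=rng)
    chose_include = 0
    for _ in range(steps):
        a = ta.action()
        chose_include += a
        if rng.random() < p_reward[a]:
            ta.reward()
        else:
            ta.penalize()
    return chose_include / steps


print(include_rate((0.2, 0.9)))  # expected to be close to 1.0: the better arm dominates
```

In the itemset setting, one such automaton per examplar item would decide whether that item belongs to the itemset, with rewards and penalties derived from observed support.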

Hardware implementation of content based video indexing algorithms

2005

This paper focuses on the hardware implementation of content-based video indexing techniques using FPGA technology. We aim to propose hardware modules that satisfy the requirements of constrained applications, such as real-time applications and complex applications that combine a large number of techniques in the same indexing system. We present two examples of micro-architectures, for the dominant color descriptor and for the compact color descriptor.

Keywords: Focus (computing); Hardware modules; Computer science; Content (measure theory); Search engine indexing; Field-programmable gate array; Computer hardware; Microarchitecture; Content based retrieval
Published in: 2005 12th IEEE International Conference on Electronics, Circuits and Systems
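As a software reference model, a simplified dominant color computation can be written with only shifts and counters, the kind of operations that map naturally onto FPGA logic. The sketch below quantizes each 8-bit RGB channel to its top `bits` bits and reports the `k` most frequent quantized colors with their pixel fractions. It is a deliberate simplification: the MPEG-7 dominant color descriptor uses clustering, and the paper's micro-architecture is not reproduced; `bits` and `k` are hypothetical parameters.

```python
from collections import Counter


def dominant_colors(
    pixels: list[tuple[int, int, int]], k: int = 3, bits: int = 2
) -> list[tuple[tuple[int, int, int], float]]:
    """Return up to k (quantized color, fraction of pixels) pairs, most frequent first.

    Each 8-bit channel keeps its `bits` most significant bits via a right shift,
    which costs no logic in hardware beyond wiring.
    """
    shift = 8 - bits
    counts = Counter((r >> shift, g >> shift, b >> shift) for r, g, b in pixels)
    total = len(pixels)
    return [(color, n / total) for color, n in counts.most_common(k)]


pixels = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 3 + [(128, 128, 128)]
print(dominant_colors(pixels, k=2))  # [((3, 0, 0), 0.6), ((0, 0, 3), 0.3)]
```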

Quality of Service Management on Multimedia Data Transformation into Serial Stories Using Movement Oriented Method

2011

Transforming multimedia data into serial stories, or storyboards, can help reduce storage consumption and the load on indexing, sorting, and searching systems. The Movement Oriented Method, currently under development, converts multimedia data into serial stories. The method depends on the knowledge of each actor who uses it. Differences in knowledge among the actors in the transformation process raise complex issues, such as the sequence of the story and which resulting story object should become the standard. In the worst case, the resulting stories may not match the original multimedia data. To solve this, the Standard Level Knowledge (SLK) in maintaining the quality of the story c…

Keywords: General Computer Science; Multimedia; Process (engineering); Computer science; Quality of service; Search engine indexing; Data transformation; Object (computer science); Transformation (function); Quality (business)
Published in: International Journal of Advanced Computer Science and Applications

Comparing DNA sequence collections by direct comparison of compressed text indexes

2012

Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However, the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e., on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have h…

Keywords: Genomics; Sequence; Computer science; Search engine indexing; Sequence alignment; Pattern recognition; Construct (python library); Data structure; Burrows-Wheeler Transform; Splice junctions; External memory; Code (cryptography); Quantitative Biology - Genomics; Artificial intelligence; Auxiliary memory; Reference genome
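The abstract relies on the fact that matches can be determined directly from a BWT-based index. A minimal in-memory sketch of FM-index backward search, which counts pattern occurrences using only the BWT, a table `C` of symbol offsets, and an occurrence table, is shown below. It uses an uncompressed occurrence table and assumes the text contains no `$`; it does not reproduce the paper's external-memory comparison of whole collections.

```python
from collections import Counter


def bwt(s: str) -> str:
    """BWT via sorted rotations of s + '$' (toy sizes only)."""
    t = s + "$"
    return "".join(rot[-1] for rot in sorted(t[i:] + t[:i] for i in range(len(t))))


class FMIndex:
    def __init__(self, text: str) -> None:
        self.L = bwt(text)
        counts = Counter(self.L)
        # C[c] = number of symbols in L that sort strictly before c.
        self.C: dict[str, int] = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        # occ[c][i] = occurrences of c in L[:i]; real indexes sample this table.
        self.occ = {c: [0] * (len(self.L) + 1) for c in counts}
        for i, ch in enumerate(self.L):
            for c in self.occ:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def count(self, pattern: str) -> int:
        """Backward search: shrink the sorted-suffix interval [lo, hi) one symbol at a time."""
        lo, hi = 0, len(self.L)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


fm = FMIndex("banana")
print(fm.count("ana"), fm.count("nab"))  # 2 0
```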
