Search results for " Indexing"

showing 10 items of 88 documents

Languages with mismatches

2007

AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…

Combinatorics on wordsApproximate string matchingGeneral Computer ScienceRepetition (rhetorical device)String (computer science)Search engine indexingComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Approximate string matchingData structureTheoretical Computer ScienceCombinatoricsSet (abstract data type)Formal languagesCombinatorics on words Formal languages Approximate string matching IndexingIndexingWord (group theory)MathematicsInteger (computer science)Computer Science(all)Theoretical Computer Science
researchProduct

"Indexing structures for approximate string matching

2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

CombinatoricsCombinatorics on wordsPattern recognition (psychology)Search engine indexingAutomata theoryHamming distanceString searching algorithmApproximate string matchingTime complexityMathematics
researchProduct

Linear-size suffix tries

2016

Suffix trees are highly regarded data structures for text indexing and string algorithms [MCreight 76, Weiner 73]. For any given string w of length n = | w | , a suffix tree for w takes O ( n ) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). On the contrary, the arcs of the suffix trie are labeled by single symbols but there can be Θ ( n 2 ) nodes and lin…

Compressed suffix arrayGeneral Computer ScienceSuffix tree[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Generalized suffix tree0102 computer and information sciences02 engineering and technologyData_CODINGANDINFORMATIONTHEORYText indexing01 natural sciencesY-fast trielaw.inventionLongest common substring problemTheoretical Computer ScienceCombinatoricsSuffix treelawFactor and suffix automata0202 electrical engineering electronic engineering information engineeringData_FILESArithmeticFactor and suffix automata; Pattern matching; Suffix tree; Text indexing; Theoretical Computer Science; Computer Science (all)Pattern matchingMathematicsSettore INF/01 - InformaticaX-fast trieComputer Science (all)LCP array010201 computation theory & mathematics020201 artificial intelligence & image processingFM-index
researchProduct

Creating a semantically-enhanced cloud services environment through ontology evolution

2014

Currently, the availability of Web resources has grown enormously to the point that whatever a user needs at a given moment can potentially be found on the Internet. These resources are not limited to data items anymore, functionality delivered through some sort of service architectural model is also offered on the Internet. In the last few years, cloud computing has emerged as one of the most popular computing models to provide services over the Internet. However, as the number of available cloud services increases, the problem of service discovery and selection arises. Experience indicates that semantic technologies can provide the basis for enhanced and more precise search processes. In …

Computer Networks and CommunicationsComputer sciencecomputer.internet_protocolService discoveryCloud computingcomputer.software_genreSocial Semantic WebOWL-SWorld Wide WebSemantic computingSemantic analyticsUpper ontologySemantic Web StackSemantic WebInformation retrievalbusiness.industrySearch engine indexingSemantic searchInformation extractionSemantic gridHardware and ArchitectureOntologySemantic technologyThe InternetWeb resourcebusinesscomputerSoftwareFuture Generation Computer Systems
researchProduct

Indexing a sequence for mapping reads with a single mismatch

2014

Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…

Computer sciencegenome sequenceGeneral Mathematics[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]General Physics and AstronomyContext (language use)algorithmscomputer.software_genrePattern matchingSequenceSearch engine indexingGeneral EngineeringWildcard characterArticlescomputer.file_formatConstruct (python library)Data structuremapping readspattern matchingComputingMethodologies_DOCUMENTANDTEXTPROCESSINGData mining[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]Focus (optics)mismatchcomputerAlgorithmindexingPhilosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
researchProduct

The indexing of persons in news sequences using audio-visual data

2004

We describe a video indexing system that automatically searches for a specific person in a news sequence. The proposed approach combines audio and video confidence values extracted from speaker and face recognition analysis. The system also incorporates a shot selection module that seeks for anchors, where the person on the scene is likely speaking. The system has been extensively tested on several news sequences with very good recognition rates.

Contextual image classificationComputer scienceSpeech recognitionSearch engine indexingComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISIONSelection (linguistics)Speaker recognitionAudio signal processingcomputer.software_genrecomputerFacial recognition systemElectronic mail2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
researchProduct

Reverse-Safe Text Indexing

2021

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z - reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D . The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z , we propose an algorithm that constructs a z -reverse-safe data structure ( z -RSDS) that has size O(n) and answers decision and counting pattern matc…

Data structuresComputer scienceSuffix treesuffix tree0102 computer and information sciences02 engineering and technologytext indexing01 natural sciencesTheoretical Computer Sciencelaw.inventionSet (abstract data type)law020204 information systems0202 electrical engineering electronic engineering information engineeringPattern matchingdata privacySettore INF/01 - InformaticaSearch engine indexingdata privacy; Data structures; pattern matching; suffix tree; text indexingData structureMatrix multiplicationpattern matching010201 computation theory & mathematicsData structureAlgorithmAdversary modelInteger (computer science)ACM Journal of Experimental Algorithmics
researchProduct

Suitability of a content-based retrieval method in astronomical image databases

1996

Abstract Indexing and retrieval methods based on the image content are required to effectively use information from large repositories of digital images. Usually, the way to search for data and images in astronomical archives is via textual queries expressed in terms of constraints on observation parameters. In this paper we present a method for automatic extraction of images by using shape descriptions based on local symmetry. The proposed indexing methodology has been developed and tested inside JACOB, a prototypal system for content-based video database querying.

Digital imageInformation retrievalAutomatic image annotationComputer scienceContent (measure theory)Search engine indexingAstronomy and AstrophysicsVisual WordImage retrievalImage (mathematics)Content based retrievalVistas in Astronomy
researchProduct

A robust blind 3-D mesh watermarking based on wavelet transform for copyright protection

2019

Nowadays, three-dimensional meshes have been extensively used in several applications such as, industrial, medical, computer-aided design (CAD) and entertainment due to the processing capability improvement of computers and the development of the network infrastructure. Unfortunately, like digital images and videos, 3-D meshes can be easily modified, duplicated and redistributed by unauthorized users. Digital watermarking came up while trying to solve this problem. In this paper, we propose a blind robust watermarking scheme for three-dimensional semiregular meshes for Copyright protection. The watermark is embedded by modifying the norm of the wavelet coefficient vectors associated with th…

FOS: Computer and information sciences0209 industrial biotechnologyComputer sciencevideo watermarking02 engineering and technologyWatermarkingimage watermarking020901 industrial engineering & automationWaveletcopy protectionvectorsRobustness (computer science)Computer Science::Multimedia0202 electrical engineering electronic engineering information engineeringwavelet coefficient vectorsControlled IndexingComputer visionPolygon meshQuantization (image processing)RobustnessDigital watermarkingComputingMilieux_MISCELLANEOUSComputer Science::Cryptography and SecurityQuantization (signal)digital watermarkingbusiness.industrycopyrightedge normal normsWavelet transformunauthorized usersWatermarkThree-dimensional meshesMultimedia (cs.MM)mesh generationwavelet transformssynchronizing primitives3D semiregular meshesSolid modelingrobust blind 3D mesh watermarking020201 artificial intelligence & image processingArtificial intelligenceLaplacian smoothingbusinessCopyright protection[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processingComputer Science - Multimediaimage resolutionDigital images
researchProduct

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

2021

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…

FOS: Computer and information sciencesBurrows–Wheeler transformSettore INF/01 - InformaticaCombinatorics on wordsFormal Languages and Automata Theory (cs.FL)Computer scienceString (computer science)Search engine indexingCompressed data structuresComputer Science - Formal Languages and Automata TheoryString indexingData structureMeasure (mathematics)Burrows-Wheeler-TransformRepetitivenessCombinatorics on wordsBurrows-Wheeler-Transform Compressed data structures String indexing Repetitiveness Combinatorics on wordsTransformation (function)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)AlgorithmData compression
researchProduct