Author: Alessandra Gabriele

0000000000012786

AUTHOR

Alessandra Gabriele

showing 13 related works from this author

Languages with mismatches

2007

AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…

Combinatorics on wordsApproximate string matchingGeneral Computer ScienceRepetition (rhetorical device)String (computer science)Search engine indexingComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Approximate string matchingData structureTheoretical Computer ScienceCombinatoricsSet (abstract data type)Formal languagesCombinatorics on words Formal languages Approximate string matching IndexingIndexingWord (group theory)MathematicsInteger (computer science)Computer Science(all)Theoretical Computer Science

researchProduct

"Indexing structures for approximate string matching

2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

CombinatoricsCombinatorics on wordsPattern recognition (psychology)Search engine indexingAutomata theoryHamming distanceString searching algorithmApproximate string matchingTime complexityMathematics

researchProduct

On lazy representations and Sturmian graphs

2011

In this paper we establish a strong relationship between the set of lazy representations and the set of paths in a Sturmian graph associated with a real number α. We prove that for any non-negative integer i the unique path weighted i in the Sturmian graph associated with α represents the lazy representation of i in the Ostrowski numeration system associated with α. Moreover, we provide several properties of the representations of the natural integers in this numeration system.

Discrete mathematicsCombinatoricsOstrowski numerationIntegernumeration systems Sturmian graphs continued fractionsSettore INF/01 - InformaticaGraphMathematicsReal number

researchProduct

From Nerode's congruence to Suffix Automata with mismatches

2009

AbstractIn this paper we focus on the minimal deterministic finite automaton Sk that recognizes the set of suffixes of a word w up to k errors. As first result we give a characterization of the Nerode’s right-invariant congruence that is associated with Sk. This result generalizes the classical characterization described in [A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M. Chen, J. Seiferas, The smallest automaton recognizing the subwords of a text, Theoretical Computer Science, 40, 1985, 31–55]. As second result we present an algorithm that makes use of Sk to accept in an efficient way the language of all suffixes of w up to k errors in every window of size r of a text, where r is the…

General Computer ScienceOpen problem[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technologyString searching algorithm01 natural sciencesTheoretical Computer ScienceCombinatoricsDeterministic automatonSuffix automata0202 electrical engineering electronic engineering information engineeringCombinatorics on words Indexing Suffix Automata Languages with mismatches Approximate string matchingMathematicsDiscrete mathematicsCombinatorics on wordsApproximate string matchingSettore INF/01 - InformaticaLanguages with mismatchesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)PrefixCombinatorics on wordsDeterministic finite automaton010201 computation theory & mathematicsSuffix automatonIndexing020201 artificial intelligence & image processingSuffixComputer Science::Formal Languages and Automata TheoryComputer Science(all)

researchProduct

Sturmian graphs and integer representations over numeration systems

2012

AbstractIn this paper we consider a numeration system, originally due to Ostrowski, based on the continued fraction expansion of a real number α. We prove that this system has deep connections with the Sturmian graph associated with α. We provide several properties of the representations of the natural integers in this system. In particular, we prove that the set of lazy representations of the natural integers in this numeration system is regular if and only if the continued fraction expansion of α is eventually periodic. The main result of the paper is that for any number i the unique path weighted i in the Sturmian graph associated with α represents the lazy representation of i in the Ost…

Discrete mathematicsContinued fractionsApplied MathematicsNumeration systemsSturmian graphsGraphCombinatoricsOstrowski numerationIntegerIf and only ifnumeration systems Sturmian graphs continued fractions.Numeration systems; SUBWORD GRAPHS; WORDSDiscrete Mathematics and CombinatoricsSUBWORD GRAPHSContinued fractionWORDSMathematicsReal number

researchProduct

On the suffix automaton with mismatches

2007

International audience; In this paper we focus on the construction of the minimal deterministic finite automaton S_k that recognizes the set of suffixes of a word w up to k errors. We present an algorithm that makes use of S_k in order to accept in an efficient way the language of all suffixes of w up to k errors in every window of size r, where r is the value of the repetition index of w. Moreover, we give some experimental results on some well-known words, like prefixes of Fibonacci and Thue-Morse words, and we make a conjecture on the size of the suffix automaton with mismatches.

approximate string matchingFibonacci numberlanguages with mismatches[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Generalized suffix treeBüchi automatonComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)0102 computer and information sciences02 engineering and technology01 natural sciencesCombinatoricsPrefixCombinatorics on wordsDeterministic finite automaton010201 computation theory & mathematics0202 electrical engineering electronic engineering information engineeringSuffix automaton020201 artificial intelligence & image processingsuffix automatacombinatorics on wordsComputer Science::Data Structures and Algorithmscombinatorics on words suffix automata languages with mismatches approximate string matchingWord (computer architecture)Computer Science::Formal Languages and Automata TheoryMathematics

researchProduct

Novel Combinatorial and Information-Theoretic Alignment-Free Distances for Biological Data Mining

2010

Among the plethora of alignment-free methods for comparing biological sequences, there are some that we have perceived as representative of the novel techniques that have been devised in the past few years and as being of a fundamental nature and of broad interest and applicability, ranging from combinatorics to information theory. In this chapter, we review these alignment free methods, by presenting both their mathematical definitions and the experiments in which they are involved in.

Settore INF/01 - InformaticaComputer scienceAlignment-free distances for biological sequenceBiological data miningData miningcomputer.software_genrecomputer

researchProduct

Languages with mismatches and an application to approximate indexing

2005

In this paper we describe a factorial language, denoted by L(S, k,r), that contains all words that occur in a string 5 up to k mismatches every r symbols. Then we give some combinatorial properties of a parameter, called repetition index and denoted by R(S,k,r), defined as the smallest integer h ? 1 such that all strings of this length occur at most in a unique position of the text S up to k mismatches every r symbols. We prove that R(S, k, r) is a non-increasing function of r and a non-decreasing function of k and that the equation r = R(S, k, r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that rep…

FactorialCombinatorics on wordsString (computer science)Function (mathematics)formal languagesmatching indexingCombinatoricsCombinatorics on wordsIntegerapproximate stringPosition (vector)TrieAlgorithmWord (group theory)Mathematics

researchProduct

On the longest common factor problem

2008

The Longest Common Factor (LCF) of a set of strings is a well studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences analysis. This problem has been solved by Hui (2000) who uses a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with use of suffix trees. A data structure for the LCA problem, although linear in space and construction time, introduces a multiplicative constant in both space and time that reduces the range of applications in many biological applications. In this article we present a new method for solving the LCF problem using the suffix tree structure with an auxiliary array that take…

Discrete mathematicsSettore INF/01 - InformaticaSuffix tree[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Generalized suffix treeDAWGsuffix tree[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Data structureLongest common substring problemlaw.inventionCombinatoricsSet (abstract data type)Range (mathematics)lawLongest Common Factor ProblemSuffixLowest common ancestorMathematics

researchProduct

Genome-wide characterization of chromatin binding and nucleosome spacing activity of the nucleosome remodelling ATPase ISWI

2011

The evolutionarily conserved ATP-dependent nucleosome remodelling factor ISWI can space nucleosomes affecting a variety of nuclear processes. In Drosophila, loss of ISWI leads to global transcriptional defects and to dramatic alterations in higher-order chromatin structure, especially on the male X chromosome. In order to understand if chromatin condensation and gene expression defects, observed in ISWI mutants, are directly correlated with ISWI nucleosome spacing activity, we conducted a genome-wide survey of ISWI binding and nucleosome positioning in wild-type and ISWI mutant chromatin. Our analysis revealed that ISWI binds both genic and intergenic regions. Remarkably, we found that ISWI…

GeneticsRegulation of gene expressionGeneral Immunology and MicrobiologyGeneral NeuroscienceChromatin bindingBiologyDNA-binding proteinGeneral Biochemistry Genetics and Molecular BiologyChromatinProphaseNucleosomeMolecular BiologyTranscription factorChromatin immunoprecipitationThe EMBO Journal

researchProduct

Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures

2010

In the quest for a mathematical measure able to capture and shed light on the dual notions of information and complexity in biosequences, Hazen et al. have introduced the notion of Functional Information (FI for short). It is also the result of earlier considerations and findings by Szostak and Carothers et al. Based on the experiments by Charoters et al., regarding FI in RNA binding activities, we decided to study the relation existing between FI and classic measures of complexity applied on protein-DNA interactions on a genome-wide scale. Using classic complexity measures, i.e, Shannon entropy and Kolmogorov Complexity as both estimated by data compression, we found that FI applied to pro…

sequence complexityFunctional Activity Sequence Complexity Combinatorics onWords Protein-DNA interaction.combinatorics on wordsFunctional activityprotein-DNA interaction.

researchProduct

On numeration systems and Sturmian graphs

2005

researchProduct

Approximate string matching: indexing and the k-mismatch problem

2004

researchProduct