On prefix normal words and prefix normal forms

6533b837fe1ef96bd12a2974

RESEARCH PRODUCT

On prefix normal words and prefix normal forms

Joe Sawada Péter Burcsi Zsuzsanna Lipták Gabriele Fici Frank Ruskey

subject

FOS: Computer and information sciences Prefix code Prefix normal word Pre-necklace Discrete Mathematics (cs.DM)General Computer Science Formal Languages and Automata Theory (cs.FL)Binary number Computer Science - Formal Languages and Automata Theory Context (language use)Binary language Lyndon words 0102 computer and information sciences 02 engineering and technology Prefix grammar prefix normal forms Kraft's inequality Characterization (mathematics)Lyndon word 01 natural sciences Prefix normal form enumeration Theoretical Computer Science FOS: Mathematics 0202 electrical engineering electronic engineering information engineering Mathematics - Combinatorics Mathematics Discrete mathematics prefix normal words prefix normal forms binary languages binary jumbled pattern matching pre-necklaces Lyndon words enumeration binary jumbled pattern matching Settore INF/01 - Informatica Computer Science (all)pre-necklaces Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)prefix normal words Prefix 010201 computation theory & mathematics 020201 artificial intelligence & image processing Combinatorics (math.CO)binary languages Computer Science::Formal Languages and Automata Theory Word (group theory)Computer Science - Discrete Mathematics

description

A $1$-prefix normal word is a binary word with the property that no factor has more $1$s than the prefix of the same length; a $0$-prefix normal word is defined analogously. These words arise in the context of indexed binary jumbled pattern matching, where the aim is to decide whether a word has a factor with a given number of $1$s and $0$s (a given Parikh vector). Each binary word has an associated set of Parikh vectors of the factors of the word. Using prefix normal words, we provide a characterization of the equivalence class of binary words having the same set of Parikh vectors of their factors. We prove that the language of prefix normal words is not context-free and is strictly contained in the language of pre-necklaces, which are prefixes of powers of Lyndon words. We give enumeration results on $\textit{pnw}(n)$, the number of prefix normal words of length $n$, showing that, for sufficiently large $n$, \[ 2^{n-4 \sqrt{n \lg n}} \le \textit{pnw}(n) \le 2^{n - \lg n + 1}. \] For fixed density (number of $1$s), we show that the ordinary generating function of the number of prefix normal words of length $n$ and density $d$ is a rational function. Finally, we give experimental results on $\textit{pnw}(n)$, discuss further properties, and state open problems.

year	journal	country	edition	language
2016-11-28	Theoretical Computer Science

https://doi.org/10.1016/j.tcs.2016.10.015