0000000000256481

AUTHOR

Giulia Bernardini

0000-0001-6647-088x

Reverse-safe data structures for text indexing

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…

research product

Second asymptomatic carotid surgery trial (ACST-2): a randomised comparison of carotid artery stenting versus carotid endarterectomy.

Summary Background Among asymptomatic patients with severe carotid artery stenosis but no recent stroke or transient cerebral ischaemia, either carotid artery stenting (CAS) or carotid endarterectomy (CEA) can restore patency and reduce long-term stroke risks. However, from recent national registry data, each option causes about 1% procedural risk of disabling stroke or death. Comparison of their long-term protective effects requires large-scale randomised evidence. Methods ACST-2 is an international multicentre randomised trial of CAS versus CEA among asymptomatic patients with severe stenosis thought to require intervention, interpreted with all other relevant trials. Patients were eligib…

research product

Reverse-Safe Text Indexing

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z - reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D . The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z , we propose an algorithm that constructs a z -reverse-safe data structure ( z -RSDS) that has size O(n) and answers decision and counting pattern matc…

research product

Substring Complexity in Sublinear Space

Shannon's entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad-hoc measures are employed to estimate the repetitiveness of strings, e.g., the size $z$ of the Lempel-Ziv parse or the number $r$ of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size $\gamma$ of a smallest string attractor. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing $\gamma$ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure that is based on the function $S_T$ counting the cardinalities of the sets of substrings of each length of $T$, also known as …

research product