6533b7d5fe1ef96bd12646d9
RESEARCH PRODUCT
Reverse-safe data structures for text indexing
Solon P. PissisHuiping ChenGrigorios LoukidesGiulia BernardiniGiulia BernardiniGabriele Ficisubject
050101 languages & linguisticsComputer sciencedata structure02 engineering and technologyprivacySet (abstract data type)combinatoric0202 electrical engineering electronic engineering information engineering0501 psychology and cognitive sciencesPattern matchingSettore ING-INF/05 - Sistemi Di Elaborazione Delle InformazionialgorithmSettore INF/01 - Informatica05 social sciencesSearch engine indexingINF/01 - INFORMATICAdata miningData structureMatrix multiplicationcombinatoricsExponent020201 artificial intelligence & image processingdata structure; algorithm; combinatorics; de Bruijn graph; data mining; privacyAlgorithmAdversary modelde Bruijn graphInteger (computer science)description
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
year | journal | country | edition | language |
---|---|---|---|---|
2021-12-01 |