We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string, of length n, over a finite, ordered alphabet Σ, and S′ is a substring of S, then the fingerprint of S′ is the subset φ of Σ of precisely the symbols appearing in S′. In this paper we show efficient methods of answering various queries on fingerprint statistics. Our preprocessing is done in time O(n |Σ| log(n) log(|Σ|)) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ⊆Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log(n)).
Efficient Text Fingerprinting Via Parikh Mapping
APOSTOLICO, ALBERTO;SATTA, GIORGIO
2003
Abstract
We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string, of length n, over a finite, ordered alphabet Σ, and S′ is a substring of S, then the fingerprint of S′ is the subset φ of Σ of precisely the symbols appearing in S′. In this paper we show efficient methods of answering various queries on fingerprint statistics. Our preprocessing is done in time O(n |Σ| log(n) log(|Σ|)) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ⊆Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log(n)).Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.