Counting Distinct Patterns in Internal Dictionary Matching

05/12/2020
βˆ™
by   Panagiotis Charalampopoulos, et al.
βˆ™
0
βˆ™

We consider the problem of preprocessing a text T of length n and a dictionary π’Ÿ in order to be able to efficiently answer queries CountDistinct(i,j), that is, given i and j return the number of patterns from π’Ÿ that occur in the fragment T[i . . j]. The dictionary is internal in the sense that each pattern in π’Ÿ is given as a fragment of T. This way, the dictionary takes space proportional to the number of patterns d=|π’Ÿ| rather than their total length, which could be Θ(nΒ· d). An π’ͺΜƒ(n+d)-size data structure that answers CountDistinct(i,j) queries π’ͺ(log n)-approximately in π’ͺΜƒ(1) time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an π’ͺΜƒ(n+d)-size data structure that answers CountDistinct(i,j) queries 2-approximately in π’ͺΜƒ(1) time. Using range queries, for any m, we give an π’ͺΜƒ(min(nd/m,n^2/m^2)+d)-size data structure that answers CountDistinct(i,j) queries exactly in π’ͺΜƒ(m) time. We also consider the special case when the dictionary consists of all square factors of the string. We design an π’ͺ(n log^2 n)-size data structure that allows us to count distinct squares in a text fragment T[i . . j] in π’ͺ(log n) time.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro