Online algorithms for finding distinct substrings with length and multiple prefix and suffix conditions

07/09/2022
by   Laurentius Leonard, et al.
0

Let two static sequences of strings P and S, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers k_1 and k_2 be given, as well as a string T given in an online manner, such that T_i represents the length-i prefix of T for 1 ≤ i ≤ |T|. In this paper we are interested in computing the set 𝑎𝑛𝑠_𝑖 of distinct substrings w of T_i such that k_1 ≤ |w| ≤ k_2, and w contains some p ∈ P as a prefix and some s ∈ S as a suffix. More specifically, the counting problem is to output |𝑎𝑛𝑠_𝑖|, whereas the reporting problem is to output all elements of 𝑎𝑛𝑠_𝑖, for each iteration i. Let σ denote the alphabet size, and for a sequence of strings A, ‖ A‖=∑_u∈ A|u|. Then, we show that after O((‖ P‖ +‖ S‖)logσ)-time preprocessing, the solutions for the counting and reporting problems for each iteration up to i can be output in O(|T_i| logσ) and O(|T_i| logσ + |𝑎𝑛𝑠_𝑖|) total time. The preprocessing time can be reduced to O(‖ P‖ +‖ S‖) for integer alphabets of size polynomial with regard to ‖ P‖ +‖ S‖. Our algorithms have possible applications to network traffic classification.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro