Truly Perfect Samplers for Data Streams and Sliding Windows

08/26/2021
by   Rajesh Jayaram, et al.
0

In the G-sampling problem, the goal is to output an index i of a vector f ∈ℝ^n, such that for all coordinates j ∈ [n], Pr[i=j] = (1 ±ϵ) G(f_j)/∑_k∈[n] G(f_k) + γ, where G:ℝ→ℝ_≥ 0 is some non-negative function. If ϵ = 0 and γ = 1/poly(n), the sampler is called perfect. In the data stream model, f is defined implicitly by a sequence of updates to its coordinates, and the goal is to design such a sampler in small space. Jayaram and Woodruff (FOCS 2018) gave the first perfect L_p samplers in turnstile streams, where G(x)=|x|^p, using polylog(n) space for p∈(0,2]. However, to date all known sampling algorithms are not truly perfect, since their output distribution is only point-wise γ = 1/poly(n) close to the true distribution. This small error can be significant when samplers are run many times on successive portions of a stream, and leak potentially sensitive information about the data stream. In this work, we initiate the study of truly perfect samplers, with ϵ = γ = 0, and comprehensively investigate their complexity in the data stream and sliding window models. Abstract truncated due to arXiv limits; please see paper for full abstract.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro