Multi-Resolution Hashing for Fast Pairwise Summations

07/19/2018
by   Moses Charikar, et al.
0

A basic computational primitive in the analysis of massive datasets is summing simple functions over a large number of objects. Modern applications pose an additional challenge in that such functions often depend on a parameter vector y (query) that is unknown a priori. Given a set of points X⊂R^d and a pairwise function w:R^d×R^d→ [0,1], we study the problem of designing a data-structure that enables sublinear-time approximation of the summation Z_w(y)=1/|X|∑_x∈ Xw(x,y) for any query y∈R^d. By combining ideas from Harmonic Analysis (partitions of unity and approximation theory) with Hashing-Based-Estimators [Charikar, Siminelakis FOCS'17], we provide a general framework for designing such data-structures through hashing. A key design principle is a collection of T≥ 1 hashing schemes with collision probabilities p_1,..., p_T such that _t∈ [T]{p_t(x,y)} = Θ(√(w(x,y))). This leads to a data-structure that approximates Z_w(y) using a sub-linear number of samples from each hash family. Using this new framework along with Distance Sensitive Hashing [Aumuller, Christiani, Pagh, Silvestri PODS'18], we show that such a collection can be constructed and evaluated efficiently for any log-convex function w(x,y)=e^ϕ(〈 x,y〉) of the inner product on the unit sphere x,y∈S^d-1. Our method leads to data-structures with sub-linear query time that significantly improve upon random sampling and can be used for Kernel Density or Partition Function Estimation. We provide extensions of our result from the sphere to R^d, and from scalar functions to vector functions.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro