Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations

07/03/2020
by David L. Donoho, et al.

Given two samples from possibly different discrete distributions over a common set of size N, consider the problem of testing whether these distributions are identical versus the following rare/weak perturbation alternative: the frequencies of N^(1-β) elements are perturbed by r(log N)/(2n) in the Hellinger distance, where n is the size of each sample. We adapt the Higher Criticism (HC) test to this setting using P-values obtained from N exact binomial tests. We characterize the asymptotic performance of the HC-based test in terms of the sparsity parameter β and the perturbation intensity parameter r. Specifically, we derive a region in the (β, r)-plane where the test asymptotically has maximal power, while having asymptotically no power outside this region. Our analysis distinguishes between the cases of dense (N ≫ n) and sparse (N ≪ n) contingency tables. In the dense case, the phase transition curve matches that of an analogous two-sample normal means model.
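The procedure described in the abstract (one exact binomial test per element, then Higher Criticism over the resulting P-values) can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the authors' exact construction: the two samples are taken to have equal size, so each element's count in the first sample is tested against Binomial(x + y, 1/2); the HC statistic uses the common form with p(1 - p) in the denominator, maximized over the smallest gamma0 fraction of P-values; and elements with no observations, or with P-values equal to 0 or 1, are skipped. The function names and the gamma0 parameter are hypothetical.

```python
import math


def binom_pvalue(x, m):
    """Two-sided exact P-value for x under Binomial(m, 1/2).

    Assumes equal sample sizes, so the null success probability is 1/2.
    """
    lo = min(x, m - x)
    # Lower-tail count; by symmetry the two-sided P-value doubles it.
    tail = sum(math.comb(m, k) for k in range(lo + 1))
    return min(1.0, 2 * tail / 2**m)


def higher_criticism(pvalues, gamma0=0.5):
    """HC statistic over the smallest gamma0 fraction of P-values."""
    p = sorted(pvalues)
    n = len(p)
    best = float("-inf")
    for i in range(1, int(gamma0 * n) + 1):
        pi = p[i - 1]
        # Skip degenerate P-values, where the denominator vanishes.
        if 0.0 < pi < 1.0:
            z = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
            best = max(best, z)
    return best


def two_sample_hc(x_counts, y_counts, gamma0=0.5):
    """HC of per-element binomial P-values for two count vectors."""
    pvals = [
        binom_pvalue(x, x + y)
        for x, y in zip(x_counts, y_counts)
        if x + y > 0  # elements never observed carry no evidence
    ]
    return higher_criticism(pvals, gamma0)
```

A large value of two_sample_hc(x_counts, y_counts) is evidence against the hypothesis that the two distributions are identical; in practice the rejection threshold would be calibrated, for example by simulation under the null.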
