Data blurring: sample splitting a single sample

12/21/2021
by James Leiner, et al.

Suppose we observe a random vector X from some distribution P in a known family with unknown parameters. We ask the following question: when is it possible to split X into two parts f(X) and g(X) such that neither part is sufficient to reconstruct X by itself, but both together can recover X fully, and the joint distribution of (f(X), g(X)) is tractable? As one example, if X = (X_1, …, X_n) and P is a product distribution, then for any m < n, we can split the sample to define f(X) = (X_1, …, X_m) and g(X) = (X_{m+1}, …, X_n). Rasines and Young (2021) offer an alternative route to accomplishing this task through randomization of X with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
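
To make the two kinds of split concrete, here is a minimal Python sketch. It is not taken from the paper: the function names (split_sample, gaussian_blur, recover), the tuning parameter tau, and the assumption that the noise variance sigma^2 is known are all illustrative choices. The first function is ordinary sample splitting. The second follows the additive Gaussian noise idea mentioned above: for X ~ N(mu, sigma^2) and independent Z ~ N(0, sigma^2), the parts f = X + tau*Z and g = X - Z/tau are uncorrelated (hence independent, being jointly Gaussian), and X is recovered exactly as (f + tau^2 * g) / (1 + tau^2).

```python
import math
import random


def split_sample(x, m):
    """Ordinary sample splitting: first m observations vs. the rest."""
    return x[:m], x[m:]


def gaussian_blur(x, sigma, tau, rng):
    """Split each Gaussian observation into two parts using additive noise.

    Assumes each x_i ~ N(mu_i, sigma^2) with sigma known (an assumption of
    this sketch). tau > 0 controls how much information goes to each part.
    """
    f, g = [], []
    for xi in x:
        z = rng.gauss(0.0, sigma)   # fresh noise, independent of the data
        f.append(xi + tau * z)      # noisier copy, e.g. for model selection
        g.append(xi - z / tau)      # complementary copy, e.g. for inference
    return f, g


def recover(f, g, tau):
    """Both parts together reconstruct the original observations exactly."""
    return [(fi + tau**2 * gi) / (1.0 + tau**2) for fi, gi in zip(f, g)]


def correlation(a, b):
    """Sample Pearson correlation, written out to avoid extra dependencies."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
sigma, tau = 1.0, 0.5
x = [rng.gauss(2.0, sigma) for _ in range(20000)]

first, second = split_sample(x, 5000)
f, g = gaussian_blur(x, sigma, tau, rng)
x_rec = recover(f, g, tau)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
corr_fg = correlation(f, g)
```

In this sketch, max_err should be at floating-point rounding level and corr_fg should be close to zero, while neither f nor g alone determines x. A smaller tau leaves more information in f and less in g, which plays the role that the split point m plays in ordinary sample splitting.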
