Estimation of Squared-Loss Mutual Information from Positive and Unlabeled Data

10/15/2017
by Tomoya Sakai, et al.

Capturing input-output dependency is an important task in statistical data analysis. Mutual information (MI) is a vital tool for this purpose, but it is known to be sensitive to outliers. To cope with this problem, a squared-loss variant of MI (SMI) was proposed, and a supervised estimator for it has been developed. In real-world classification problems, however, it is conceivable that only positive and unlabeled (PU) data are available. In this paper, we propose a novel estimator of SMI from PU data only, and prove that it converges to the true SMI at the optimal rate. Based on the PU-SMI estimator, we further propose a dimension reduction method that can be executed without estimating the class-prior probabilities of unlabeled data. Such PU class-prior estimation is often required in PU classification algorithms, but it is unreliable, particularly in high-dimensional problems, and yields a biased classifier. Our dimension reduction method significantly boosts the accuracy of PU class-prior estimation, as demonstrated through experiments. We also develop an independence testing method based on our PU-SMI estimator and experimentally show its superiority.
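As background (the abstract does not reproduce the definition), SMI is the Pearson divergence between the joint density p(x, y) and the product of the marginals p(x)p(y). This is the standard definition from the SMI literature:

SMI(X, Y) = \frac{1}{2} \iint p(x)\, p(y) \left( \frac{p(x, y)}{p(x)\, p(y)} - 1 \right)^{2} \mathrm{d}x\, \mathrm{d}y

SMI is non-negative and equals zero if and only if p(x, y) = p(x)p(y), i.e., if and only if X and Y are independent, which is what makes it usable for the independence test mentioned above.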
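The paper itself is behind the link below, so the following is only a minimal sketch of the fully supervised least-squares SMI estimator (LSMI) that the abstract refers to as the existing "supervised estimator", not of the proposed PU-SMI estimator. It models the density ratio p(x, y)/(p(x)p(y)) with a kernel model and fits it by regularized least squares; all function names and hyperparameter values here are illustrative assumptions, not the paper's.

import numpy as np

def gaussian_kernel(a, b, sigma):
    # Pairwise Gaussian kernel matrix between rows of a (n, d) and b (m, d).
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def lsmi(x, y, n_basis=100, sigma=1.0, lam=1e-3, seed=0):
    # Least-squares SMI estimate for features x (n, d) and class labels y (n,).
    # Density-ratio model: w(x, y) = sum_l alpha_l * K(x, c_l) * 1[y = b_l],
    # with kernel centers (c_l, b_l) subsampled from the data.
    n = len(x)
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=min(n_basis, n), replace=False)
    Kx = gaussian_kernel(x, x[idx], sigma)              # (n, b)
    Ky = (y[:, None] == y[idx][None, :]).astype(float)  # delta kernel on labels
    # Quadratic term: H_{ll'} = (1/n^2) sum_{i,j} Kx_il Kx_il' Ky_jl Ky_jl'
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n ** 2
    # Linear term: h_l = (1/n) sum_i Kx_il Ky_il (expectation under the joint)
    h = (Kx * Ky).mean(axis=0)
    # Ridge-regularized closed-form solution for the ratio coefficients.
    alpha = np.linalg.solve(H + lam * np.eye(len(idx)), h)
    # Plug-in estimate: SMI = (1/2) E_{p(x,y)}[w(x, y)] - 1/2
    return 0.5 * h @ alpha - 0.5

# Toy check: the estimate should be clearly positive when x depends on y,
# and close to zero when they are independent.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)
x_dep = rng.normal(loc=2.0 * y[:, None], scale=1.0, size=(500, 1))
x_ind = rng.normal(size=(500, 1))
print(lsmi(x_dep, y), lsmi(x_ind, y))

The closed-form solution is what makes the squared-loss variant attractive in practice: unlike estimators of ordinary MI, no iterative optimization is needed, and hyperparameters (sigma, lam above) can be tuned by cross-validation on the least-squares objective.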
