Compressing invariant manifolds in neural nets

07/22/2020
by Jonas Paccolat, et al.

We study how neural networks compress uninformative input space in models where the data lie in d dimensions but the label varies only within a linear manifold of dimension d_∥ < d. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) and trained with gradient descent, the uninformative d_⊥ = d - d_∥ space is compressed by a factor λ ∼ √(p), where p is the size of the training set. We quantify the benefit of such compression on the test error ϵ. For large initialization of the weights (the lazy training regime), no compression occurs, and for regular boundaries separating labels we find that ϵ ∼ p^(-β), with β_Lazy = d/(3d-2). Compression improves the learning curves, so that β_Feature = (2d-1)/(3d-2) if d_∥ = 1 and β_Feature = (d + d_⊥/2)/(3d-2) if d_∥ > 1. We test these predictions for a stripe model, where boundaries are parallel interfaces (d_∥ = 1), as well as for a cylindrical boundary (d_∥ = 2). Next we show that compression shapes the evolution of the Neural Tangent Kernel (NTK) during training, so that its top eigenvectors become more informative and display a larger projection onto the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer FC network trained on the stripe model and for a 16-layer CNN trained on MNIST. The strong similarities found in these two cases support the view that compression is central to the training of MNIST, and put forward kernel-PCA on the evolving NTK as a useful diagnostic of compression in deep nets.
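As a rough illustration of the setup and the quoted exponents, here is a minimal sketch (not the authors' code) of a stripe-model dataset with d_∥ = 1, where the label depends only on the first coordinate and the remaining d - 1 directions are uninformative. The Gaussian input distribution and the sin-based placement of the parallel interfaces are illustrative assumptions; the helper functions simply evaluate the β_Lazy and β_Feature formulas stated in the abstract.

```python
# Hypothetical sketch of the stripe model and the predicted learning-curve exponents.
import numpy as np

def stripe_data(p, d, period=2.0, seed=0):
    """Sample p points in d dimensions; the label varies only along x[:, 0],
    changing sign across parallel interfaces. The d_perp = d - 1 remaining
    directions carry no label information."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((p, d))           # assumed Gaussian inputs (illustrative)
    y = np.sign(np.sin(np.pi * x[:, 0] / period))
    return x, y

def beta_lazy(d):
    """Exponent of the test error epsilon ~ p^(-beta) without compression (lazy regime)."""
    return d / (3 * d - 2)

def beta_feature(d, d_par):
    """Exponent in the feature-learning regime, where the d_perp = d - d_par
    uninformative directions are compressed by a factor ~ sqrt(p)."""
    d_perp = d - d_par
    if d_par == 1:
        return (2 * d - 1) / (3 * d - 2)
    return (d + d_perp / 2) / (3 * d - 2)

if __name__ == "__main__":
    x, y = stripe_data(p=1000, d=5)
    print("compression factor ~ sqrt(p):", np.sqrt(len(x)))
    print("beta_lazy(d=5)          =", beta_lazy(5))
    print("beta_feature(d=5, d_par=1) =", beta_feature(5, d_par=1))
```

For d = 5 and d_∥ = 1, this gives β_Lazy = 5/13 ≈ 0.38 versus β_Feature = 9/13 ≈ 0.69, illustrating how compression of the uninformative directions steepens the learning curve.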
