BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks

02/19/2018
by   Sheng-Wei Chen, et al.
0

We propose a block-diagonal approximation of the positive-curvature Hessian (BDA-PCH) matrix to measure curvature. Our proposed BDAPCH matrix is memory efficient and can be applied to any fully-connected neural networks where the activation and criterion functions are twice differentiable. Particularly, our BDA-PCH matrix can handle non-convex criterion functions. We devise an efficient scheme utilizing the conjugate gradient method to derive Newton directions for mini-batch setting. Empirical studies show that our method outperforms the competing second-order methods in convergence speed.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro