Fisher information and natural gradient learning in random deep networks
The 22nd International Conference on Artificial Intelligence …, 2019. proceedings.mlr.press
Abstract
The parameter space of a deep neural network is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, but it requires inversion of the Fisher matrix, which is practically difficult. The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a net of random connections. We prove that the Fisher information matrix is unit-wise block diagonal, supplemented by off-block-diagonal elements of small order. We further prove that the Fisher information matrix of a single unit has a simple reduced form: the sum of a diagonal matrix and a rank-2 matrix of weight-bias correlations. We obtain the inverse of the Fisher information explicitly, and thus an explicit form of the approximate natural gradient that does not rely on matrix inversion.
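The key computational point of the abstract is that a "diagonal plus low-rank" Fisher block can be inverted in closed form, so the natural gradient for each unit never requires a full matrix inversion. The sketch below is illustrative only, not the paper's derivation: it assumes a single unit's Fisher block of the form F = diag(d) + ab^T + ba^T (a rank-2 weight-bias correlation term with hypothetical vectors a and b) and applies the standard Woodbury identity, which reduces the inversion to a 2x2 solve.

```python
import numpy as np

def natural_gradient_step(grad, d, a, b):
    """Return F^{-1} @ grad for F = diag(d) + a b^T + b a^T.

    Uses the Woodbury identity with U = [a, b], V = [b, a]:
        F^{-1} = D^{-1} - D^{-1} U (I + V^T D^{-1} U)^{-1} V^T D^{-1}
    so only a 2x2 system is solved, never an n x n inversion.
    """
    U = np.stack([a, b], axis=1)            # n x 2
    V = np.stack([b, a], axis=1)            # n x 2
    Dinv_grad = grad / d                    # D^{-1} g (elementwise)
    Dinv_U = U / d[:, None]                 # D^{-1} U
    S = np.eye(2) + V.T @ Dinv_U            # 2 x 2 capacitance matrix
    correction = Dinv_U @ np.linalg.solve(S, V.T @ Dinv_grad)
    return Dinv_grad - correction

# Check against a direct solve on a small random instance.
rng = np.random.default_rng(0)
n = 5
d = rng.uniform(1.0, 2.0, n)
a, b, g = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
F = np.diag(d) + np.outer(a, b) + np.outer(b, a)
print(np.allclose(natural_gradient_step(g, d, a, b), np.linalg.solve(F, g)))
```

The same idea extends unit-wise: since the full Fisher matrix is (approximately) block diagonal across units, the natural gradient of the whole network can be assembled from independent per-unit solves of this cheap form.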