Contributed Talk: Computing the Typical Information Content of Infinitely Wide Neural Networks

Speaker: Jeremy Bernstein, Caltech
Talk title: Computing the Typical Information Content of Infinitely Wide Neural Networks

Time: Wednesday, April 21, 9:00am-9:25am (PT)

Abstract:
If a classifier can solve a data set via many different settings of its weights, then a typical solution can be described more efficiently than by its raw weight vector. This might provide the mechanism by which neural networks with more parameters than data reliably generalise. This paper derives a consistent estimator and a closed-form upper bound on the typical information content of an infinitely wide neural network that fits a training set with binary labels. The estimator and upper bound depend only on the training inputs, training labels and network architecture. The derivation relies on identifying the typical information content in the infinite-width limit with the log reciprocal of a Gaussian orthant probability. The results are then used, via the PAC-Bayesian method, to upper bound the average test error over all infinitely wide networks that attain zero training error. The generalisation bound is verified for binary classifiers trained on MNIST.

Joint work with Yisong Yue.
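
To make the central quantity concrete, below is a minimal numerical sketch; it is not the paper's estimator or closed-form bound. It assumes a single-hidden-layer ReLU network, so that the infinite-width prior over function values at the training inputs is a zero-mean Gaussian with the arc-cosine kernel, and it estimates the Gaussian orthant probability by naive Monte Carlo. The function names nngp_relu_kernel and information_content_mc are illustrative, not taken from the paper.

    import numpy as np

    def nngp_relu_kernel(X):
        # Infinite-width prior covariance over function values at the inputs,
        # here for a single-hidden-layer ReLU network (the arc-cosine kernel).
        # The paper's kernel depends on the chosen architecture; this is just
        # one concrete assumption for illustration.
        gram = X @ X.T
        norms = np.sqrt(np.diag(gram))
        cos_theta = np.clip(gram / np.outer(norms, norms), -1.0, 1.0)
        theta = np.arccos(cos_theta)
        return np.outer(norms, norms) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)

    def information_content_mc(K, y, n_samples=200_000, seed=0):
        # Naive Monte Carlo estimate of the Gaussian orthant probability
        #   P = Pr[ sign(f(x_i)) = y_i for all i ],  with f ~ N(0, K),
        # reported as the typical information content -log2(P) in bits.
        rng = np.random.default_rng(seed)
        L = np.linalg.cholesky(K + 1e-9 * np.eye(len(K)))   # jitter for numerical stability
        f = rng.standard_normal((n_samples, len(K))) @ L.T  # rows are draws from N(0, K)
        hit_rate = np.all(np.sign(f) == y, axis=1).mean()
        return np.inf if hit_rate == 0 else -np.log2(hit_rate)

    # Toy usage: eight random inputs with random binary labels.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((8, 5))
    y = np.sign(rng.standard_normal(8))
    K = nngp_relu_kernel(X)
    print(f"typical information content ~ {information_content_mc(K, y):.2f} bits")

Naive sampling like this only works for very small data sets, since the orthant probability shrinks roughly exponentially in the number of training points; handling realistic data set sizes is presumably what the paper's consistent estimator and closed-form upper bound are for.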
