Invited Talk: Practical Theory and Neural Network Models

Speaker: Michael Mahoney, UC Berkeley
Talk title: Practical Theory and Neural Network Models

Time: Tuesday, April 20, 11:20am-12:20pm (PT)

Abstract:
Working with state-of-the-art (SOTA) neural network (NN) models is a practical business, and it demands a practical theory. It also provides us with an opportunity to ask questions at the foundations of data, including what the role of theory is and how theory can be formulated for models that depend so strongly on the data. We’ll review empirical results on nearly every publicly available pre-trained SOTA model, and we’ll identify ubiquitous heavy-tailed structure in the correlations of weight matrices. Based on this, we’ll describe the theory of Heavy-Tailed Self-Regularization (HT-SR), which makes strong use of statistical mechanics and heavy-tailed Random Matrix Theory. HT-SR is a theory that can be applied directly to SOTA NN models. For example, we can use it to identify multiple qualitatively different phases of learning. We can use it to predict trends in the quality of SOTA NNs without access to training or testing data. Finally, by examining the complementary role of norm-based size metrics versus heavy-tailed shape metrics, we can use it to identify Simpson’s paradoxes in public contests aimed at finding metrics that were causally informative of generalization. We’ll conclude by describing several parallel theoretical results on more idealized models: using surrogate random design to provide exact expressions for double descent and implicit regularization; identifying a precise phase transition and the corresponding double descent for a random Fourier features model in the thermodynamic limit; showing how multiplicative noise leads to heavy tails in stochastic optimization; and developing a novel methodology to compute precisely the full distribution of test errors among interpolating classifiers, which shows that good classifiers are abundant in the interpolating regime.
These theoretical results are formulated in a way more familiar to this community, and it is a major open problem to bridge the gap between these and other statistical learning theory results, on the one hand, and HT-SR theory and the empirical results on SOTA models, on the other.
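
As a rough illustration of the kind of measurement the abstract refers to, the sketch below computes the eigenvalue spectrum of a weight matrix's correlation matrix and estimates how heavy its tail is. It is not the speaker's implementation. The function names `esd` and `hill_alpha`, the normalization X = W^T W / N, the use of the Hill estimator in place of a full power-law fit, and the `tail_fraction` default are all assumptions made for this example.

```python
import numpy as np


def esd(weights):
    """Eigenvalues of X = W^T W / N for an N x M weight matrix W (assumed normalization)."""
    n_rows = weights.shape[0]
    # The squared singular values of W are the eigenvalues of W^T W.
    singular_values = np.linalg.svd(weights, compute_uv=False)
    return np.sort(singular_values ** 2 / n_rows)


def hill_alpha(eigenvalues, tail_fraction=0.1):
    """Hill estimate of alpha in rho(lambda) ~ lambda^(-alpha), using the largest eigenvalues.

    The Hill estimator targets the exponent of the survival function; adding 1
    converts it to the exponent of the density.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    k = max(2, int(tail_fraction * lam.size))
    tail = lam[-k:]          # the k largest eigenvalues
    threshold = lam[-k - 1]  # the largest eigenvalue outside the tail
    return 1.0 + k / np.sum(np.log(tail / threshold))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for layer weight matrices (no trained model is used here).
    gaussian_w = rng.standard_normal((1000, 500))
    heavy_w = rng.standard_t(df=1.5, size=(1000, 500))
    print("Gaussian entries:     alpha ~", hill_alpha(esd(gaussian_w)))
    print("Heavy-tailed entries: alpha ~", hill_alpha(esd(heavy_w)))
```

Only the weight matrices are needed for this computation, with no training or testing data. A smaller estimated alpha indicates a heavier-tailed spectrum; the estimate depends on the chosen tail fraction, so the numbers are best read comparatively.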
