# Implicit Regularization in Deep Learning: A View from Function Space

    @article{Baratin2020ImplicitRI,
      title   = {Implicit Regularization in Deep Learning: A View from Function Space},
      author  = {Aristide Baratin and Thomas George and C{\'e}sar Laurent and R. Devon Hjelm and Guillaume Lajoie and Pascal Vincent and Simon Lacoste-Julien},
      journal = {ArXiv},
      year    = {2020},
      volume  = {abs/2008.00938}
    }

We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a possible regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions. By extrapolating a new analysis of Rademacher complexity bounds in linear models, we propose and study a new heuristic complexity measure for neural networks which captures this phenomenon, in terms of sequences of…
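
The tangent-feature alignment the abstract refers to can be quantified with the standard centered kernel–target alignment score of Cortes et al.; the sketch below is our illustration of that generic measure, not the paper's exact complexity measure, and the function name is our own.

```python
import numpy as np

def centered_kernel_alignment(K, y):
    """Alignment between an n x n kernel Gram matrix K and labels y (n,).

    A score near 1 means the kernel's dominant directions line up with
    the task-relevant direction y y^T -- the kind of alignment of the
    tangent features that the paper links to implicit regularization.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    Kc, Yc = H @ K @ H, H @ np.outer(y, y) @ H     # centered Gram matrices
    return np.sum(Kc * Yc) / (np.linalg.norm(Kc) * np.linalg.norm(Yc))
```

A kernel built from the labels themselves, `K = np.outer(y, y)`, scores exactly 1, while an unaligned kernel such as the identity scores well below 1.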


#### One Citation

Gradient Starvation: A Learning Proclivity in Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2020

This work provides a theoretical explanation for the emergence of feature imbalance in neural networks and develops guarantees for a novel regularization method aimed at decoupling feature learning dynamics, improving accuracy and robustness in cases hindered by gradient starvation.

#### References

Showing 1–10 of 63 references

Geometry of Optimization and Implicit Regularization in Deep Learning

- Computer Science
- ArXiv
- 2017

This work argues that optimization plays a crucial role in the generalization of deep learning models through implicit regularization, and demonstrates how changing the empirical optimization procedure can improve generalization even when the actual optimization quality is unaffected.

Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks

- Computer Science, Mathematics
- NeurIPS
- 2019

This work studies the discrete gradient dynamics of training a two-layer linear network with the least-squares loss, using a time rescaling to show that these dynamics sequentially learn the solutions of a reduced-rank regression with gradually increasing rank.
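
The incremental-rank behavior described above can be reproduced in a few lines. A minimal numpy sketch, assuming a 2×2 diagonal target and a small balanced initialization (our illustrative choices, not the paper's exact setup):

```python
import numpy as np

# Gradient descent on L(W1, W2) = 0.5 * ||W2 @ W1 - A||_F^2 for a
# two-layer linear network; with a small balanced init, the product
# W2 @ W1 picks up the target's singular directions one at a time,
# in order of decreasing singular value (incremental rank).
A = np.diag([4.0, 1.0])        # target with well-separated singular values
W1 = 0.01 * np.eye(2)          # small balanced initialization
W2 = 0.01 * np.eye(2)
lr = 0.01

for step in range(300):
    E = W2 @ W1 - A            # residual
    # simultaneous update of both factors from the same residual
    W1, W2 = W1 - lr * (W2.T @ E), W2 - lr * (E @ W1.T)

s = np.linalg.svd(W2 @ W1, compute_uv=False)
# By step 300 the top direction (singular value 4) is essentially
# learned while the second (singular value 1) is still near zero.
```

Tracking `s` over training shows the first singular value escaping its small initialization much earlier than the second, which is the sequential, reduced-rank behavior the paper analyzes.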

Weighted Optimization: better generalization by smoother interpolation

- Computer Science, Mathematics
- ArXiv
- 2020

It is argued, through this model and numerical experiments, that normalization methods in deep learning such as weight normalization improve generalization in overparameterized neural networks by implicitly encouraging smooth interpolants.

On the Inductive Bias of Neural Tangent Kernels

- Computer Science, Mathematics
- NeurIPS
- 2019

This work studies smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compares to other known kernels for similar architectures.

Characterizing Implicit Bias in Terms of Optimization Geometry

- Mathematics, Computer Science
- ICML
- 2018

This work explores whether the specific global minimum reached by an algorithm can be characterized in terms of the potential or norm of the optimization geometry, independently of hyperparameter choices such as step size and momentum.

Neural tangent kernel: convergence and generalization in neural networks (invited paper)

- Computer Science, Mathematics
- NeurIPS
- 2018

This talk introduces this formalism and gives a number of results on the Neural Tangent Kernel, explaining how they give insight into the dynamics of neural networks during training and into their generalization features.

In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

- Computer Science, Mathematics
- ICLR
- 2015

It is argued, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.

Neural Spectrum Alignment: Empirical Study

- Computer Science
- ICANN
- 2020

This paper empirically explores properties of the NTK along the optimization trajectory, showing that in practical applications the NTK changes in a dramatic and meaningful way, with its top eigenfunctions aligning toward the target function learned by the network.

A Note on Lazy Training in Supervised Differentiable Programming

- Computer Science, Mathematics
- ArXiv
- 2018

In a simplified setting, it is proved that "lazy training" essentially solves a kernel regression, and it is shown that this behavior is due not so much to over-parameterization as to a choice of scaling, often implicit, that allows the model to be linearized around its initialization.
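
The role of scaling can be caricatured with a single scalar parameter. A minimal sketch, assuming a toy model f(w) = α(w² − w₀²) and a 1/α² rescaling of the step size (the setup is our illustrative choice, not the paper's construction):

```python
def weight_displacement(alpha, steps=2000, lr=0.05):
    """Fit target y = 1 with the scaled model f(w) = alpha * (w**2 - w0**2).

    Gradient descent on 0.5 * (f(w) - 1)**2 with the step size rescaled
    by 1/alpha**2; larger alpha means the weights barely move, so the
    model stays close to its linearization around w0 ("lazy training").
    """
    w0 = 1.0
    w = w0
    for _ in range(steps):
        f = alpha * (w * w - w0 * w0)
        grad = (f - 1.0) * alpha * 2.0 * w   # d/dw of the loss
        w -= (lr / alpha**2) * grad
    return abs(w - w0)
```

Here `weight_displacement(1.0)` is about 0.41 while `weight_displacement(100.0)` is about 0.005: both runs fit the same target, but at large scale the parameters hardly move from initialization, which is exactly the linearized (kernel) regime.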

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

- Computer Science, Mathematics
- ICLR
- 2020

This work proposes an approach to explaining why neural networks trained with gradient descent generalize well on real datasets, even though they are capable of fitting random data, based on a hypothesis about the dynamics of gradient descent called Coherent Gradients.