Location

Library Room 1576

Date and Time

1:00 PM to 1:50 PM

Abstract

This project investigates the learning dynamics of neural networks in the loss (energy) landscapes that arise in machine learning and AI applications. We develop approaches built on large deviation theory to better understand training dynamics. We aim to describe stochastic gradient descent (SGD) as a noise-driven dynamical system and to analyze how rare transitions between metastable regions of the landscape govern long-time training behavior. We emphasize the roles of the quasipotential, Freidlin-Wentzell theory, and Kramers-type escape rate laws in characterizing escape events and effective barriers. To connect this theory with realistic models, we study low-dimensional slices and parametrizations of neural network loss landscapes, which allow analytical and computational investigation of transition pathways and basin structure. More broadly, the project explores how stochasticity in SGD can reduce dependence on initialization and affect convergence in modern deep learning systems.
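
For orientation, a minimal sketch of the standard formulations named above, assuming an isotropic small-noise diffusion model of SGD with noise level \varepsilon (often heuristically tied to the learning rate and batch size); the symbols here are illustrative, not necessarily the project's exact setup. SGD is modeled as the stochastic differential equation

\[
  d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2\varepsilon}\,dW_t .
\]

The Freidlin-Wentzell action of a path \varphi on [0,T] and the induced quasipotential are

\[
  S_T[\varphi] = \frac{1}{4}\int_0^T \bigl\| \dot\varphi(t) + \nabla L(\varphi(t)) \bigr\|^2 dt,
  \qquad
  V(x,y) = \inf_{T>0}\; \inf_{\varphi(0)=x,\ \varphi(T)=y} S_T[\varphi],
\]

so for this gradient dynamics the quasipotential along an uphill path within a basin reduces to V(a,x) = L(x) - L(a). The expected escape time from a basin with minimum a over its lowest saddle s then follows the Arrhenius-type scaling \mathbb{E}[\tau] \asymp e^{V/\varepsilon}, refined in one dimension by the Kramers prefactor:

\[
  \mathbb{E}[\tau] \approx \frac{2\pi}{\sqrt{L''(a)\,\lvert L''(s) \rvert}}\; e^{(L(s)-L(a))/\varepsilon}.
\]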