February 24 @ 11:30 AM – 12:30 PM
Foundations of language models: scaling and reasoning
Eshaan Nichani
Abstract:
Modern deep learning methods, most prominently language models, have achieved tremendous empirical success, yet a theoretical understanding of how neural networks learn from data remains incomplete. While reasoning directly about these approaches is often intractable, formalizing core empirical phenomena through minimal “sandbox” tasks offers a promising path toward principled theory. In this talk, Nichani will demonstrate how proving end-to-end learning guarantees for such tasks yields a practical understanding of how the network architecture, optimization algorithm, and data distribution jointly give rise to key behaviors. First, he will show how neural scaling laws arise from the dynamics of stochastic gradient descent in shallow neural networks. Next, he will examine how and under what conditions transformers trained via gradient descent can learn different reasoning behaviors, including in-context learning and multi-step reasoning. Altogether, this approach builds theories that provide concrete insight into the behavior of modern AI systems.
Bio:
Eshaan Nichani is a final-year Ph.D. student in the Electrical and Computer Engineering (ECE) department at Princeton University, jointly advised by Jason D. Lee and Yuxin Chen. His research focuses on the theory of deep learning, ranging from characterizing the fundamental limits of shallow neural networks to understanding how LLM phenomena emerge during training. He is a recipient of the IBM PhD Fellowship and the NDSEG Fellowship, and was selected as a 2025 Rising Star in Data Science.
Location details: Discovery Building, Researchers’ Link (2nd floor; access through the glass doors behind the information desk)