Abstract
Understanding the generalizability of neural networks (NNs) remains a central question in deep learning research. Grokking, a phenomenon in which NNs abruptly generalize long after training performance reaches a near-perfect level, offers a unique window into the mechanisms underlying NNs' generalizability. Here we propose an interpretation of grokking that frames it as a computational glass relaxation: viewing an NN as a physical system in which the parameters are the degrees of freedom and the training loss is the system energy, we find that the memorization process resembles the rapid cooling of a liquid into a non-equilibrium glassy state at low temperature, and that the later generalization resembles a slow relaxation towards a more stable configuration. This mapping enables us to sample the NNs' Boltzmann entropy (density of states) landscape as a function of training loss and test accuracy. Our experiments with transformers on arithmetic tasks suggest that there is no entropy barrier in the memorization-to-generalization transition of grokking, challenging previous theory that defines grokking as a first-order phase transition. We identify a high-entropy advantage under grokking, which extends prior work linking entropy to generalizability but is much more significant. Inspired by grokking's far-from-equilibrium nature, we develop a toy optimizer, WanD, based on Wang-Landau molecular dynamics, which can eliminate grokking without any constraints and find high-norm generalizing solutions. This provides strictly defined counterexamples to the theory that attributes grokking solely to weight norm evolution towards the Goldilocks zone, and also suggests potential new directions for optimizer design.
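To make the entropy-sampling idea concrete, the following is a minimal sketch of generic Wang-Landau sampling on a toy system. It is not the authors' WanD optimizer and involves no neural network. The toy system (a string of bits whose "energy" is the number of ones), the function names `wang_landau_ln_g` and `exact_ln_g`, and all parameter values are assumptions chosen for illustration. With k_B = 1, the Boltzmann entropy is S(E) = ln g(E), where g(E) is the density of states that the sampler estimates.

```python
import math
import random


def wang_landau_ln_g(n_bits=8, ln_f_final=1e-3, flatness=0.8, batch=2000, seed=0):
    """Estimate ln g(E) for a toy system of n_bits bits, with E = number of ones.

    Hypothetical illustration of generic Wang-Landau sampling; the toy system
    and all parameter values are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    state = [0] * n_bits
    energy = 0
    ln_g = [0.0] * (n_bits + 1)   # running estimate of ln g(E)
    hist = [0] * (n_bits + 1)     # visit histogram for the current stage
    ln_f = 1.0                    # modification factor, halved at each stage

    while ln_f > ln_f_final:
        for _ in range(batch):
            i = rng.randrange(n_bits)
            new_energy = energy + (1 - 2 * state[i])  # flipping bit i changes E by +1 or -1
            # Accept with probability min(1, g(E_old) / g(E_new)).
            delta = ln_g[energy] - ln_g[new_energy]
            if rng.random() < math.exp(min(0.0, delta)):
                state[i] ^= 1
                energy = new_energy
            # Penalize the current energy level so that rarely visited levels are favored.
            ln_g[energy] += ln_f
            hist[energy] += 1
        # Once the histogram is roughly flat, refine the modification factor.
        if min(hist) > flatness * sum(hist) / len(hist):
            hist = [0] * (n_bits + 1)
            ln_f /= 2.0

    # Fix the free additive constant using g(0) = 1 (only the all-zeros state has E = 0).
    return [value - ln_g[0] for value in ln_g]


def exact_ln_g(n_bits):
    """Exact ln g(E) for the toy system: g(E) is the binomial coefficient C(n_bits, E)."""
    return [math.log(math.comb(n_bits, e)) for e in range(n_bits + 1)]


if __name__ == "__main__":
    estimate = wang_landau_ln_g()
    for e, (approx, exact) in enumerate(zip(estimate, exact_ln_g(8))):
        print(e, round(approx, 2), round(exact, 2))
```

Because the acceptance rule favors rarely visited energy levels, the walker is not trapped by entropy differences between levels, which is the property that makes this family of samplers suited to estimating an entropy landscape. Applying the idea to NN parameters with training loss as the energy would require a continuous-state variant, which this sketch does not attempt.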
| Original language | English |
|---|---|
| Title of host publication | Advances in Neural Information Processing Systems 38 (NeurIPS 2025) |
| Editors | D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, N. Chen |
| Number of pages | 25 |
| Publication status | Published - 2025 |
| Event | 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025) - San Diego, United States. Duration: 2 Dec 2025 → 7 Dec 2025. https://neurips.cc/Conferences/2025 |
Conference
| Conference | 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025) |
|---|---|
| Abbreviated title | NeurIPS 2025 |
| Place | United States |
| City | San Diego |
| Period | 2/12/25 → 7/12/25 |
| Internet address | https://neurips.cc/Conferences/2025 |
Fingerprint
Dive into the research topics of 'Is Grokking a Computational Glass Relaxation?'. Together they form a unique fingerprint.