Is ChatGPT an N-gram model on steroids?
DeepMind Research Scientist / MIT scholar Dr. Timothy Nguyen discusses his recent paper on understanding transformers through n-gram statistics. Nguyen explains his approach to analyzing transformer behavior using a kind of "template matching" (N-grams), providing insights into how these models process and predict language.
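To make the "template matching" idea concrete, here is a minimal Python sketch of the general N-gram approach: count which tokens follow each (n-1)-token context in a corpus and use the normalized counts as a predictive distribution. The function names, the fixed n, and the absence of back-off are simplifying assumptions for illustration; this is not the paper's actual rule set.

```python
from collections import defaultdict, Counter

def build_ngram_table(corpus_tokens, n=3):
    # Count next-token frequencies for each (n-1)-token context ("template").
    table = defaultdict(Counter)
    for i in range(len(corpus_tokens) - n + 1):
        context = tuple(corpus_tokens[i : i + n - 1])
        next_tok = corpus_tokens[i + n - 1]
        table[context][next_tok] += 1
    return table

def ngram_predict(table, context):
    # Return a normalized next-token distribution for a context, if it was seen.
    counts = table.get(tuple(context), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # unseen context; a fuller system would back off to shorter n-grams
    return {tok: c / total for tok, c in counts.items()}

# Toy usage
tokens = "the cat sat on the mat the cat sat on the rug".split()
table = build_ngram_table(tokens, n=3)
print(ngram_predict(table, ["on", "the"]))  # {'mat': 0.5, 'rug': 0.5}
```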
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Key points covered include:
A method for describing transformer predictions using n-gram statistics without relying on internal mechanisms.
The discovery of a technique to detect overfitting in large language models without using holdout sets.
Observations on curriculum learning, showing how transformers progress from simpler to more complex rules during training.
Discussion of distance measures used in the analysis, particularly the variational distance (see the sketch after this list).
Exploration of model sizes, training dynamics, and their impact on the results.
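As a minimal sketch of the distance measure mentioned above: the variational (total variation) distance between two next-token distributions is half the sum of absolute probability differences. The representation of predictions as token-to-probability dicts and the example values are illustrative assumptions, not the paper's code or data.

```python
def variational_distance(p, q):
    # Total variation distance between two next-token distributions.
    # p, q: dicts mapping token -> probability; missing tokens count as 0.
    # Ranges from 0 (identical) to 1 (disjoint support).
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)

# Toy usage: compare a hypothetical transformer prediction with an n-gram rule
transformer_pred = {"sat": 0.7, "ran": 0.2, "slept": 0.1}
ngram_pred = {"sat": 1.0}
print(variational_distance(transformer_pred, ngram_pred))  # 0.3
```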
We also touch on philosophical aspects of describing versus explaining AI behavior, and the challenges in understanding the abstractions formed by neural networks. Nguyen concludes by discussing potential future research directions, including attempts to convert descriptions of transformer behavior into explanations of internal mechanisms.
Timothy Nguyen earned his B.S. and Ph.D. in mathematics from Caltech and MIT, respectively. He held positions as Research Assistant Professor at the Simons Center for Geometry and Physics (2011-2014) and Visiting Assistant Professor at Michigan State University (2014-2017). During this time, his research expanded into high-energy physics, focusing on mathematical problems in quantum field theory. His work notably provided a simplified and corrected formulation of perturbative path integrals.
Since 2017, Nguyen has been working in industry, applying his expertise to machine learning. He is currently at DeepMind, where he contributes to both fundamental research and practical applications of deep learning to solve real-world problems.
Refs:
The Cartesian Cafe
https://www.youtube.com/@TimothyNguyen
Understanding Transformers via N-Gram Statistics
https://www.researchgate.net/publication/382204056_Understanding_Transformers_via_N-Gram_Statistics
TOC
00:00:00 Timothy Nguyen's background
00:02:50 Paper overview: transformers and n-gram statistics
00:04:55 Template matching and hash table approach
00:08:55 Comparing templates to transformer predictions
00:12:01 Describing vs explaining transformer behavior
00:15:36 Detecting overfitting without holdout sets
00:22:47 Curriculum learning in training
00:26:32 Distance measures in analysis
00:28:58 Model sizes and training dynamics
00:30:39 Future research directions
00:32:06 Conclusion and future topics