Player FM - Internet Radio Done Right
205 subscribers
Checked 1M ago
Tilføjet ten år siden
Indhold leveret af Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.
Player FM - Podcast-app
Gå offline med appen Player FM !
Gå offline med appen Player FM !
180: Reinforcement Learning
Manage episode 471854552 series 70533
Indhold leveret af Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.
Intro topic: Grills
News/Links:
- You can’t call yourself a senior until you’ve worked on a legacy project
- Recraft might be the most powerful AI image platform I’ve ever used — here’s why
- NASA has a list of 10 rules for software development
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
Book of the Show
- Patrick:
- The Player of Games (Ian M Banks)
- https://a.co/d/1ZpUhGl (non-affiliate)
- The Player of Games (Ian M Banks)
- Jason:
- Basic Roleplaying Universal Game Engine
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
- Pokemon Sword and Shield
- Jason:
- Features and Labels ( https://fal.ai )
Topic: Reinforcement Learning
- Three types of AI
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Online vs Offline RL
- Optimization algorithms
- Value optimization
- SARSA
- Q-Learning
- Policy optimization
- Policy Gradients
- Actor-Critic
- Proximal Policy Optimization
- Value optimization
- Value vs Policy Optimization
- Value optimization is more intuitive (Value loss)
- Policy optimization is less intuitive at first (policy gradients)
- Converting values to policies in deep learning is difficult
- Imitation Learning
- Supervised policy learning
- Often used to bootstrap reinforcement learning
- Policy Evaluation
- Propensity scoring versus model-based
- Challenges to training RL model
- Two optimization loops
- Collecting feedback vs updating the model
- Difficult optimization target
- Policy evaluation
- Two optimization loops
- RLHF & GRPO
181 episoder
Manage episode 471854552 series 70533
Indhold leveret af Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.
Intro topic: Grills
News/Links:
- You can’t call yourself a senior until you’ve worked on a legacy project
- Recraft might be the most powerful AI image platform I’ve ever used — here’s why
- NASA has a list of 10 rules for software development
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
Book of the Show
- Patrick:
- The Player of Games (Ian M Banks)
- https://a.co/d/1ZpUhGl (non-affiliate)
- The Player of Games (Ian M Banks)
- Jason:
- Basic Roleplaying Universal Game Engine
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
- Pokemon Sword and Shield
- Jason:
- Features and Labels ( https://fal.ai )
Topic: Reinforcement Learning
- Three types of AI
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Online vs Offline RL
- Optimization algorithms
- Value optimization
- SARSA
- Q-Learning
- Policy optimization
- Policy Gradients
- Actor-Critic
- Proximal Policy Optimization
- Value optimization
- Value vs Policy Optimization
- Value optimization is more intuitive (Value loss)
- Policy optimization is less intuitive at first (policy gradients)
- Converting values to policies in deep learning is difficult
- Imitation Learning
- Supervised policy learning
- Often used to bootstrap reinforcement learning
- Policy Evaluation
- Propensity scoring versus model-based
- Challenges to training RL model
- Two optimization loops
- Collecting feedback vs updating the model
- Difficult optimization target
- Policy evaluation
- Two optimization loops
- RLHF & GRPO
181 episoder
Alle episoder
×Velkommen til Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.