180: Reinforcement Learning (Programming Throwdown podcast)

Added ten years ago

Content provided by Patrick Wheeler and Jason Gauci. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Patrick Wheeler and Jason Gauci or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal.

About a year ago, 1:52:22

Intro topic: Grills

News/Links:

You can’t call yourself a senior until you’ve worked on a legacy project
- https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
Recraft might be the most powerful AI image platform I’ve ever used — here’s why
- https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
NASA has a list of 10 rules for software development
- https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
- https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show

Patrick:
- The Player of Games (Iain M. Banks)
  - https://a.co/d/1ZpUhGl (non-affiliate)
Jason:
- Basic Roleplaying Universal Game Engine
  - https://amzn.to/3ES4p5i

Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

Patrick:
- Pokemon Sword and Shield
Jason:
- Features and Labels ( https://fal.ai )

Topic: Reinforcement Learning

Three types of AI
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
Online vs Offline RL
Optimization algorithms
- Value optimization
  - SARSA
  - Q-Learning
- Policy optimization
  - Policy Gradients
  - Actor-Critic
  - Proximal Policy Optimization
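
The difference between the two value-optimization methods listed above comes down to one term in the update target. The following is a minimal tabular sketch, not taken from the episode; the function names, the dictionary-based Q-table, and the default `alpha`/`gamma` values are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken next."""
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])


def make_q_table():
    """Hypothetical helper: a Q-table that defaults every entry to 0.0."""
    return defaultdict(float)
```

Q-Learning uses the maximum over next actions regardless of what the agent does next, while SARSA uses the value of the next action the current policy actually chose.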
Value vs Policy Optimization
- Value optimization is more intuitive (Value loss)
- Policy optimization is less intuitive at first (policy gradients)
- Converting values to policies in deep learning is difficult
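
To make the policy-gradient idea more concrete, here is a small sketch of a REINFORCE-style step for a softmax policy over a handful of actions (a bandit setting, chosen for brevity). It is an illustration under stated assumptions rather than anything shown in the episode: `softmax`, `reinforce_step`, the learning rate, and the baseline are all hypothetical names and values.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(prefs, action, reward, baseline=0.0, lr=0.1):
    """One gradient-ascent step on (reward - baseline) * log pi(action).

    For a softmax policy, d log pi(action) / d prefs[k] is
    1[k == action] - pi(k).
    """
    probs = softmax(prefs)
    advantage = reward - baseline
    return [
        p + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for k, p in enumerate(prefs)
    ]
```

Here the policy is updated directly: actions that earned more than the baseline become more likely, with no value table in between.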
Imitation Learning
- Supervised policy learning
- Often used to bootstrap reinforcement learning
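
Supervised policy learning can be sketched in its simplest tabular form as "copy the most common demonstrated action for each state". The snippet below is a toy illustration only; `fit_behavior_cloning` and the `(state, action)` demonstration format are assumptions, and real systems would typically fit a classifier instead of counting.

```python
from collections import Counter, defaultdict


def fit_behavior_cloning(demonstrations):
    """Return a state -> action policy from (state, action) demonstrations.

    Each state maps to the action the demonstrator chose most often there.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}
```

A policy learned this way can then serve as the starting point for reinforcement learning, as the outline notes.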
Policy Evaluation
- Propensity scoring versus model-based
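
One common form of propensity-based evaluation is inverse propensity scoring (IPS), which reweights logged rewards by how much more (or less) likely the new policy is to take the logged action. The sketch below assumes that is the kind of propensity scoring meant here; the `ips_estimate` name and the `(context, action, reward, logging_prob)` record layout are hypothetical.

```python
def ips_estimate(logged, target_prob):
    """Estimate a target policy's average reward from logged data.

    logged: list of (context, action, reward, logging_prob) tuples, where
            logging_prob is the probability the logging policy gave to the
            action it actually took (must be > 0).
    target_prob: function (context, action) -> probability under the
                 policy being evaluated.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)
```

A model-based alternative would instead fit a reward model and score the new policy against its predictions; that variant is not shown here.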
Challenges in training RL models
- Two optimization loops
  - Collecting feedback vs updating the model
- Difficult optimization target
  - Policy evaluation
RLHF & GRPO
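
The "group relative" part of GRPO can be sketched as follows: several responses are sampled for the same prompt, and each response's reward is normalized against the group's mean and standard deviation instead of a learned value function. This is a minimal illustration of that one step, not a full training loop; the function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Responses scoring above the group mean get positive advantages and
    those below get negative ones; eps avoids division by zero when all
    rewards in the group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]
```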
