Reward Models | Data Brew | Episode 40 Data Brew By Databricks podcast


Databricks Data Analytics Apache Spark Delta Lake Machine Learning Data Engineering Artificial Intelligence Tech Data Science Science Lifestyle Podcasting Education

Content provided by Databricks. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Databricks or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://da.player.fm/legal.

Data Brew by Databricks

Published 1 month ago · 39:58


In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, discusses recent advances in AI model optimization, focusing on reward models and Reinforcement Learning from Human Feedback (RLHF).
Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/


42 episodes


71 subscribers
