Content provided by Turpentine. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Turpentine or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal.

Code Switching

1:22:20
 
In this episode of Emergent Behavior, @8teapi talks with Justin Junyang Lin, Chief Evangelist Officer of Alibaba Qwen Project. Joined by guest host Eugene Cheah, CEO of Recursal.AI, they talk about how Alibaba's Qwen 2 tackles multilingual challenges, including code-switching and the unique complexities of Chinese data.

🔥 Apply to join over 400 founders and Execs in the Turpentine Network: https://hmplogxqz0y.typeform.com/to/JCkphVqj

Explore the impact of open-source LLMs like Alibaba's Qwen 2 and how they are driving innovation in AI development.

--

RECOMMENDED PODCASTS:

🎙️ This Won't Last - Eavesdrop on Keith Rabois, Kevin Ryan, Logan Bartlett, and Zach Weinberg's monthly backchannel. They unpack their hottest takes on the future of tech, business, venture, investing, and politics.

Apple Podcasts: https://podcasts.apple.com/us/podcast/id1765665937

Spotify: https://open.spotify.com/show/2HwSNeVLL1MXy0RjFPyOSz

YouTube: https://www.youtube.com/@ThisWontLastpodcast

FOLLOW ON X:

@8teAPi (Ate)

@JustinLin610 (Junyang)

@picocreator (Eugene)

@TurpentineMedia

--

LINKS:

Alibaba Qwen Project:

https://www.alibabacloud.com/en/solutions/generative-ai/qwen?_p_lc=1

--

TIMESTAMPS:

(00:00) Introduction

(04:36) Qwen's Development Journey

(08:00) Data Curation & Coding Capabilities

(11:00) The Role of Evaluation

(14:00) Evolution of Pre-training and Evaluation

(17:00) Open Source vs. Commercial Groups

(22:00) Data Contamination

(24:00) Model Sizing and Computational Constraints

(28:00) Multi-lingual Capabilities

(31:00) Tokenizers and Language-Specific Considerations

(34:00) Code Switching and Data Filtering

(38:00) Code Switching, Dialects, and Model Size

(42:00) User Feedback and Model Development

(46:00) Challenges with Chinese Datasets

(52:00) Language Variation and Team Development

(58:00) Hiring and Team Dynamics

(1:03:00) Diversity and Production Considerations

(1:07:00) Production Impact and Collaboration

(1:13:00) Wrap

16 episodes
