Artwork

Indhold leveret af EDGE AI FOUNDATION. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af EDGE AI FOUNDATION eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.
Player FM - Podcast-app
Gå offline med appen Player FM !

Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University

30:29
 
Del
 

Manage episode 453994436 series 3574631
Indhold leveret af EDGE AI FOUNDATION. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af EDGE AI FOUNDATION eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.

Discover the cutting-edge techniques behind memory optimization for large language models with our guest, Seonyeong Heo from Kyung-Hee University. Join us as we promise to unlock the secrets of deploying 7-billion-parameter models on small devices with limited memory. This episode delves into the intricacies of key-value caching in decoder-only transformers, a crucial innovation that reduces computational overhead by efficiently storing and reusing outputs. Seon-young shares insightful strategies that tackle the high demands of memory management, offering a glimpse into how these models can be more feasible and energy-efficient.
Our conversation also ventures into the world of dynamic compression methods essential for optimizing memory usage. We unpack the challenges of compressing key-value arrays and explore the merits of techniques like quantization, pruning, and dimensionality reduction with autoencoders. Weighted quantization is highlighted as a standout method for achieving remarkable compression rates with minimal errors, provided it's fine-tuned effectively. This episode is a must-listen for those interested in the future of on-device LLMs, as we underscore the significance of efficient memory management in enhancing their performance, especially in resource-constrained settings. Tune in for this enlightening discussion paving the way for innovative advancements in the field.

Send us a text

Support the show

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

  continue reading

Kapitler

1. Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University (00:00:00)

2. Memory Optimization for on-Device LLM (00:00:24)

3. Memory Optimization Techniques for LLMs (00:11:50)

36 episoder

Artwork
iconDel
 
Manage episode 453994436 series 3574631
Indhold leveret af EDGE AI FOUNDATION. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af EDGE AI FOUNDATION eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.

Discover the cutting-edge techniques behind memory optimization for large language models with our guest, Seonyeong Heo from Kyung-Hee University. Join us as we promise to unlock the secrets of deploying 7-billion-parameter models on small devices with limited memory. This episode delves into the intricacies of key-value caching in decoder-only transformers, a crucial innovation that reduces computational overhead by efficiently storing and reusing outputs. Seon-young shares insightful strategies that tackle the high demands of memory management, offering a glimpse into how these models can be more feasible and energy-efficient.
Our conversation also ventures into the world of dynamic compression methods essential for optimizing memory usage. We unpack the challenges of compressing key-value arrays and explore the merits of techniques like quantization, pruning, and dimensionality reduction with autoencoders. Weighted quantization is highlighted as a standout method for achieving remarkable compression rates with minimal errors, provided it's fine-tuned effectively. This episode is a must-listen for those interested in the future of on-device LLMs, as we underscore the significance of efficient memory management in enhancing their performance, especially in resource-constrained settings. Tune in for this enlightening discussion paving the way for innovative advancements in the field.

Send us a text

Support the show

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

  continue reading

Kapitler

1. Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University (00:00:00)

2. Memory Optimization for on-Device LLM (00:00:24)

3. Memory Optimization Techniques for LLMs (00:11:50)

36 episoder

Tüm bölümler

×
 
Loading …

Velkommen til Player FM!

Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.

 

Hurtig referencevejledning

Lyt til dette show, mens du udforsker
Afspil