[QA] BALANCING PIPELINE PARALLELISM WITH VOCABULARY PARALLELISM
This paper addresses imbalanced computation and memory in pipeline parallelism for large language models by partitioning vocabulary layers, reducing communication barriers, and achieving improved throughput and memory balance.
https://arxiv.org/abs//2411.05288
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1677 episodes