
Cloud to the Edge: Future of LLMs w/ Mahesh Yadav of Google

59:48
 
Content provided by EDGE AI FOUNDATION. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by EDGE AI FOUNDATION or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal.

Curious about how you can run a colossal 405-billion-parameter model on a device with a mere 2-billion-parameter footprint? Join us with Mahesh Yadav from Google as he shares his journey from developing small devices to working with massive language models. Mahesh reveals the groundbreaking possibilities of operating large models on minimal hardware, making internet-free edge AI a reality even on devices as small as a smartwatch. This eye-opening discussion is packed with insights into the future of AI and edge computing that you don't want to miss.
Explore the strategic shifts by tech giants in the language model arena with Mahesh and our hosts. We dissect Microsoft's bet on its Phi models and Google's development of Gemma, exploring how increasing the parameters in large language models leads to emergent behaviors like logical reasoning and translation. Delving into the technical and financial implications of these advancements, we also address privacy concerns and the critical need for cost-effective model optimization in enterprise environments handling sensitive data.
Advancements in edge AI training take center stage as Mahesh unpacks the latest techniques for model size reduction. Learn about synthetic data generation and the use of quantization, pruning, and distillation to shrink models without losing accuracy. Mahesh also highlights practical applications of small language models in enterprise settings, from contract management to sentiment analysis, and discusses the challenges of deploying these models on edge devices. Tune in to discover cutting-edge strategies for model compression and adaptation, and how startups are leveraging base models with specialized adapters to revolutionize the AI landscape.
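
To make the compression techniques mentioned above concrete, here is a minimal, self-contained sketch of one of them: post-training dynamic quantization in PyTorch. The toy model and sizes are illustrative assumptions, not the pipeline discussed in the episode:

```python
import os
import torch

# A stand-in model; in practice this would be a small language model.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
).eval()

# Post-training dynamic quantization: weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m, path="tmp.pt"):
    torch.save(m.state_dict(), path)
    return os.path.getsize(path) / 1e6

print(f"fp32: {size_mb(model):.2f} MB -> int8: {size_mb(quantized):.2f} MB")
```

Because the linear-layer weights drop from 32-bit floats to 8-bit integers, this roughly quarters on-disk size with little accuracy loss, which is the basic trade-off behind the shrinking techniques Mahesh discusses.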

Support the show

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

Chapters

1. Cloud to the Edge: Future of LLMs w/ Mahesh Yadav of Google (00:00:00)

2. Edge AI Development and Challenges (00:00:37)

3. Edge AI With Small Language Models (00:13:43)

4. Advancements in Edge AI Training (00:22:53)

5. Techniques for Model Size Reduction (00:27:15)

6. Applications of Small Language Models (00:37:40)

7. Discussion on NVIDIA, ONNX, and Acceleration (00:41:05)

8. Model Compression and Adaptation Techniques (00:53:57)

24 episodes


All episodes

 
What do rising pedestrian fatalities and the challenges of implementing intelligent transportation systems have in common? In this episode, we're joined by Davis Sawyer from NXP and Mike "Witt" Whitaker from the City of Lakewood, Colorado, to unpack the complexities of applying edge AI to enhance urban safety. As Witt recounts his pioneering work with autonomous vehicles since the 1990s, we explore the fast-paced evolution of AI technology and its critical role in addressing real-world problems, such as the increasing need for pedestrian safety and efficient traffic management.

We take a hard look at the perplexing trend of fatal crashes involving impaired drivers despite a drop in overall traffic incidents. The conversation intensifies as we discuss the doubling of pedestrian fatalities, particularly at night, and the significant challenges posed by underreported impairment and the rise of fentanyl use. Our guests share insights into how collaborations with the police department and meticulous data analysis can reveal hidden patterns, driving the development of more effective municipal safety strategies.

Our exploration doesn't stop there. We also discuss the advancements in detection technologies and the iterative process of refining systems like LiDAR and video to predict pedestrian movements more accurately. This episode offers a deep dive into the complexities and costs of deploying AI models in urban environments, including the importance of ML Ops, the challenge of model retraining, and the role of existing infrastructure in supporting intelligent systems. Join us as we navigate the intricate world of smart city solutions, focusing on the shared goal of safer streets for everyone.
 
Discover the cutting-edge world of AI deployment on edge devices with insights from top experts, including Dr. KC Liu. This episode promises to unravel the complexities of optimizing AI models for devices where memory and computing power are limited. We explore the critical role of model compilers and the innovative strides being made in AIoT sensors, as Matteo Maravita from STMicroelectronics offers an exciting glimpse into the future of machine learning integration into MEMS sensors.

Join us as we tackle the pressing need for standardization in the fragmented IoT development landscape. Our esteemed panel delves into the challenges developers face with diverse proprietary technologies from giants like STM, NXP, and Renesas. Hear about potential convergence through model zoos and frameworks, and the unique role of MLPerf Tiny in benchmarking AI applications specifically designed for edge devices. Our conversation shines a light on the balance between utilizing common tools and proprietary compilers for optimized performance on specific hardware.

Lastly, explore the promising avenues of AI in the realm of robotics and the innovative strategies shaping the future of AI systems. Learn about the layered architecture approach dividing AI systems into sensor network, edge AI, and cloud computing layers, and the potential for a sustainable AI ecosystem through collaboration. With a focus on benchmarking advancements and MPU design strategies, discover how AI integration with numerous sensors could redefine possibilities in the robotics field. This episode is a compelling journey through the landscape of AI technologies, emphasizing collaboration and innovation for next-generation AI products.
 
Unlock the secrets of energy-efficient AI as we explore the groundbreaking fusion of edge AI and generative AI in our latest episode. With expert insights from Victor Jung, a trailblazer in the field, discover how foundational models can be deployed on tiny and embedded systems to revolutionize devices like AR glasses and nanodrones. Listen as we unravel the complexities of deploying neural networks on microcontrollers, with a focus on powerful techniques like quantization, graph lowering, and innovative memory management strategies.

Victor guides us through the nuanced process of deploying neural networks, highlighting critical stages like graph lowering and memory allocation. Traverse the intricate front-end and mid-end stages where neural network graphs are optimized, ensuring peak performance on specific hardware platforms. We'll illustrate the importance of efficient memory usage through a fascinating example involving a tiny language model on the Siracusa platform, showcasing the role of quantization and memory management tailored for hardware constraints.

Dive into the future of AI deployment on edge devices with a focus on quantization and hardware support. From exploring the potential of foundation models like DINOv2 to discussing the emerging micro-scaling format, we uncover the technologies that are making AI more energy-efficient and versatile. Our conversation underscores the importance of viewing memory as a compute asset and the need for ongoing research to enhance system efficiency for generative AI at the edge. Join us for an enlightening episode that highlights the vital steps needed to optimize memory and computing resources for meaningful applications on small platforms.
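
As a toy illustration of the memory-allocation stage described here, not Victor's actual toolchain, the sketch below greedily packs intermediate tensors into a single arena and reuses space once a tensor's last consumer has run. All sizes and lifetimes are made-up assumptions:

```python
def plan(tensors):
    """Greedy arena planner. tensors: list of (name, size, first_use, last_use)."""
    placements, live, arena = {}, [], 0
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        live = [b for b in live if b[3] >= first]      # free buffers now dead
        live.sort(key=lambda b: b[0])                  # scan gaps left to right
        offset = 0
        for b_off, b_size, _, _ in live:
            if offset + size <= b_off:                 # found a gap that fits
                break
            offset = max(offset, b_off + b_size)
        placements[name] = offset
        live.append((offset, size, name, last))
        arena = max(arena, offset + size)
    return placements, arena

# Activations of a toy three-layer network: a2 can reuse a0's slot.
tensors = [("a0", 64, 0, 1), ("a1", 128, 1, 2), ("a2", 64, 2, 3)]
print(plan(tensors))   # ({'a0': 0, 'a1': 64, 'a2': 0}, 192)
```

The point of planners like this is exactly the "memory as a compute asset" view from the episode: on a microcontroller the arena size, not FLOPs, often decides whether a model fits at all.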
 
Join us for an insightful conversation with Dr. Vijay Janapa Reddi from Harvard, who takes us on a journey through the evolving world of edge computing and machine learning education. Discover how his family's affinity for East Coast seasons inspired his academic path, and learn about his groundbreaking open-source book on machine learning systems. This episode celebrates the rebranding of the tinyML Foundation to the Edge AI Foundation, reflecting an expanded focus that transcends embedded devices.

We tackle the complexities of machine learning education, where excitement often masks the real challenges of developing robust AI systems. Dr. Janapa Reddi's vision for his work-in-progress book draws from his teaching experiences at Harvard, aiming to universalize machine learning system principles akin to core concepts in operating systems. We explore the challenge of crafting educational materials that cater to both beginners and seasoned professionals, providing a roadmap to guide diverse audiences. The discussion highlights the importance of hands-on learning, especially in data collection and lab work, with contributions from notable figures like Marcelo Rovai in the tinyML space, emphasizing practical applications for edge AI systems.

In a fascinating discussion on data science versus data engineering, Dr. Janapa Reddi elucidates the foundational role of data engineering in successful machine learning projects. We also delve into microcontroller programmability and the integration of frameworks like TensorFlow and PyTorch into curriculums. As we explore the role of AI tools like ChatGPT in programming, the conversation shifts to the exciting potential of AI-powered educational assistants, transforming the future of learning through interactive and personalized experiences. Whether you're a student, educator, or industry professional, this episode offers a wealth of insights into the intersection of AI, education, and the future of edge technologies.
 
Get ready for the recap of Beyond Chatbots - The Journey of Generative AI to the Edge, exploring the frontier of generative AI and edge technology with Davis Sawyer, Danilo Pau and Pete Bernard as they discuss the dynamic innovations transforming these fields. We uncover the fascinating shift from cloud-based proof-of-concepts to the compelling reality of edge solutions. This conversation dives into the ever-evolving landscape of generative AI, particularly in edge environments where cost and power efficiency reign supreme. From automotive breakthroughs to advanced memory optimization techniques, discover how these innovations are redefining the role of AI in our world.

Listen in as we dissect the gravitational pull towards edge solutions with insights from recent industry research and captivating discussions. Dave shares intriguing observations on AI as crucial infrastructure, likening it to essential utilities that promise lasting impacts across sectors. As we look ahead, the conversation turns to the future of AI in manufacturing and the exciting potential for AGI-capable equipment. With Foundation partners like STMicroelectronics and NXP at the helm, the potential for edge AI advancements is limitless. Join us for an engaging exploration of the trends shaping the future of AI and edge computing.
 
Unlock the secrets of neural style transfer on microcontrollers with our special guest, Alberto Ancilotto of FBK, as he explores a groundbreaking approach to image generation on low-power devices. Discover how this innovative technique allows us to combine the content of one image with the artistic style of another, transforming simple visuals into unique masterpieces, like turning a regular cat photo into a Van Gogh-inspired work of art. Alberto introduces Xinet, a cutting-edge convolutional neural network designed to perform these creative tasks efficiently on embedded platforms. Gain insight into the process of optimizing performance by evaluating CNN operators for energy efficiency and adapting networks for a variety of devices, from the smallest microcontrollers to advanced TPUs and accelerators.

We dive deep into the collaboration between CLIP and style transfer networks, enhancing the precision of semantic representation in generated images. Witness the impressive capabilities of this technology through real-world examples, such as generating images in just 60 milliseconds on the STM32N6 microcontroller. Experience the advanced applications in video anonymization, where style transfer provides a superior alternative to traditional blurring methods, altering appearances while maintaining action consistency.

Alberto also addresses the broader implications of anonymization technology in public spaces, including privacy protection and GDPR compliance, while maintaining artistic integrity. Join us as we tackle audience questions about model parameters, deployment flexibility, and the exciting potential of this technology across various sectors.
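
For context on what "combining the content of one image with the style of another" means mechanically, here is a minimal sketch of the classic Gram-matrix style loss (Gatys et al.) that underlies most style-transfer training. It illustrates the general technique, not Xinet's specific implementation, and all tensor shapes are placeholders:

```python
import torch
import torch.nn.functional as F

def gram(feats):                        # feats: (C, H, W) feature maps
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)      # channel-correlation "style" statistics

# Style loss compares Gram matrices of generated vs. style-image features;
# content loss (not shown) is a plain MSE on deeper-layer feature maps.
def style_loss(gen_feats, style_feats):
    return F.mse_loss(gram(gen_feats), gram(style_feats))

gen = torch.randn(64, 32, 32, requires_grad=True)   # generated-image features
sty = torch.randn(64, 32, 32)                       # style-image features
style_loss(gen, sty).backward()         # gradients flow back into the generator
```

Matching Gram matrices captures texture and color statistics while ignoring spatial layout, which is why the content of the original image survives the restyling.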
 
What if AI could transform the landscape of software development, making traditional methods seem like relics of the past? Join us as we sit down with a visionary guest from Wipro Engineering to uncover how GenAI-based custom code systems are revolutionizing the future of AI PCs. We promise a deep dive into how these intelligent tools are not just enhancing productivity but are redefining the very essence of software creation. Discover why the shift from cloud-based to edge-based solutions is imperative for enterprise developers dealing with proprietary codebases, and explore the power of open-source models like Code Llama, integrated through advanced frameworks such as LangChain and LlamaIndex. This episode promises to unravel the intricacies of an innovative web-based application designed to boost performance and user efficiency.

Our conversation also shines a light on the tangible productivity gains brought by cutting-edge code assistance tools, effectively reducing weeks of work into days. We explore the vital role of hardware, like RAM and discrete GPUs, in maximizing these tools' potential. Discussions reveal varying levels of acceptance among developers, with younger professionals more readily embracing the shift. To bridge this gap, we recommend intensive training and boot camps as pathways to wider adoption. Furthermore, the potential of generative AI in deciphering and documenting legacy code is highlighted, offering a glimpse into how these advancements are reshaping the programming landscape. Engage with us to understand the profound impact of these innovations on the future of software development.
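
As a hedged sketch of what an edge-based code assistant can look like in practice, the snippet below runs a quantized Code Llama locally via llama-cpp-python. The model filename and the prompt wording are assumptions; any local GGUF build of Code Llama Instruct would do:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical local model file; keeps proprietary code off the cloud.
llm = Llama(model_path="codellama-7b-instruct.Q4_K_M.gguf", n_ctx=4096)

prompt = (
    "[INST] Write a Python function that parses an ISO-8601 date string "
    "and returns a datetime object. [/INST]"
)
out = llm(prompt, max_tokens=256, temperature=0.2)
print(out["choices"][0]["text"])
```

Running the model on the developer's own machine is what makes this pattern viable for the proprietary-codebase scenario the episode describes: prompts and retrieved code never leave the device.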
 
Discover the cutting-edge techniques behind memory optimization for large language models with our guest, Seonyeong Heo from Kyung Hee University. Join us as we promise to unlock the secrets of deploying 7-billion-parameter models on small devices with limited memory. This episode delves into the intricacies of key-value caching in decoder-only transformers, a crucial innovation that reduces computational overhead by efficiently storing and reusing outputs. Seonyeong shares insightful strategies that tackle the high demands of memory management, offering a glimpse into how these models can be made more feasible and energy-efficient.

Our conversation also ventures into the world of dynamic compression methods essential for optimizing memory usage. We unpack the challenges of compressing key-value arrays and explore the merits of techniques like quantization, pruning, and dimensionality reduction with autoencoders. Weighted quantization is highlighted as a standout method for achieving remarkable compression rates with minimal errors, provided it's fine-tuned effectively. This episode is a must-listen for those interested in the future of on-device LLMs, as we underscore the significance of efficient memory management in enhancing their performance, especially in resource-constrained settings. Tune in for this enlightening discussion paving the way for innovative advancements in the field.
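
To see why key-value caching matters, here is a deliberately tiny single-head sketch: each decoding step appends its key and value once and reuses everything cached so far, instead of recomputing K and V for the whole prefix. Dimensions are illustrative:

```python
import torch
import torch.nn.functional as F

d = 64
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x):                     # x: (1, d), embedding of the new token
    q = x @ Wq
    k_cache.append(x @ Wk)              # cache this step's key and value once
    v_cache.append(x @ Wv)
    K, V = torch.cat(k_cache), torch.cat(v_cache)      # (t, d) each
    attn = F.softmax(q @ K.T / d ** 0.5, dim=-1)       # attend over the cache
    return attn @ V                     # (1, d)

for _ in range(5):                      # autoregressive decoding loop
    y = decode_step(torch.randn(1, d))
```

The cache trades memory for compute, and that cache grows linearly with sequence length, which is exactly why the compression methods discussed above (quantizing or pruning the stored keys and values) become decisive on small devices.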
 
Unlock the future of automotive AI with insights from Ashutosh and Guru Prashad of BOSS India, as they unravel the transformative potential of Small Language Models (SLMs) in vehicles. Discover how these compact powerhouses are reshaping the landscape of vehicular edge systems by offering customization and performance reliability, all while keeping data privacy at the forefront. Dive into the captivating interplay between SLMs and their larger counterparts, Large Language Models (LLMs), and learn how the two together address domain-specific tasks and complex computations in the cloud. This episode promises to equip you with a deeper understanding of why SLMs are the secret sauce for advancing automotive AI in this data-sensitive era.

We also spotlight the remarkable optimization journey of the TinyLlama 1.1B model. Learn about the fine-tuning process that brought about an astounding 700% improvement in throughput and a drastic reduction in model size. Uncover the fascinating world of edge deployment using devices like the Raspberry Pi 4B and Jetson Orin Nano, and explore the audio and chat functionalities that are setting new standards in vehicular AI.

Finally, imagine the future of personalized interactions in cars, where generative AI transforms the way we communicate and engage with our vehicles and surroundings. This episode is a treasure trove of forward-thinking solutions and innovative ideas, perfect for anyone eager to explore the cutting edge of automotive AI.
 
Discover how the EDGE AI FOUNDATION is evolving from its roots in the tinyML Foundation to becoming a central hub for innovation and collaboration. Learn how initiatives like EDGE AI Labs and the EDGE AIP program are bridging the gap between academia and industry, training future AI leaders while tackling the ethical challenges of responsible AI development.

Explore the transformative potential of generative AI on edge devices, from providing vital healthcare diagnostics in remote areas to enabling adaptive robotics in factories. We'll highlight compelling reasons for companies to engage with the Edge AI Foundation, offering unparalleled access to cutting-edge research, top talent, and a voice in shaping the industry's future.

As we navigate through real-life scenarios and ethical considerations, you'll see why the urgency and opportunity surrounding edge AI is something you don't want to miss. Join us on this journey to ensure the benefits of AI are shared widely and responsibly by visiting edgeaifoundation.org.
 
Unlock the secrets to deploying machine learning models on edge devices with Chen Lai from the PyTorch Edge team at Meta. Discover how ExecuTorch, a brainchild of the PyTorch team, is transforming edge deployment by addressing challenges like memory constraints and hardware diversity. Get an insider's view on the technical collaborations with tech giants like Apple, Arm, Qualcomm, and MediaTek, which are revolutionizing the deployment of advanced language models like Llama on platforms such as iOS and Android. With Chen's expert insights, explore the fascinating process of converting PyTorch models into executable programs optimized for performance, stability, and broad hardware compatibility, ensuring seamless integration from server to edge environments.

Immerse yourself in the world of ExecuTorch within the PyTorch ecosystem, where deploying machine learning models becomes effortless even without extensive hardware knowledge. Learn how key components like torch.export and torchao capture compute graphs and support quantization, elevating edge deployment capabilities. Discover how torchchat facilitates large language model inference on various devices, ensuring compatibility with popular models from Hugging Face.

As we wrap up, hear about the community impact of Meta's ExecuTorch initiative, showcasing a commitment to innovation and collaboration. Chen shares his passion and dedication to advancing edge computing, leaving a lasting impression on listeners eager for the next wave of technological breakthroughs.
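
A minimal sketch of the export path described here, assuming the executorch Python package is installed (API per recent ExecuTorch releases; the model is a placeholder, not one Chen discusses):

```python
import torch
from executorch.exir import to_edge

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)
    def forward(self, x):
        return torch.relu(self.fc(x))

model, example_inputs = TinyNet().eval(), (torch.randn(1, 16),)

exported = torch.export.export(model, example_inputs)  # capture compute graph
edge = to_edge(exported)                               # lower to edge dialect
et_program = edge.to_executorch()                      # finalize for the runtime

with open("tinynet.pte", "wb") as f:                   # flatbuffer loaded by the
    f.write(et_program.buffer)                         # on-device C++ runtime
```

The resulting .pte file is what the lightweight on-device runtime executes, which is how the same PyTorch model moves from server to phone without being rewritten.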
 
Unlock the future of TinyML by learning how to harness the power of large language models, as we sit down with Roberto Morabito to dissect this intriguing technological convergence. Discover how the collaborative efforts with EURECOM and the University of Helsinki are shaping a groundbreaking framework designed to elevate TinyML's lifecycle management. We promise to unravel the complexities and opportunities that stem from integrating these technologies, focusing on the essential role of prompt templates and the dynamic challenges posed by hardware constraints. Through a proof-of-concept demonstration, we bring you invaluable insights into resource consumption, potential bottlenecks, and the exciting prospect of automating lifecycle stages.

Our conversation ventures into optimizing language models for end devices, delving into the transformative potential of Arduinos and single-board computers in enhancing efficiency and slashing costs. Roberto shares his expertise on the nuances of model conversion across varying hardware capabilities, revealing the impact this has on success rates. The episode crescendos with a compelling discussion on automating industrial time-series forecasting, underscoring the critical need for adaptive solutions to maintain accuracy and efficiency. Through Roberto's expert insights, listeners are invited to explore the forefront of technology that is poised to revolutionize industrial applications.
 
Unlock the transformative potential of edge computing with the insights of industry experts Dave McCarthy from IDC and Pete Bernard. Ever wondered how advanced transformer models are catalyzing technological leaps at the edge? This episode promises to enlighten you on the nuances of AI-ready infrastructure, pushing the boundaries of autonomous operations in multi-cloud and multi-edge environments. With an emphasis on trust, security, and sustainability, our guests illuminate the strategic importance of optimizing edge designs and the benefits of hybrid and multi-cloud strategies.

Explore the dynamic world of Edge AI as we dissect the complexities of heavy and light edge scenarios, particularly within industrial contexts. Dave and Pete help navigate the shift from centralized systems to the cutting-edge distributed frameworks necessary for processing the explosion of data generated outside traditional data centers. Discover how Edge AI and TinyML are reshaping industries by empowering smarter devices and solutions, pushing AI workloads from the cloud to resource-constrained environments for improved efficiency and real-time data processing.

Dive into the fascinating migration of AI workloads from the cloud to the edge, driven by the demands of smart cities and critical infrastructure. Our experts share insights from global surveys, examining how inference is increasingly shifting to the edge, while training remains cloud-based. Listen in as we explore the evolving edge AI hardware landscape, cost-effective solutions, and the burgeoning interest in specialized models. Uncover emerging generative AI use cases poised to revolutionize various sectors, and gain a glimpse into the future opportunities and challenges in the ever-evolving landscape of edge AI. Join us for a riveting discussion that promises to leave you informed and inspired.
 
Generative AI is poised to transform the edge, but what does this mean for technology, innovation, and everyday life? Join us for an enlightening discussion led by Danilo Pau, featuring a distinguished panel of experts including Dave, Roberto, Chen, Alok, Seung-Yang, Anirban, and Alberto. They promise to unravel the mysteries beyond large language models and chatbots, offering fresh insights into the interplay of algorithms, software, chips, and methodologies from an EDA perspective. The conversation aims to democratize generative AI, making its groundbreaking potential accessible to all, and sparking inspiration throughout the tinyML community.

With gratitude to the tinyML Foundation for their invaluable support, this episode builds on the momentum of previous forums. Reflecting on the foundation laid by Davis Sawyer's inspiring March session, we explore how generative AI is not just a cloud-based innovation but a technology set to revolutionize the edge. Expect to hear how these developments could impact everyone's work and life, and gain a glimpse into the collective vision for the future from our esteemed speakers. Don't miss the chance to be part of this vibrant exchange, where innovation is not just discussed but propelled forward.
 
Send us a text Unlock the secrets to revolutionary weather forecasting with our latest episode featuring Jonah Beysens, the brilliant mind behind the Aurora project. Imagine a world where remote and resource-limited areas can access reliable weather data without the hassle of maintaining traditional mechanical stations. Learn how the Aurora, an acoustic smart weather station, is making this possible. Born from the TinyML Challenge 2022, Aurora employs a TinyML board with a microphone to classify wind and rain intensity, ensuring robustness and ease of deployment. Jonah walks us through the journey of crafting a real-world dataset and shares the collaborative spirit behind making this data open for community-driven innovation. Discover how the shift towards low-cost, acoustics-based devices is reshaping weather forecasting, offering enhanced spatial resolution with multiple ground stations. Jonah sheds light on the collaborative efforts to refine prediction models with open datasets, emphasizing the profound global impact, especially in developing nations where agriculture depends heavily on accurate forecasts. As we discuss the ongoing work to make weather data more accessible worldwide, we highlight the role of community and open access in driving forward weather-related technologies. Join us in exploring how these innovative solutions promise timely, dependable forecasts, paving the way for a future where environmental data is a shared resource for all. Support the show Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org…
 