Wednesday February 5, 2025 2:20pm - 2:45pm GMT · Grand Hall 2
Many organizations are evaluating running open source LLMs on their own infrastructure, and Kubernetes is a natural platform choice. However, running open source LLMs in production on Kubernetes is, honestly, a bit of an undocumented mess.

This technical presentation shares both speakers' experience deploying production-grade LLM infrastructure on Kubernetes. Through practical demonstrations, we'll explore the complete deployment lifecycle, from GPU setup to optimization techniques such as Flash Attention, quantization trade-offs, and GPU sharing.
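
To make the quantization and context-length trade-offs concrete, here is a minimal sketch assuming vLLM's offline Python API; the model name and parameter values are illustrative choices, not the configuration demonstrated in the session:

```python
# Minimal vLLM serving sketch: context length and quantization are the two
# knobs that most affect GPU memory footprint. Model name and values below
# are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # assumed AWQ-quantized checkpoint
    quantization="awq",              # trade a little accuracy for a much smaller footprint
    max_model_len=8192,              # cap context length to bound KV-cache memory
    gpu_memory_utilization=0.90,     # leave headroom when the GPU is shared
)

outputs = llm.generate(
    ["Summarise the trade-offs of quantizing a 7B model."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```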

You'll learn:

* Architectural patterns for efficient LLM deployment using Ollama and vLLM
* Solutions for model weight management and context length optimization
* Techniques for GPU sharing and improving resource utilization
* Production approaches to fine-tuning with Axolotl and serving multiple models with LoRAX

You'll leave with a complete blueprint for building reliable, scalable LLM infrastructure on Kubernetes.
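
As a taste of that blueprint, here is a minimal sketch assuming the vLLM OpenAI-compatible server image and the official Kubernetes Python client; the namespace, model, and flags are illustrative rather than the exact setup shown in the talk:

```python
# Sketch of the deployment pattern on Kubernetes: one vLLM OpenAI-compatible
# server pod requesting a single GPU, created with the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

container = client.V1Container(
    name="vllm",
    image="vllm/vllm-openai:latest",
    args=["--model", "TheBloke/Mistral-7B-Instruct-v0.2-AWQ", "--max-model-len", "8192"],
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"},  # schedule onto a node with a free GPU
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="vllm-server"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "vllm"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "vllm"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="llm", body=deployment)
```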
Speakers

Luke Marsden

Founder, MLOps Consulting
Technical leader and startup founder who participated in the early development of Docker and Kubernetes. Former lead of SIG-cluster-lifecycle.

Priya Samuel

Full-Stack Engineer and Software Architect, Elsevier
Priya Samuel is a seasoned technology leader with a passion for transforming complex challenges into actionable solutions. With extensive expertise in DevOps, cloud-native technologies, and Identity and Access Management (IAM), Priya has helped organizations scale their data and...
