DAY 93 / 210
Introduction to Efficient LLM Inference
Phase 3 shifts focus from training to serving models at scale. This day establishes core inference concepts so later optimizations can be measured against real bottlenecks in the existing Maku app stack.
⏱ 40 min target · 📝 2 quiz Qs
Resources
- 25 min
Deliverable
Journal entry listing three inference bottlenecks observed in app/maku/BriefForm.tsx and app/api/maku/brief/route.ts
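One way to ground the journal entry in numbers rather than impressions is to record when each piece of a response arrives and then summarize the timings. The sketch below is a minimal, hypothetical helper: `StreamTiming`, `LatencySummary`, and `summarizeLatency` are illustrative names that do not come from the Maku codebase, and it assumes the brief route returns its output in several chunks. If the route returns a single JSON body, only the total time is meaningful.

```typescript
// Hypothetical helper for the journal entry; nothing here is taken from the Maku code.
interface StreamTiming {
  requestStartMs: number;    // timestamp when the request was sent
  chunkArrivalsMs: number[]; // arrival timestamp of each response chunk, in order
}

interface LatencySummary {
  timeToFirstChunkMs: number; // wait before any output appeared
  totalMs: number;            // wait until the last chunk arrived
  meanInterChunkMs: number;   // average gap between consecutive chunks
}

function summarizeLatency(t: StreamTiming): LatencySummary {
  if (t.chunkArrivalsMs.length === 0) {
    throw new Error("no chunks recorded");
  }
  const first = t.chunkArrivalsMs[0];
  const last = t.chunkArrivalsMs[t.chunkArrivalsMs.length - 1];
  const gaps = t.chunkArrivalsMs.length - 1;
  return {
    timeToFirstChunkMs: first - t.requestStartMs,
    totalMs: last - t.requestStartMs,
    // A single chunk has no gaps, so report 0 instead of dividing by zero.
    meanInterChunkMs: gaps > 0 ? (last - first) / gaps : 0,
  };
}

// Example with made-up timestamps (milliseconds).
const summary = summarizeLatency({
  requestStartMs: 100,
  chunkArrivalsMs: [400, 450, 500, 550],
});
console.log(summary);
```

As a rough reading guide, a long time to first chunk usually points at queueing, network, or prompt processing, while large gaps between chunks point at slow token generation. Both readings are heuristics to check against the actual route, not conclusions.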
Quiz · 2 questions
1. Which technique in vLLM primarily reduces KV-cache memory fragmentation during LLM serving?
2. Name one key difference between continuous batching and static batching that affects inference throughput.