Inference Economics at Scale · Week 14 · Day 2/7
DAY 93 / 210

Introduction to Efficient LLM Inference

Phase 3 shifts focus from training to serving models at scale. This day establishes core inference concepts so later optimizations can be measured against real bottlenecks in the existing Maku app stack.

40 min target · 2 quiz Qs

Deliverable

Journal entry listing three inference bottlenecks observed in app/maku/BriefForm.tsx and app/api/maku/brief/route.ts
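To back the journal entry with numbers rather than impressions, it may help to time a single request end to end. The sketch below is one possible approach, not code from the Maku app: it assumes the brief route returns a streamed body, and the names `summarizeTiming`, `timeStream`, and `StreamTiming` are hypothetical. It separates time to first chunk (a rough proxy for prefill and queueing latency) from the steady-state chunk rate (a rough proxy for decode throughput). Chunks are network reads, not tokens, so the rate is only indicative.

```typescript
// Hypothetical helper types and functions; none of these names come from the Maku codebase.
interface StreamTiming {
  ttftMs: number;          // time from request start to first chunk
  totalMs: number;         // time from request start to last chunk
  chunksPerSecond: number; // chunk arrival rate after the first chunk
}

// Pure summary over recorded arrival timestamps (all in milliseconds).
function summarizeTiming(
  requestStartMs: number,
  chunkArrivalsMs: number[],
): StreamTiming {
  if (chunkArrivalsMs.length === 0) {
    throw new Error("no chunks received");
  }
  const first = chunkArrivalsMs[0];
  const last = chunkArrivalsMs[chunkArrivalsMs.length - 1];
  const decodeMs = last - first;
  // Rate is measured over the intervals between chunks, so n chunks give n - 1 intervals.
  const chunksPerSecond =
    decodeMs > 0 ? ((chunkArrivalsMs.length - 1) * 1000) / decodeMs : 0;
  return {
    ttftMs: first - requestStartMs,
    totalMs: last - requestStartMs,
    chunksPerSecond,
  };
}

// Drains a streamed body and records when each chunk arrives.
// Assumes the caller captured requestStartMs with performance.now() before sending.
async function timeStream(
  body: ReadableStream<Uint8Array>,
  requestStartMs: number,
): Promise<StreamTiming> {
  const reader = body.getReader();
  const arrivals: number[] = [];
  while (true) {
    const { done } = await reader.read();
    if (done) break;
    arrivals.push(performance.now());
  }
  return summarizeTiming(requestStartMs, arrivals);
}
```

A long time to first chunk with a healthy chunk rate points at prompt size, queueing, or cold starts; a short time to first chunk with a slow rate points at generation itself. If the route buffers the full response instead of streaming, only `totalMs` is meaningful, which is itself worth noting in the journal.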

Quiz · 2 questions

1. Which technique in vLLM primarily reduces memory fragmentation during LLM serving?

2. Name one key difference between continuous batching and static batching for inference throughput.
