DAY 92 / 210
Foundations of Production LLM Inference
This opening day of Phase 3 establishes how inference differs from training and why inference, rather than training, dominates real-world costs. It builds the mental model needed before any optimization or serving work begins.
⏱ 45 min target · 📝 3 quiz Qs
Resources
- 25 min
Deliverable
Journal entry listing three inference bottlenecks observed in current app/maku routes plus one candidate fix
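One way to gather evidence for the journal entry is to time each stage of a route and rank the stages by duration. The sketch below is a minimal illustration only: `StageTiming`, `timeStage`, and `slowestStages` are hypothetical names that do not come from the app/maku codebase, and `Date.now()` gives only millisecond resolution, which is assumed to be fine for stages that take tens of milliseconds or more.

```typescript
// Hypothetical helpers for illustration; not part of app/maku.
type StageTiming = { stage: string; ms: number };

// Runs `fn`, records its wall-clock duration under `stage`, and returns fn's result.
// The timing is recorded even if `fn` throws.
async function timeStage<T>(
  timings: StageTiming[],
  stage: string,
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    timings.push({ stage, ms: Date.now() - start });
  }
}

// Returns a copy of the timings sorted slowest-first, truncated to `top` entries,
// so the first entries are the bottleneck candidates.
function slowestStages(timings: StageTiming[], top = 3): StageTiming[] {
  return [...timings].sort((a, b) => b.ms - a.ms).slice(0, top);
}
```

In a route handler, each step (for example queue wait, the model call, and response post-processing) would be wrapped in `timeStage` with a shared `timings` array, and `slowestStages(timings)` would then give the three candidates to write up.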
Quiz · 3 questions
1. Why is LLM inference typically memory-bound rather than compute-bound?
2. Name one concrete difference between training and inference memory access patterns.
3. How might the rate-limiter in lib/rate-limiter.ts interact with an inference queue under bursty traffic?