Inference Economics at Scale · Week 16 · Day 4/7
DAY 109 / 210

LLM Inference Fundamentals and Latency

This first day of phase 3 establishes the core mental model for production inference before optimization work begins. Understanding the token-by-token generation loop, and how the KV cache stores the keys and values of earlier tokens so they are not recomputed at every step, explains why later days target throughput and cost. The day also surfaces common misconceptions about batching versus latency that come up in real deployment reviews.
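The two mechanics above can be made concrete with a toy cost model. The sketch below is illustrative only: the function names are hypothetical, projecting K/V for one token position is assumed to cost 1 unit, and the decode step time is assumed to grow linearly with batch size (`base_ms` and `per_seq_ms` are made-up constants, not measurements). The cache removes re-projection of the prefix; attention over the prefix still grows with context length and is not counted here.

```python
def kv_projection_count(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Count token positions whose key/value vectors get projected (one layer).

    Toy cost model (an assumption): projecting K and V for one position costs
    1 unit; the attention computation over the prefix is not counted.
    """
    context = prompt_len  # tokens currently in the sequence
    cached = 0            # positions whose K/V already live in the cache
    work = 0
    for _ in range(new_tokens):
        if use_cache:
            work += context - cached  # project only positions not yet cached
            cached = context
        else:
            work += context           # re-project the entire prefix every step
        context += 1                  # append the sampled token
    return work


def decode_step_ms(batch_size: int, base_ms: float = 20.0, per_seq_ms: float = 0.5) -> float:
    """Hypothetical linear step-time model: fixed cost plus per-sequence cost."""
    return base_ms + per_seq_ms * batch_size


def throughput_tok_s(batch_size: int) -> float:
    """Aggregate tokens per second: one token per sequence per decode step."""
    return batch_size * 1000.0 / decode_step_ms(batch_size)


if __name__ == "__main__":
    print(kv_projection_count(100, 50, use_cache=True))   # 149
    print(kv_projection_count(100, 50, use_cache=False))  # 6225
    for b in (1, 8, 32):
        # batch size, per-token wait (ms), aggregate tokens/s
        print(b, round(decode_step_ms(b), 1), round(throughput_tok_s(b), 1))
```

Under these assumed constants, batch 32 delivers roughly 18x the aggregate throughput of batch 1 (about 889 vs 49 tokens/s), yet every sequence now waits 36 ms per token instead of 20.5 ms. Real servers add queueing and prefill interference on top of this, which is where tail latency usually grows fastest.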

50 min target · 📝 3 quiz Qs

Resources

Deliverable

Journal entry with a 150-word summary of the KV cache's role, plus one latency bottleneck identified in app/maku/page.tsx
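When looking for a latency bottleneck, it helps to record per-token arrival times rather than a single average. The sketch below assumes you can capture a request start time and a timestamp (in milliseconds) for each streamed token wherever the page receives them; how app/maku/page.tsx actually streams tokens is not shown here, and `latency_report` and `percentile` are hypothetical helpers, not part of the Maku codebase.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def latency_report(request_start_ms: float, token_times_ms: list[float]) -> dict[str, float]:
    """Summarize one streamed response from token arrival timestamps (milliseconds)."""
    ttft = token_times_ms[0] - request_start_ms  # time to first token
    # Inter-token gaps: how long the user waits between streamed tokens.
    gaps = [b - a for a, b in zip(token_times_ms, token_times_ms[1:])]
    total_ms = token_times_ms[-1] - request_start_ms
    return {
        "ttft_ms": ttft,
        # Whole-request average: the single number that can hide a slow first token.
        "avg_tok_per_s": len(token_times_ms) * 1000 / total_ms,
        "p99_gap_ms": percentile(gaps, 99) if gaps else 0.0,
    }


if __name__ == "__main__":
    steady = [100 * (i + 1) for i in range(20)]   # first token at 100 ms, then every 100 ms
    stalled = [1620 + 20 * i for i in range(20)]  # first token at 1620 ms, then every 20 ms
    print(latency_report(0, steady))
    print(latency_report(0, stalled))
```

Both synthetic traces average 10 tokens/s over the request, yet the second leaves the user staring at an empty response for 1.62 s. Tracking time to first token and tail inter-token gaps alongside the average is what exposes that difference.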

Quiz · 3 questions

1. Which component most directly reduces recomputation during autoregressive generation?

2. Explain in one sentence why increasing batch size can increase tail latency even when throughput rises.

3. Describe a scenario from the current Maku codebase where inference latency would be mismeasured if only average tokens per second is tracked.

Journal