Inference Economics at Scale · Week 17 · Day 3/7
DAY 115 / 210

LLM Inference Fundamentals and Tradeoffs

This opening day of phase 3 establishes the core mental models for production inference before any optimization layers are added. It matters because every later technique in the arc (batching, quantization, serving engines) is measured against these baseline latency, throughput, and memory constraints.
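To make the memory side of these constraints concrete, here is a minimal back-of-envelope sketch that estimates KV-cache size from model shape, batch size, and sequence length. It assumes a standard decoder that caches one key vector and one value vector per layer per KV head for every token. The ModelShape class, the function names, and the 32-layer, 32-head, fp16 configuration are hypothetical and are not taken from any specific model or serving engine.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical decoder shape, used only for illustration."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int  # assumption: 2 for fp16/bf16 cache entries


def kv_bytes_per_token(m: ModelShape) -> int:
    # Factor 2: one key vector and one value vector per layer per KV head.
    return 2 * m.num_layers * m.num_kv_heads * m.head_dim * m.bytes_per_elem


def kv_cache_bytes(m: ModelShape, batch_size: int, seq_len: int) -> int:
    # Assumes every sequence in the batch holds seq_len cached tokens.
    return kv_bytes_per_token(m) * batch_size * seq_len


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=32, head_dim=128, bytes_per_elem=2)
    print(f"per token: {kv_bytes_per_token(shape)} bytes")
    for batch in (1, 8, 32):
        gib = kv_cache_bytes(shape, batch, seq_len=4096) / GIB
        print(f"batch={batch:>2}  KV cache = {gib:.1f} GiB")
```

With this hypothetical shape, each cached token costs 0.5 MiB, so a 4096-token context needs about 2 GiB per sequence, and the cache grows linearly with batch size. That linear growth is one reason memory, not only compute, often limits how large a batch can get.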

45 min target · 3 quiz questions

Resources

Deliverable

One-page journal entry that lists the three primary inference metrics and one concrete tradeoff each introduces for the Maku brief endpoint (/api/maku/brief).

Quiz · 3 questions

1. Which metric is most directly increased by larger batch sizes during LLM inference?

2. Name one reason paged attention reduces memory fragmentation compared with naive KV caching.

3. For the current /api/maku/brief route, which single inference metric would you optimize first and why?

Journal