Inference Economics at Scale · Week 12 · Day 3/7
DAY 80 / 210

LLM Inference Pipeline Fundamentals

This opening day of phase-3-inference establishes the core mechanics of model serving, token generation, and latency measurement that every later optimization will build upon. It directly supports Maku's work on StartupTribunal by grounding the API and rate-limiting patterns already present in the codebase.

45 min target · 📝 2 quiz Qs

Resources

Deliverable

A journal entry capturing your first local inference benchmark and the latency numbers you observed.
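One way to collect those numbers is to time a token stream as it arrives. The sketch below is a minimal illustration, not a prescribed harness: `fake_token_stream` is a hypothetical stand-in that simulates prefill and per-token decode delays with `time.sleep`, and `benchmark_stream` is an illustrative helper name. It assumes your local model exposes its output as a Python iterable of tokens; swap the fake stream for that iterable to get real measurements.

```python
import time
from typing import Iterable, Iterator


def fake_token_stream(n_tokens: int = 8,
                      prefill_s: float = 0.05,
                      decode_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output (assumption)."""
    time.sleep(prefill_s)          # simulated prompt processing (prefill)
    for i in range(n_tokens):
        if i > 0:
            time.sleep(decode_s)   # simulated per-token decode step
        yield f"tok{i}"


def benchmark_stream(stream: Iterable[str]) -> dict:
    """Time a token stream: time-to-first-token, mean gap between tokens, throughput."""
    start = time.perf_counter()
    arrivals = []
    for _ in stream:
        arrivals.append(time.perf_counter())
    if not arrivals:
        raise ValueError("stream produced no tokens")

    n = len(arrivals)
    ttft = arrivals[0] - start                 # time to first token
    total = arrivals[-1] - start               # time to last token
    # Mean time per output token after the first one.
    tpot = (arrivals[-1] - arrivals[0]) / (n - 1) if n > 1 else 0.0
    return {
        "tokens": n,
        "ttft_s": ttft,
        "tpot_s": tpot,
        "total_s": total,
        "tokens_per_s": n / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    stats = benchmark_stream(fake_token_stream())
    for name, value in stats.items():
        print(f"{name}: {value}")
```

Recording time-to-first-token separately from the mean per-token gap keeps the prompt-processing cost from being averaged into the decode rate, which makes the journal numbers easier to compare across runs.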

Quiz · 2 questions

1. Which factor most directly increases time-to-first-token in autoregressive decoding?

2. Explain why KV caching reduces per-token latency after the first token.

Journal