DAY 80 / 210
LLM Inference Pipeline Fundamentals
This opening day of phase-3-inference establishes the core mechanics of model serving, token generation, and latency measurement that every later optimization will build upon. It directly supports Maku's work on StartupTribunal by grounding the API and rate-limiting patterns already present in the codebase.
⏱ 45 min target · 📝 2 quiz Qs
Resources
- 25 min
- 20 min
Deliverable
Journal entry capturing your first local inference benchmark and the latency numbers you observed (time-to-first-token and per-token latency).
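One possible way to collect the numbers for the journal entry is a small timing harness around any streaming token source. The sketch below is a minimal example, not a prescribed method: `benchmark_stream` and `fake_token_stream` are hypothetical names introduced here, and the fake stream only simulates a model with `time.sleep`. To benchmark a real local model, replace it with whatever iterator your runtime yields tokens from.

```python
import time
from typing import Callable, Dict, Iterable, Iterator


def fake_token_stream(n_tokens: int = 5,
                      prefill_s: float = 0.05,
                      decode_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a local model's streaming output."""
    time.sleep(prefill_s)  # simulated prompt processing before the first token
    for i in range(n_tokens):
        if i > 0:
            time.sleep(decode_s)  # simulated cost of each later decode step
        yield f"tok{i}"


def benchmark_stream(stream: Iterable[str],
                     clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Consume a token stream and report basic latency figures in seconds."""
    start = clock()
    arrivals = []
    for _ in stream:
        arrivals.append(clock())  # timestamp each token as it arrives
    if not arrivals:
        raise ValueError("stream produced no tokens")

    ttft = arrivals[0] - start  # time-to-first-token
    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    total = arrivals[-1] - start
    return {
        "tokens": len(arrivals),
        "ttft_s": ttft,
        "mean_inter_token_s": sum(gaps) / len(gaps) if gaps else 0.0,
        "total_s": total,
        "tokens_per_s": len(arrivals) / total if total > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Generators are lazy, so the simulated prefill delay is counted in ttft_s.
    for name, value in benchmark_stream(fake_token_stream()).items():
        print(f"{name}: {value:.4f}")
```

Reporting time-to-first-token separately from the mean inter-token gap keeps the prompt-processing cost apart from the steady-state decode cost, which is the distinction the quiz questions rely on. The `clock` parameter is there only so the harness can be checked with fixed timestamps.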
Quiz · 2 questions
1. Which factor most directly increases time-to-first-token in autoregressive decoding?
2. Explain why KV caching reduces per-token latency after the first token.