DAY 79 / 210
Core Concepts of LLM Inference Serving
This day opens Phase 3 by grounding learners in the realities of production inference rather than training. It directly supports Maku's StartupTribunal work by clarifying how model outputs reach users at scale. The focus on measurable trade-offs guards against the common over-optimism that raw model quality alone is enough.
⏱ 45 min target · 📝 2 quiz Qs
Resources
- 25 min
Deliverable
300-word journal entry on inference metrics relevant to StartupTribunal
Quiz · 2 questions
1. Which factor most directly limits throughput when batch size increases?
2. Explain in two sentences why latency and throughput are not always improved by the same technique.