DAY 113 / 210
LLM Inference Fundamentals and Tradeoffs
The first day of Phase 3 establishes the core mental model for production inference that every later optimization builds on. Maku is building StartupTribunal, so latency, throughput, and cost at inference time directly determine whether the product can serve real users reliably.
⏱ 45 min target · 📝 2 quiz Qs
Resources
- 20 min
- 15 min
Deliverable
Journal entry comparing inference latency and cost for a 7B model on two providers, with concrete numbers for the StartupTribunal workload.
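One possible way to structure the comparison is sketched below. It assumes each provider bills per million input and output tokens, and that latency can be approximated as time to first token plus decode time. The names `ProviderProfile`, `cost_per_request`, and `latency_per_request` are hypothetical, and every number in the usage section is a placeholder to be replaced with figures you measure or read from the providers' pricing pages.

```python
from dataclasses import dataclass


@dataclass
class ProviderProfile:
    """Pricing and speed figures for one provider (all values supplied by you)."""
    name: str
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float
    time_to_first_token_s: float   # measured latency before the first token arrives
    output_tokens_per_s: float     # measured decode speed for a single request


def cost_per_request(p: ProviderProfile, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request under simple per-token billing."""
    total = (input_tokens * p.usd_per_million_input_tokens
             + output_tokens * p.usd_per_million_output_tokens)
    return total / 1_000_000


def latency_per_request(p: ProviderProfile, output_tokens: int) -> float:
    """Rough end-to-end latency: time to first token plus decode time."""
    return p.time_to_first_token_s + output_tokens / p.output_tokens_per_s


if __name__ == "__main__":
    # PLACEHOLDER values only, not real prices or measurements.
    providers = [
        ProviderProfile("provider_a", 0.20, 0.20, 0.5, 50.0),
        ProviderProfile("provider_b", 0.10, 0.30, 0.8, 80.0),
    ]
    # Assumed workload shape for one StartupTribunal request (also a placeholder).
    input_tokens, output_tokens = 1500, 500
    for p in providers:
        cost = cost_per_request(p, input_tokens, output_tokens)
        latency = latency_per_request(p, output_tokens)
        print(f"{p.name}: ${cost:.6f} per request, ~{latency:.1f} s per request")
```

Multiplying the per-request cost by the expected daily request count gives a daily spend figure for the journal entry. The latency model ignores queueing and network variance, so treat it as a first approximation and prefer measured percentiles where you have them.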
Quiz · 2 questions
1. Which factor most directly limits concurrent users in a naive transformer inference server?
2. List two concrete metrics you would track to decide whether to switch from API calls to self-hosted inference for StartupTribunal.