DAY 116 / 210
LLM Inference Fundamentals and Serving Basics
This opening day of Phase 3 establishes the mental model for production inference that all later optimization work builds on. It directly supports Maku's StartupTribunal goals by clarifying how model calls will be structured inside the existing Next.js API routes. Settling this early prevents costly rework once real traffic patterns appear.
⏱ 40 min target · 📝 2 quiz Qs
Resources
- 25 min
Deliverable
journal entry with 3 concrete latency numbers measured against a local inference endpoint
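One way to collect those numbers is a small timing harness like the sketch below. It is an illustration, not a prescribed method: the endpoint URL, the request payload, and the names `percentile`, `measureLatencies`, `callEndpoint`, and `report` are all assumptions, and the choice of p50, p95, and max as the three numbers is just one reasonable option. Adjust the URL and body to whatever local server you are running.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Runs `call` sequentially `runs` times and records elapsed time per call.
async function measureLatencies(
  call: () => Promise<unknown>,
  runs: number
): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await call();
    samples.push(performance.now() - start);
  }
  return samples;
}

// ASSUMPTION: hypothetical local endpoint and payload shape.
const ENDPOINT = "http://localhost:8000/v1/completions";

async function callEndpoint(): Promise<void> {
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: "Hello", max_tokens: 32 }),
  });
  // Read the full body so the timing covers the whole response,
  // not just the arrival of the headers.
  await res.text();
}

// Measures the endpoint and prints three summary numbers.
async function report(runs: number = 20): Promise<void> {
  await callEndpoint(); // warm-up request, excluded from the samples
  const samples = await measureLatencies(callEndpoint, runs);
  console.log(`p50: ${percentile(samples, 50).toFixed(1)} ms`);
  console.log(`p95: ${percentile(samples, 95).toFixed(1)} ms`);
  console.log(`max: ${Math.max(...samples).toFixed(1)} ms`);
}
```

Calling `report()` while the local server is running prints the three values for the journal entry. The requests are issued one at a time, so the figures describe single-request latency rather than behavior under concurrent load.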
Quiz · 2 questions
1. Which factor most directly increases p50 latency when batch size grows from 1 to 8?
2. Name one common misconception when developers first measure inference latency in a web route handler.