DAY 90 / 210
Core LLM Inference Patterns and Tradeoffs
This day opens phase-3-inference by grounding the learner in production serving fundamentals before any optimization work. It matters because inference is where StartupTribunal moves from model training to user-facing value, so the existing app structure must now be measured against real serving constraints.
⏱ 45 min target · 📝 3 quiz Qs
Resources
- 25 min
- 15 min
Deliverable
300-word journal entry mapping inference latency/memory tradeoffs to the current Maku app routes
Quiz · 3 questions
1. Which factor most directly limits concurrent request throughput in a naive transformer pipeline?
2. Name one concrete downside of always using greedy decoding in a production chat endpoint.
3. How might the rate limiter in the current codebase interact with an inference server that uses continuous batching?