Inference Economics at Scale · Week 13 · Day 6/7
DAY 90 / 210

Core LLM Inference Patterns and Tradeoffs

This day opens phase-3-inference by grounding the learner in production serving fundamentals before any optimization work. It matters because inference is where StartupTribunal moves from model training to delivering user-facing value, so the existing app structure must now be measured against real serving constraints.

45 min target · 3 quiz Qs

Resources

Deliverable

300-word journal entry mapping inference latency/memory tradeoffs to the current Maku app routes
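For the memory half of the journal entry, a rough KV-cache estimate can help. Below is a minimal sketch, assuming a decoder-only transformer that caches one key and one value vector per layer per KV head for every token, stored in fp16/bf16 (2 bytes). The model shape, the 4096-token context, and the 40 GiB of free accelerator memory are hypothetical numbers chosen for illustration, and `ModelConfig` and the helper functions are made-up names, not part of the Maku or StartupTribunal codebase. The "naive" policy shown reserves a full-context buffer per request up front, which is exactly the kind of assumption worth comparing against each app route's real prompt and response lengths.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical transformer shape; substitute the model you actually serve.
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # fp16/bf16 storage


def kv_cache_bytes_per_token(cfg: ModelConfig) -> int:
    # Factor 2: one key vector and one value vector per layer per KV head.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def kv_cache_bytes(cfg: ModelConfig, seq_len: int, batch_size: int) -> int:
    # Total cache for batch_size sequences, each holding seq_len tokens.
    return kv_cache_bytes_per_token(cfg) * seq_len * batch_size


def max_concurrent_requests(cfg: ModelConfig, seq_len: int, free_memory_bytes: int) -> int:
    # Naive policy (assumption): reserve a full seq_len buffer per request up front,
    # so concurrency is capped by how many such buffers fit in free memory.
    per_request = kv_cache_bytes(cfg, seq_len, 1)
    return free_memory_bytes // per_request


if __name__ == "__main__":
    cfg = ModelConfig(num_layers=32, num_kv_heads=32, head_dim=128)
    print(kv_cache_bytes_per_token(cfg))        # 524288 bytes (512 KiB) per token
    print(kv_cache_bytes(cfg, 4096, 1))         # 2147483648 bytes (2 GiB) per full-context request
    free = 40 * 1024**3                         # hypothetical memory left after loading weights
    print(max_concurrent_requests(cfg, 4096, free))  # 20
```

Under these assumptions each full-context request costs about 2 GiB of cache, so only about 20 fit at once; routes that send short prompts and expect short replies waste most of that reservation, which is the tradeoff the journal entry can map route by route.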

Quiz · 3 questions

1. Which factor most directly limits concurrent request throughput in a naive transformer pipeline?

2. Name one concrete downside of always using greedy decoding in a production chat endpoint.

3. How might the rate limiter in the current codebase interact with an inference server that uses continuous batching?

Journal