Inference Economics at Scale · Week 18 · Day 3/7
DAY 122 / 210

Foundations of LLM Inference Pipelines

Phase 3 shifts focus from training models to serving them at scale. This day establishes the core inference concepts so that later optimization work has a measurable baseline. Understanding the pipeline early makes it easier to spot bottlenecks before inference is integrated into StartupTribunal workflows. This matters because inference latency and cost directly determine whether the product is viable for real users.

45 min target · 3 quiz Qs

Resources

Deliverable

Journal entry in app/maku/BriefForm.tsx documenting the first inference latency measurement on a local model.
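
One way to take that first latency measurement is sketched below in Python. It is a minimal sketch under stated assumptions, not a prescribed method: `measure_latency` and `fake_generate` are hypothetical names introduced here, and `fake_generate` only sleeps briefly to stand in for whatever call actually runs the local model. Swap in the real call to get meaningful numbers.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    generate: Callable[[str], str],
    prompt: str,
    runs: int = 5,
    warmup: int = 1,
) -> Dict[str, float]:
    """Time generate(prompt) end to end and summarize the samples in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are not timed, so one-time costs such as model loading
    # or cache population do not distort the reported numbers.
    for _ in range(warmup):
        generate(prompt)

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        generate(prompt)
        samples.append(time.perf_counter() - start)

    return {
        "runs": float(len(samples)),
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a local model call; replace with the real one."""
    time.sleep(0.01)
    return prompt.upper()


if __name__ == "__main__":
    stats = measure_latency(fake_generate, "Summarize this brief.", runs=5)
    print(f"median {stats['median_s'] * 1000:.1f} ms over {int(stats['runs'])} runs")
```

Reporting the median alongside the minimum and maximum keeps a single slow run from dominating the baseline, and recording the prompt and run count in the journal entry makes later measurements comparable.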

Quiz · 3 questions

1. Which component most directly controls batching behavior during inference?

2. Why might increasing batch size reduce latency up to a point but then increase it?

3. Describe one concrete change you would make to the current brief submission flow if inference latency exceeded 2 s.

Journal