Inference Economics at Scale · Week 17 · Day 4/7
DAY 116 / 210

LLM Inference Fundamentals and Serving Basics

This opening day of phase 3 establishes the mental model for production inference that all later optimization work builds on. It directly supports Maku's StartupTribunal goals by clarifying how model calls will be structured inside the existing Next.js API routes. Getting this model right now avoids costly rework once real traffic patterns appear.

40 min target · 2 quiz questions

Resources

Deliverable

A journal entry recording 3 concrete latency numbers measured against a local inference endpoint (for example, time to first token and p50/p95 end-to-end latency).
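A minimal sketch of how the three numbers could be collected, assuming a local server at `http://localhost:8000/v1/completions` that accepts an OpenAI-style completions body (`prompt`, `max_tokens`, `stream`) and streams its reply. The URL, the request body, and the helper names `percentile`, `measureOnce`, and `benchmark` are hypothetical; adapt them to whatever endpoint you actually run. The sketch times the arrival of the first streamed chunk (a rough proxy for time to first token) and of the full response, then reports nearest-rank percentiles. It assumes Node 18+ for the global `fetch`.

```typescript
// Hypothetical local endpoint (assumption): adjust to the server you actually run.
const ENDPOINT = "http://localhost:8000/v1/completions";

interface Sample {
  firstChunkMs: number; // time until the first streamed body chunk arrived
  totalMs: number; // time until the whole body had been read
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile: no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Sends one streaming request (assumed OpenAI-style body) and records two timings.
async function measureOnce(prompt: string): Promise<Sample> {
  const start = performance.now();
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, max_tokens: 64, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`request failed: HTTP ${res.status}`);
  // fetch resolves once headers arrive, so wait for the first body chunk instead.
  const reader = res.body.getReader();
  let firstChunkMs = -1;
  for (;;) {
    const { done } = await reader.read();
    if (firstChunkMs < 0) firstChunkMs = performance.now() - start;
    if (done) break;
  }
  return { firstChunkMs, totalMs: performance.now() - start };
}

// Runs n sequential requests (batch size 1) after one discarded warm-up call.
async function benchmark(n: number): Promise<void> {
  await measureOnce("warm-up");
  const samples: Sample[] = [];
  for (let i = 0; i < n; i++) {
    samples.push(await measureOnce("Write one sentence about network latency."));
  }
  const first = samples.map((s) => s.firstChunkMs);
  const total = samples.map((s) => s.totalMs);
  console.log(`p50 first chunk: ${percentile(first, 50).toFixed(1)} ms`);
  console.log(`p50 total:       ${percentile(total, 50).toFixed(1)} ms`);
  console.log(`p95 total:       ${percentile(total, 95).toFixed(1)} ms`);
}
```

With the server running, calling `benchmark(20)` prints three values you could record in the journal entry. Twenty samples is an arbitrary choice; more samples give a steadier p95.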

Quiz · 2 questions

1. Which factor most directly increases p50 latency when batch size grows from 1 to 8?

2. Name one common misconception when developers first measure inference latency in a web route handler.

Journal