DAY 111 / 210

Foundations of LLM Inference Pipelines

This first day of Phase 3 establishes the core mental models for production inference before any optimization layers are added. Later days build on it by extending the existing app/maku routes into inference endpoints.

⏱ 40 min target · 📝 2 quiz Qs

Resources

Reading · Hugging Face
Transformers Pipelines Documentation
entire page
25 min

Deliverable

Journal entry with a 200-word summary of the pipeline stages, plus one concrete local inference test command that runs successfully.
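
As a starting point for the summary, the stages can be sketched with a toy, dependency-free pipeline. This is only an illustration under stated assumptions: the five-word vocabulary and the fake "model" are hypothetical, and the stage names preprocess, forward, and postprocess are chosen to mirror the input, model, and output steps described in the reading, not copied from any library.

```python
"""Toy, dependency-free sketch of three inference pipeline stages."""

# Hypothetical five-word vocabulary; index 0 is reserved for unknown words.
VOCAB = ["<unk>", "the", "cat", "sat", "down"]
WORD_TO_ID = {word: idx for idx, word in enumerate(VOCAB)}


def preprocess(text: str) -> list[int]:
    """Stage 1: map raw text to the integer IDs the model consumes."""
    return [WORD_TO_ID.get(word, 0) for word in text.lower().split()]


def forward(input_ids: list[int]) -> list[float]:
    """Stage 2: stand-in for the model forward pass.

    Returns one score per vocabulary entry. This fake model simply
    favours the ID that follows the last input ID (wrapping around).
    """
    next_id = (input_ids[-1] + 1) % len(VOCAB) if input_ids else 0
    return [1.0 if idx == next_id else 0.0 for idx in range(len(VOCAB))]


def postprocess(scores: list[float]) -> str:
    """Stage 3: pick the highest-scoring ID (greedy) and decode it to text."""
    best_id = max(range(len(scores)), key=scores.__getitem__)
    return VOCAB[best_id]


def run_pipeline(text: str) -> str:
    """Chain the three stages: raw text in, predicted next word out."""
    return postprocess(forward(preprocess(text)))


if __name__ == "__main__":
    print(run_pipeline("the cat"))  # prints "sat"
```

Saved under a hypothetical name such as pipeline_sketch.py, it runs with `python pipeline_sketch.py`. A real pipeline replaces each stage with a learned component, but data moves through the stages in the same order.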

Quiz · 2 questions

1. Which component is responsible for converting raw text into token IDs before the model's forward pass?

Tokenizer
Scheduler
Sampler
KV cache

2. Name one common misconception when first running inference on consumer GPUs and state the actual limiting factor.

Journal

Time spent (minutes)

Blockers

Commit / PR links (one per line)