Why Evals Exist + Install Promptfoo
This day reframes TDD as one narrow tool inside a larger LLM evaluation portfolio for production systems like StartupTribunal. Installing promptfoo creates the immediate environment needed for tomorrow's hands-on work. It sets the measurement mindset that will later be applied to the pain-discovery graph nodes.
Resources
- 20 min
Codebase anchors
The Tribunal code that demonstrates today's concept. Click the line to open in GitHub or VS Code.
This file already encodes the exact day-1 topic and rationale; today's work validates and extends that arc definition with the first promptfoo installation step.
This orchestration layer calls the nodes whose behavior we will later measure with promptfoo evals, making it the closest existing code the new eval tooling will target.
1/**2 * @fileoverview Pain Discovery Graph (LangGraph wiring)3 *4 * Thin orchestration layer. Pure node + router logic lives in5 * `lib/graphs/pain-discovery-nodes.ts` so it can be unit-tested without6 * importing langgraph (whose ESM dist doesn't play with Jest's swc7 * transformer). This file is responsible only for:8 *9 * - Wiring the nodes into a LangGraph StateGraph10 * - Exposing `createPainDiscoveryGraph()` + `executePainDiscovery()`11 * - Re-exporting the node functions, routers, state types so existing12 * callers/tests that import from this module keep working13 *14 * Flow (unchanged by the May 2026 refactor; web search node is new):15 * START16 * → searchPrimary17 * signals? → selectSignal → END18 * empty? → searchAdjacent (×3 with broadening)19 * signals? → selectSignal → END20 * empty after 3? → webSearchFallback ← NEW Tier 221 * signals? → selectSignal → END22 * empty? → checkSynthetic23 * rate < 35% → generateSynthetic → selectSignal → END24 * rate ≥ 35% → END (blocked)25 */Deliverable
promptfoo installed locally with a minimal eval.yaml committed that references the pain-discovery graph entrypoint
Quiz · 2 questions
1. What is the primary reason to add promptfoo when TDD already exists in the codebase?
2. Name one concrete risk of shipping the pain-discovery graph without evals.