Eval Discipline · Week 1 · Day 1/7
DAY 1 / 210

Why Evals Exist + Install Promptfoo

Today reframes TDD as one narrow tool within a larger LLM evaluation portfolio for production systems like StartupTribunal. Installing promptfoo sets up the environment for tomorrow's hands-on work and establishes the measurement mindset that will later be applied to the pain-discovery graph nodes.

45 min target · 2 quiz Qs · 2 code anchors

Resources

  • reading · promptfoo
    Getting Started: full intro and quickstart (20 min)
  • docs
    README and examples folder (15 min)

Codebase anchors

The Tribunal code that demonstrates today's concept.

lib/maku/curriculum-arc.ts:L53 mentionsTargetCountry

This file already encodes the exact day-1 topic and rationale; today's work validates and extends that arc definition with the first promptfoo installation step.

⚠️ This anchor needs updating — the file or line is no longer reachable.
lib/graphs/pain-discovery-graph.ts:L5 pain-discovery-nodes

This orchestration layer calls the nodes whose behavior we will later measure with promptfoo evals, making it the closest existing code the new eval tooling will target.

/**
 * @fileoverview Pain Discovery Graph (LangGraph wiring)
 *
 * Thin orchestration layer. Pure node + router logic lives in
 * `lib/graphs/pain-discovery-nodes.ts` so it can be unit-tested without
 * importing langgraph (whose ESM dist doesn't play with Jest's swc
 * transformer). This file is responsible only for:
 *
 *   - Wiring the nodes into a LangGraph StateGraph
 *   - Exposing `createPainDiscoveryGraph()` + `executePainDiscovery()`
 *   - Re-exporting the node functions, routers, state types so existing
 *     callers/tests that import from this module keep working
 *
 * Flow (unchanged by the May 2026 refactor; web search node is new):
 *   START
 *     → searchPrimary
 *         signals? → selectSignal → END
 *         empty?   → searchAdjacent (×3 with broadening)
 *             signals? → selectSignal → END
 *             empty after 3? → webSearchFallback   ← NEW Tier 2
 *                 signals? → selectSignal → END
 *                 empty?   → checkSynthetic
 *                     rate < 35% → generateSynthetic → selectSignal → END
 *                     rate ≥ 35% → END (blocked)
 */
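
The flow in the comment above can be read as a single routing decision made after each search step. The sketch below is only an illustration of that decision table, not the code in `lib/graphs/pain-discovery-nodes.ts`: the state shape, the field names, and the function name `routeAfterSearch` are all hypothetical, and the `checkSynthetic` step is folded into the router for brevity.

```typescript
// Hypothetical state shape. The real state types live in
// lib/graphs/pain-discovery-nodes.ts and are not shown in this lesson.
interface DiscoveryState {
  signalCount: number;       // signals returned by the most recent search step
  adjacentAttempts: number;  // searchAdjacent passes already run (0 to 3)
  webFallbackTried: boolean; // whether webSearchFallback has already run
  syntheticRate: number;     // share of synthetic signals so far, 0 to 1
}

type NextNode =
  | "selectSignal"
  | "searchAdjacent"
  | "webSearchFallback"
  | "generateSynthetic"
  | "END";

// Limits taken from the flow comment: three adjacent passes, 35% synthetic cap.
const MAX_ADJACENT_ATTEMPTS = 3;
const SYNTHETIC_RATE_CAP = 0.35;

// Decide which node runs next, following the tiers in the flow comment.
function routeAfterSearch(state: DiscoveryState): NextNode {
  // Any tier that finds signals goes straight to selection.
  if (state.signalCount > 0) return "selectSignal";
  // Tier 1: broaden the search up to three times.
  if (state.adjacentAttempts < MAX_ADJACENT_ATTEMPTS) return "searchAdjacent";
  // Tier 2: fall back to web search once.
  if (!state.webFallbackTried) return "webSearchFallback";
  // Last resort: synthetic signals, allowed only while under the cap.
  return state.syntheticRate < SYNTHETIC_RATE_CAP ? "generateSynthetic" : "END";
}
```

Routing like this is deterministic, so ordinary unit tests cover it well. What the nodes actually return (which signals a search finds, which one gets selected) is the part unit tests cannot pin down, and that is the behavior the promptfoo evals are meant to measure.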

Deliverable

promptfoo is installed locally, and a minimal eval.yaml that references the pain-discovery graph entrypoint is committed.

Quiz · 2 questions

1. What is the primary reason to add promptfoo when TDD already exists in the codebase?

2. Name one concrete risk of shipping the pain-discovery graph without evals.
