← Back to syllabus
Eval Discipline · Week 1 · Day 3/7
DAY 3 / 210

Designing your first eval set

Data shape precedes code; selecting representative rows forces explicit criteria before any automation. This day locks the foundation for all later triangulation and graph work by making the target distribution visible.

45 min target📝 2 quiz Qs🔗 2 code anchors

Resources

Codebase anchors

The Tribunal code that demonstrates today's concept. Click the line to open in GitHub or VS Code.

lib/maku/curriculum-arc.ts:L55audit-recent-catalog-country

this is the existing pattern for Week 1 first eval that you will now populate with the concrete 10-row dataset

⚠️ This anchor needs updating — the file or line is no longer reachable.
lib/maku/curriculum-arc.ts:L65audit-recent-catalog-country

this is the closest existing usage of eval-related milestones we will measure the new 10-row set against

⚠️ This anchor needs updating — the file or line is no longer reachable.

Deliverable

10-row eval set draft saved as eval-set.json in branch eval-set-v1

Quiz · 2 questions

1. Why select rows before writing any scorer code?

2. List two concrete signals that would make a StartupTribunal row representative of high-trust pain.

Journal