DAY 3 / 210
Designing your first eval set
Decide the shape of the data before writing code: selecting representative rows by hand forces you to state your selection criteria explicitly before anything is automated. Today's work lays the foundation for all later triangulation and graph work by making the target distribution visible.
⏱ 45 min target · 📝 2 quiz Qs · 🔗 2 code anchors
Resources
- 20 min
Codebase anchors
The Tribunal code that demonstrates today's concept.
This is the existing pattern for the Week 1 first eval, which you will now populate with the concrete 10-row dataset.
⚠️ This anchor needs updating: the file or line is no longer reachable.
Deliverable
10-row eval set draft saved as eval-set.json in branch eval-set-v1
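One way to keep the draft honest is to check its shape before committing it. The sketch below is a minimal example, not the Tribunal codebase's own format: the field names (id, input, expected, selection_reason) and both helper functions are hypothetical, chosen only to show that every row carries an explicit reason for being selected.

```python
import json
from pathlib import Path

# Hypothetical schema: these field names are assumptions, not the
# Tribunal codebase's actual format. Adjust them to match your rows.
REQUIRED_FIELDS = ("id", "input", "expected", "selection_reason")


def validate_eval_set(rows, expected_count=10):
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")

    seen_ids = set()
    for index, row in enumerate(rows):
        # Every required field must be a non-empty string.
        for field in REQUIRED_FIELDS:
            value = row.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"row {index}: missing or empty '{field}'")

        # Ids must be unique so later results can be joined back to rows.
        row_id = row.get("id")
        if isinstance(row_id, str):
            if row_id in seen_ids:
                problems.append(f"row {index}: duplicate id '{row_id}'")
            seen_ids.add(row_id)
    return problems


def save_eval_set(rows, path="eval-set.json"):
    """Validate the rows, then write them as pretty-printed JSON."""
    problems = validate_eval_set(rows)
    if problems:
        raise ValueError("; ".join(problems))
    text = json.dumps(rows, indent=2, ensure_ascii=False) + "\n"
    Path(path).write_text(text, encoding="utf-8")
```

Calling save_eval_set(rows) with your ten hand-picked rows writes eval-set.json in the current directory, or raises ValueError listing every problem found. Requiring selection_reason on each row is a design choice in this sketch: it makes the selection criteria part of the data rather than something remembered separately.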
Quiz · 2 questions
1. Why select rows before writing any scorer code?
2. List two concrete signals that would make a StartupTribunal row representative of high-trust pain.