Eval Discipline · Week 1 · Day 5/7
DAY 5 / 210

Add LLM-Rubric Assertions for Soft Properties

Today's work moves the evaluation harness from brittle string matching to model-graded checks. A judge model scores each output against a written rubric, which makes it possible to detect marketing fluff and other soft properties that literal assertions miss. This lays the foundation for the automated review used in later phases of the eval arc.
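To make the contrast concrete, here is a minimal sketch of a literal check next to a model-graded one. Everything in it is an assumption for illustration: the `Judge` callable, the rubric wording, the banned-phrase list, and `stub_judge`, which stands in for a real model call so the example runs offline. It is not the API of any particular eval framework.

```python
from typing import Callable, List

# Hypothetical judge signature: takes a grading prompt, returns the judge's raw reply.
Judge = Callable[[str], str]

# Illustrative rubric wording, not taken from the course material.
RUBRIC = "Every claim is backed by a concrete, checkable detail; no vague superlatives."


def literal_check(output: str, banned: List[str]) -> bool:
    """Brittle baseline: passes unless a banned phrase appears verbatim."""
    lowered = output.lower()
    return not any(phrase in lowered for phrase in banned)


def llm_rubric_check(output: str, rubric: str, judge: Judge) -> bool:
    """Model-graded check: ask the judge for a PASS/FAIL verdict against the rubric."""
    prompt = (
        "Grade the text against the rubric. Reply with PASS or FAIL only.\n"
        f"Rubric: {rubric}\n"
        f"Text: {output}"
    )
    verdict = judge(prompt).strip().upper()
    return verdict.startswith("PASS")


def stub_judge(prompt: str) -> str:
    """Stand-in for a real model call (assumption): passes text that contains a number."""
    text = prompt.split("Text: ", 1)[1]
    return "PASS" if any(ch.isdigit() for ch in text) else "FAIL"


fluffy = "Our platform delivers unparalleled synergy for modern teams."
concrete = "The export job finishes in under 2 seconds for files up to 50 MB."

# The banned list cannot anticipate every phrasing, so the fluffy text slips through.
literal_result = literal_check(fluffy, ["world-class", "best-in-class"])
rubric_fluffy = llm_rubric_check(fluffy, RUBRIC, stub_judge)
rubric_concrete = llm_rubric_check(concrete, RUBRIC, stub_judge)
```

Passing the judge in as a parameter keeps the assertion testable without network access; in a real harness the stub would be replaced by a call to whichever judge model the project uses.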

45 min target · 📝 3 quiz Qs

Deliverable

A commit that adds at least three new LLM-rubric assertions to the evaluation harness, together with passing test output.
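One possible shape for the deliverable is sketched below, assuming a harness that keeps rubric assertions as named data and runs them all through one judge. The `RubricAssertion` class, the three rubric names and texts, `run_assertions`, and the two stub judges are all hypothetical; adapt them to whatever structure your harness already has.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RubricAssertion:
    """A named soft-property check expressed as a natural-language rubric."""
    name: str
    rubric: str


# Three illustrative rubrics (assumptions, not prescribed by the course).
ASSERTIONS: List[RubricAssertion] = [
    RubricAssertion("no_fluff", "Claims are specific and checkable, with no vague superlatives."),
    RubricAssertion("neutral_tone", "The tone is neutral and informative, not promotional."),
    RubricAssertion("answers_question", "The text directly addresses what the user asked."),
]


def run_assertions(
    output: str,
    assertions: List[RubricAssertion],
    judge: Callable[[str], str],
) -> Dict[str, bool]:
    """Grade one output against every rubric and return a pass/fail map by name."""
    results: Dict[str, bool] = {}
    for assertion in assertions:
        prompt = (
            f"Rubric: {assertion.rubric}\n"
            f"Text: {output}\n"
            "Reply with PASS or FAIL only."
        )
        results[assertion.name] = judge(prompt).strip().upper().startswith("PASS")
    return results


def always_pass(prompt: str) -> str:
    """Stub judge (assumption) used only to exercise the harness wiring."""
    return "PASS"


def always_fail(prompt: str) -> str:
    """Stub judge (assumption) used to confirm failures are reported."""
    return "FAIL"


passing = run_assertions("Exports finish in under 2 seconds.", ASSERTIONS, always_pass)
failing = run_assertions("Exports finish in under 2 seconds.", ASSERTIONS, always_fail)
```

Keeping each rubric as a separate named entry means the test output shows which soft property failed, rather than a single combined verdict.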

Quiz · 3 questions

1. Why do literal contains/equals checks fail for marketing-fluff detection?

2. Write a one-sentence model-graded rubric prompt that distinguishes substantive claims from marketing fluff.

3. What failure mode might arise if the judge LLM shares the same biases as the generator model?
