Eval Discipline · Week 4 · Day 5/7
DAY 26 / 210

Chase next edge-case class in eval surface

This free ship day forces systematic exploration of the current eval harness to surface the next coherent failure class before Phase 2 begins. Identifying and naming that class now prevents diffuse bug-chasing later and directly informs the next gated improvement.

45 min target · 3 quiz Qs · 3 code anchors

Resources

Codebase anchors

The Tribunal code that demonstrates today's concept.

lib/graphs/pain-discovery-nodes.ts:L399 · mentionsTargetCountry

This is the existing pattern you will measure against while hunting for new edge-case classes that slip past country targeting.

 * (AE). Without this rule the gate is symmetric to any country
 * named ANYWHERE in title+snippet, which is too permissive when
 * Serper returns global results with a passing mention of the
 * target.
 *
 * 3. Title is location-neutral (no country keywords at all) →
 * fall back to the original snippet check. This preserves the
 * legacy behavior for cases like "Solo founders burn out scaling
 * product" where the country only appears in the body.
 *
 * False-positive risk: a curated REGION_META keyword might overlap
 * with an unrelated place (e.g. "Lagos, Portugal" → matches NG keyword
 * "Lagos"). Acceptable: REGION_META is hand-curated for major
 * cities/capitals, overlaps are rare, and rule 1 always wins when the
 * target keyword IS in the title.
 *
 * False-negative risk: a title legitimately mentioning two countries
 * ("Kenya imports power from Ethiopia" with target=KE) → rule 1 fires
 * first on "Kenya", we never reach rule 2. Safe.
 */
export function mentionsTargetCountry(signal: PainSignal, region: Region): boolean {
  const meta = REGION_META[region];
  if (!meta || meta.keywords.length === 0) return true; // no metadata → don't gate
  const titleLc = signal.title.toLowerCase();
  const textLc = signal.text.toLowerCase();

  // Rule 1: target mentioned in title → strong accept
  const targetInTitle = meta.keywords.some((kw) => titleLc.includes(kw.toLowerCase()));
  if (targetInTitle) return true;

  // Rule 2: competing country in title (without target) → strong reject
  for (const [otherCode, otherMeta] of Object.entries(REGION_META) as Array<
    [Region, (typeof REGION_META)[Region]]
  >) {
    if (otherCode === region) continue;
    if (otherMeta.keywords.some((kw) => titleLc.includes(kw.toLowerCase()))) {
      return false;
    }
  }

  // Rule 2.5: title is location-neutral but body might still favour a
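
The rule ordering above is easy to probe in isolation. What follows is a minimal sketch, not the module's code: `TOY_REGION_META`, its four regions and keywords, and `gateSketch` are hypothetical. Because the excerpt cuts off at Rule 2.5, the fallback is simplified to the plain snippet check the doc comment calls rule 3. The sketch reproduces the two risks the comment names, which gives a probe harness some known-answer cases before it goes hunting for new ones.

```typescript
// Minimal sketch of the title-first country gate (Rules 1, 2, and a
// simplified fallback). ASSUMPTION: regions, keywords, and the fallback are
// hypothetical stand-ins; the real REGION_META and Rule 2.5 are not shown.
type Region = 'NG' | 'PT' | 'KE' | 'ET';

interface PainSignal {
  title: string;
  text: string;
}

// Hypothetical stand-in for REGION_META; keywords are already lowercase.
const TOY_REGION_META: Record<Region, { keywords: string[] }> = {
  NG: { keywords: ['nigeria', 'lagos'] },
  PT: { keywords: ['portugal', 'lisbon'] },
  KE: { keywords: ['kenya', 'nairobi'] },
  ET: { keywords: ['ethiopia', 'addis ababa'] },
};

// Plain substring test, mirroring the `includes` calls in Rules 1 and 2.
function mentionsAny(haystackLc: string, keywords: string[]): boolean {
  return keywords.some((kw) => haystackLc.includes(kw.toLowerCase()));
}

function gateSketch(signal: PainSignal, region: Region): boolean {
  const titleLc = signal.title.toLowerCase();
  const textLc = signal.text.toLowerCase();
  const targetKeywords = TOY_REGION_META[region].keywords;

  // Rule 1: target keyword anywhere in the title → accept.
  if (mentionsAny(titleLc, targetKeywords)) return true;

  // Rule 2: another region's keyword in the title (target absent) → reject.
  const entries = Object.entries(TOY_REGION_META) as Array<[Region, { keywords: string[] }]>;
  for (const [code, meta] of entries) {
    if (code !== region && mentionsAny(titleLc, meta.keywords)) return false;
  }

  // Simplified fallback (assumption): location-neutral title → snippet check.
  return mentionsAny(textLc, targetKeywords);
}

const lagosPt: PainSignal = { title: 'Surf schools in Lagos, Portugal face staff shortages', text: '' };
const twoCountries: PainSignal = { title: 'Kenya imports power from Ethiopia', text: '' };
const neutral: PainSignal = {
  title: 'Solo founders burn out scaling product',
  text: 'A Nairobi founder says hiring is slow',
};

console.log(gateSketch(lagosPt, 'NG')); // true: "lagos" fires Rule 1 (the documented false positive)
console.log(gateSketch(lagosPt, 'KE')); // false: Rule 2 rejects on another region's keyword
console.log(gateSketch(twoCountries, 'KE')); // true: Rule 1 fires on "kenya"
console.log(gateSketch(twoCountries, 'ET')); // true: Rule 1 also fires on "ethiopia"
console.log(gateSketch(neutral, 'KE')); // true: fallback finds "nairobi" in the snippet
```

Every expected value follows from substring matching alone. The Lagos case is worth keeping as a regression probe: it shows that the hand-curated keyword list is as much a part of the eval surface as the rule order.
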
lib/graphs/pain-discovery-nodes.ts:L477 · isPureAnnouncement

Closest existing usage of signal classification logic that will be extended or replaced once a new failure class is named.

 */
function countKeywordOccurrences(text: string, keywords: string[]): number {
  // Longest first so a longer keyword consumes the substring before a
  // shorter prefix can re-match it (e.g. "south african" before
  // "south africa").
  const sorted = [...keywords].sort((a, b) => b.length - a.length);
  let total = 0;
  for (const raw of sorted) {
    const kw = raw.toLowerCase();
    // Word-boundary on either side to avoid spurious substring hits.
    // Note: \b doesn't work well across spaces in a multi-word kw, so
    // we escape and surround with lookarounds.
    const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const re = new RegExp(`(?<![a-z0-9])${escaped}(?![a-z0-9])`, 'g');
    const matches = text.match(re);
    if (matches) total += matches.length;
  }
  return total;
}

function isPureAnnouncement(signal: PainSignal): boolean {
  const text = `${signal.title} ${signal.text}`.toLowerCase();
  const hasAnnouncement = ANNOUNCEMENT_WORDS_INLINE.some((w) =>
    new RegExp(`\\b${w}\\b`, 'i').test(text),
  );
  if (!hasAnnouncement) return false;
  const hasPain = PAIN_WORDS_INLINE.some((w) =>
    new RegExp(`\\b${w}\\b`, 'i').test(text),
  );
  return !hasPain;
}

/**
 * Walk the ranked pool (capped at top 5) and pick the first signal that
 * isn't a pure announcement. If every top-5 candidate IS a pure
 * announcement, fall back to the original top signal — we don't
 * silently null. Downstream (extractPainFromUrl) will reject it if
 * needed.
 *
 * The cap of 5 is deliberate: the ranker is already good at putting
 * the best signal first. Walking further into the tail starts choosing
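
The walker's body is cut off in the excerpt, but its contract is stated in the comment: scan at most the top 5 ranked signals, return the first one that is not a pure announcement, and otherwise fall back to the top signal rather than returning nothing. The sketch below encodes that contract under stated assumptions: the word lists, `isPureAnnouncementSketch`, `pickFirstNonAnnouncement`, and the `null` result for an empty pool are all hypothetical.

```typescript
// Hypothetical sketch of the top-5 walker described in the comment above.
// ASSUMPTIONS: the word lists, the function names, and returning null for an
// empty pool are illustrative, not the module's real code.
interface PainSignal {
  title: string;
  text: string;
}

const ANNOUNCEMENT_WORDS = ['launches', 'announces', 'unveils']; // hypothetical
const PAIN_WORDS = ['struggle', 'shortage', 'delays']; // hypothetical
const WALK_CAP = 5;

// Escape regex metacharacters before interpolating a word into a pattern.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function hasAnyWord(text: string, words: string[]): boolean {
  return words.some((w) => new RegExp(`\\b${escapeRegExp(w)}\\b`, 'i').test(text));
}

// Announcement vocabulary with no pain vocabulary → "pure" announcement.
function isPureAnnouncementSketch(signal: PainSignal): boolean {
  const text = `${signal.title} ${signal.text}`;
  return hasAnyWord(text, ANNOUNCEMENT_WORDS) && !hasAnyWord(text, PAIN_WORDS);
}

// First non-announcement among the top WALK_CAP; otherwise the top signal.
function pickFirstNonAnnouncement(ranked: PainSignal[]): PainSignal | null {
  const top = ranked[0];
  if (top === undefined) return null; // assumption: empty pool → nothing to pick
  const found = ranked.slice(0, WALK_CAP).find((s) => !isPureAnnouncementSketch(s));
  return found ?? top;
}

const launch: PainSignal = { title: 'Acme launches a payments app', text: '' };
const pain: PainSignal = { title: 'Clinics struggle with stock delays', text: '' };

console.log(pickFirstNonAnnouncement([launch, pain]) === pain); // true
console.log(pickFirstNonAnnouncement([launch, launch, launch, launch, launch, pain]) === launch); // true: pain sits outside the cap
```

Unlike the excerpted `isPureAnnouncement`, this sketch escapes each word before building the `\b` pattern; that only changes behavior if a word list contains regex metacharacters, and the real lists are not shown. A signal that mixes announcement and pain vocabulary is never "pure", which makes it a useful boundary case in any probe set.
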
lib/spawnforge/retry-country-revalidation.ts:L13 · mentionsTargetCountry

Re-uses the same gate; any newly discovered edge case must be checked for consistency across both discovery and revalidation layers.

/**
 * Re-validate the WINNING extraction against the slot's target country
 * after a multi-shot retry fallback. Prevents the
 * primary-URL-failed-fallback-different-country bug class
 * (2026-05-28: MA slot, fallback content South African — provenance
 * fused into one Frankenstein catalog row).
 *
 * Called from orchestrator-cloud-complete.ts immediately after
 * extractPainWithRetry returns. If attempts > 1 (fallback won) and
 * the winning content is dominantly about a non-target country, the
 * orchestrator must abort the slot rather than write a misflagged row.
 *
 * Re-uses mentionsTargetCountry from pain-discovery-nodes (the same
 * rule set the gate runs in the discovery layer) so the contract stays
 * consistent across both gating points.
 */
import { mentionsTargetCountry } from '@/lib/graphs/pain-discovery-nodes';
import type { Region, PainSignal } from '@/lib/auto-catalog/types';

export interface AssertionResult {
  ok: boolean;
  /** Filled in when ok=false — short reason for logs/dashboards. */
  reason?: string;
}

export interface AssertionInput {
  /** How many candidate URLs extractPainWithRetry tried. 1 = primary won. */
  attempts: number;
  /** The original primary URL from pain discovery. */
  primaryUrl: string;
  /** The URL that actually produced the successful extraction. */
  winningUrl: string;
  /** Slot's target country code. Can be null when the trigger has no region. */
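
The revalidation contract in the comment reduces to a small decision: only a fallback win is re-checked, and it fails when the winning content does not pass the country gate. A minimal sketch follows. It injects the gate as a parameter so it runs without the real module. `RevalidationInputSketch`, its `target` and `winningSignal` fields, the toy keyword gate, and treating a `null` target as a pass are all assumptions, since the real `AssertionInput` is cut off at the target-country field. The scenario mirrors the MA/South Africa incident named in the comment, with placeholder URLs.

```typescript
// Hypothetical sketch of the post-retry revalidation contract described above.
// ASSUMPTIONS: the `target` and `winningSignal` fields, the injected gate, and
// skipping the check for a null target are illustrative; the real
// AssertionInput is only partially shown.
type RegionCode = string;

interface PainSignal {
  title: string;
  text: string;
}

interface AssertionResult {
  ok: boolean;
  reason?: string;
}

interface RevalidationInputSketch {
  attempts: number;
  primaryUrl: string;
  winningUrl: string;
  target: RegionCode | null;
  winningSignal: PainSignal;
}

type CountryGate = (signal: PainSignal, region: RegionCode) => boolean;

function revalidateWinnerSketch(input: RevalidationInputSketch, gate: CountryGate): AssertionResult {
  // Primary URL won: the discovery-layer gate already saw this content.
  if (input.attempts <= 1) return { ok: true };
  // Trigger has no region: nothing to revalidate against (assumption).
  if (input.target === null) return { ok: true };
  // Same gate as discovery, applied to the fallback's content.
  if (gate(input.winningSignal, input.target)) return { ok: true };
  return {
    ok: false,
    reason: `fallback ${input.winningUrl} fails ${input.target} gate (primary ${input.primaryUrl}, attempts=${input.attempts})`,
  };
}

// Toy gate for illustration only; keywords are hypothetical.
const TOY_KEYWORDS: Record<string, string[]> = {
  MA: ['morocco', 'casablanca'],
  ZA: ['south africa', 'johannesburg'],
};
const toyGate: CountryGate = (s, r) =>
  (TOY_KEYWORDS[r] ?? []).some((kw) => `${s.title} ${s.text}`.toLowerCase().includes(kw));

const zaContent: PainSignal = { title: 'Johannesburg SMEs struggle with load-shedding', text: '' };
const result = revalidateWinnerSketch(
  {
    attempts: 2,
    primaryUrl: 'https://example.com/primary',
    winningUrl: 'https://example.com/fallback',
    target: 'MA',
    winningSignal: zaContent,
  },
  toyGate,
);
console.log(result.ok); // false: MA slot, South African fallback content
```

Injecting the gate keeps the sketch aligned with the point in the comment: discovery and revalidation share one rule set, so any newly named edge-case class needs a probe against both call sites.
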

Deliverable

Journal entry (or PR comment) naming one new edge-case class with three concrete failing signals and proposed test names
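
One way to produce this deliverable is to run hand-labeled signals through a gate and keep the disagreements. The sketch below assumes a record shape for the entry; `LabeledCase`, `EdgeCaseClass`, `probe`, and `toEntry` are hypothetical names. The illustrative class exploits an asymmetry visible in the excerpts: Rules 1 and 2 of `mentionsTargetCountry` test titles with plain `includes`, while `countKeywordOccurrences` wraps keywords in lookarounds. Whether any real REGION_META keyword is short enough to be hit this way is unverified; `'oman'` is a made-up keyword chosen only to show the shape of three failing signals with their test names.

```typescript
// Hypothetical shape for today's deliverable plus a tiny probe that collects
// gate/label disagreements. ASSUMPTIONS: all names here are illustrative, and
// the 'oman' keyword is a made-up example, not a known REGION_META entry.
interface PainSignal {
  title: string;
  text: string;
}

interface LabeledCase {
  signal: PainSignal;
  expected: boolean; // what a correct country gate should return
}

interface EdgeCaseClass {
  name: string;
  failingSignals: PainSignal[]; // exactly three concrete examples
  proposedTests: string[]; // one proposed test name per failing signal
}

type Gate = (signal: PainSignal) => boolean;

// Keep only the cases where the gate disagrees with the hand label.
function probe(gate: Gate, cases: LabeledCase[]): LabeledCase[] {
  return cases.filter((c) => gate(c.signal) !== c.expected);
}

// Turn the first three disagreements into a journal-ready entry.
function toEntry(
  name: string,
  failures: LabeledCase[],
  testName: (s: PainSignal) => string,
): EdgeCaseClass {
  if (failures.length < 3) {
    throw new Error(`"${name}" needs three failing signals, found ${failures.length}`);
  }
  const failingSignals = failures.slice(0, 3).map((c) => c.signal);
  return { name, failingSignals, proposedTests: failingSignals.map(testName) };
}

// Illustrative gate: plain substring match on the title, as Rules 1 and 2 do.
const substringGate: Gate = (s) => s.title.toLowerCase().includes('oman');

const cases: LabeledCase[] = [
  { signal: { title: 'Romania logistics firms face driver shortage', text: '' }, expected: false },
  { signal: { title: 'Woman-led startups struggle to raise', text: '' }, expected: false },
  { signal: { title: 'Roman-era ports silt up', text: '' }, expected: false },
  { signal: { title: 'Oman desalination costs climb', text: '' }, expected: true },
];

const failures = probe(substringGate, cases);
const entry = toEntry('title-keyword-substring-hit', failures, (s) => `does not treat "${s.title}" as Oman`);
console.log(entry.proposedTests.length); // 3
```

The three titles are constructed to demonstrate the probe, not observed failures; a real entry should cite signals pulled from the eval harness.
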

Quiz · 3 questions

1. When mentionsTargetCountry returns true (Rule 1 fires) for a signal whose title contains the target country only in a quoted source attribution, the correct next action is:

2. Name the failure class you would add to the eval surface after observing three signals that pass mentionsTargetCountry yet are rejected downstream by isPureAnnouncement.

3. Why does spending one full day simply naming the next bug class accelerate the overall Phase 1 timeline more than immediately writing a patch?

Journal