Chase next edge-case class in eval surface
This free ship day forces systematic exploration of the current eval harness to surface the next coherent failure class before Phase 2 begins. Identifying and naming that class now prevents diffuse bug-chasing later and directly informs the next gated improvement.
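One way to make the exploration systematic is to label each failing eval case while triaging and then count the labels, so the largest cluster becomes the candidate class to name. The sketch below assumes a hypothetical EvalResult record and a hypothetical tallyFailures helper; neither is part of the Tribunal harness, and the sample labels are invented for illustration.

```typescript
// Hypothetical eval record; the real harness output may look different.
interface EvalResult {
  id: string;
  expected: boolean; // what the gate should have returned
  actual: boolean;   // what the gate returned
  note: string;      // free-form label assigned by hand during triage
}

// Count failures per triage label and return them largest first.
function tallyFailures(results: EvalResult[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const r of results) {
    if (r.expected === r.actual) continue; // passing case, not a failure
    counts.set(r.note, (counts.get(r.note) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Invented sample data, only to show the shape of the output.
const sample: EvalResult[] = [
  { id: "s1", expected: false, actual: true, note: "city-name overlap" },
  { id: "s2", expected: true, actual: true, note: "" },
  { id: "s3", expected: false, actual: true, note: "city-name overlap" },
  { id: "s4", expected: true, actual: false, note: "quoted attribution" },
];

console.log(tallyFailures(sample));
```

A label that keeps recurring across unrelated signals is a reasonable candidate for the class to name; a long tail of one-off labels suggests the failures are not yet coherent.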
Resources
- A Recipe for Training Neural Networks (Andrej Karpathy blog, 25 min reading): sections on debugging and error analysis
Codebase anchors
The Tribunal code that demonstrates today's concept.
This is the existing pattern you will measure against while hunting for new edge-case classes that slip past country targeting.
     * (AE). Without this rule the gate is symmetric to any country
     * named ANYWHERE in title+snippet, which is too permissive when
     * Serper returns global results with a passing mention of the
     * target.
     *
     * 3. Title is location-neutral (no country keywords at all) →
     *    fall back to the original snippet check. This preserves the
     *    legacy behavior for cases like "Solo founders burn out scaling
     *    product" where the country only appears in the body.
     *
     * False-positive risk: a curated REGION_META keyword might overlap
     * with an unrelated place (e.g. "Lagos, Portugal" → matches NG keyword
     * "Lagos"). Acceptable: REGION_META is hand-curated for major
     * cities/capitals, overlaps are rare, and rule 1 always wins when the
     * target keyword IS in the title.
     *
     * False-negative risk: a title legitimately mentioning two countries
     * ("Kenya imports power from Ethiopia" with target=KE) → rule 1 fires
     * first on "Kenya", we never reach rule 2. Safe.
     */
    export function mentionsTargetCountry(signal: PainSignal, region: Region): boolean {
      const meta = REGION_META[region];
      if (!meta || meta.keywords.length === 0) return true; // no metadata → don't gate
      const titleLc = signal.title.toLowerCase();
      const textLc = signal.text.toLowerCase();

      // Rule 1: target mentioned in title → strong accept
      const targetInTitle = meta.keywords.some((kw) => titleLc.includes(kw.toLowerCase()));
      if (targetInTitle) return true;

      // Rule 2: competing country in title (without target) → strong reject
      for (const [otherCode, otherMeta] of Object.entries(REGION_META) as Array<
        [Region, (typeof REGION_META)[Region]]
      >) {
        if (otherCode === region) continue;
        if (otherMeta.keywords.some((kw) => titleLc.includes(kw.toLowerCase()))) {
          return false;
        }
      }

      // Rule 2.5: title is location-neutral but body might still favour a

Closest existing usage of signal classification logic that will be extended or replaced once a new failure class is named.
     */
    function countKeywordOccurrences(text: string, keywords: string[]): number {
      // Longest first so a longer keyword consumes the substring before a
      // shorter prefix can re-match it (e.g. "south african" before
      // "south africa").
      const sorted = [...keywords].sort((a, b) => b.length - a.length);
      let total = 0;
      for (const raw of sorted) {
        const kw = raw.toLowerCase();
        // Word-boundary on either side to avoid spurious substring hits.
        // Note: \b doesn't work well across spaces in a multi-word kw, so
        // we escape and surround with lookarounds.
        const escaped = kw.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        const re = new RegExp(`(?<![a-z0-9])${escaped}(?![a-z0-9])`, 'g');
        const matches = text.match(re);
        if (matches) total += matches.length;
      }
      return total;
    }

    function isPureAnnouncement(signal: PainSignal): boolean {
      const text = `${signal.title} ${signal.text}`.toLowerCase();
      const hasAnnouncement = ANNOUNCEMENT_WORDS_INLINE.some((w) =>
        new RegExp(`\\b${w}\\b`, 'i').test(text),
      );
      if (!hasAnnouncement) return false;
      const hasPain = PAIN_WORDS_INLINE.some((w) =>
        new RegExp(`\\b${w}\\b`, 'i').test(text),
      );
      return !hasPain;
    }

    /**
     * Walk the ranked pool (capped at top 5) and pick the first signal that
     * isn't a pure announcement. If every top-5 candidate IS a pure
     * announcement, fall back to the original top signal — we don't
     * silently null. Downstream (extractPainFromUrl) will reject it if
     * needed.
     *
     * The cap of 5 is deliberate: the ranker is already good at putting
     * the best signal first. Walking further into the tail starts choosing

Re-uses the same gate; any newly discovered edge case must be checked for consistency across both discovery and revalidation layers.
    /**
     * Re-validate the WINNING extraction against the slot's target country
     * after a multi-shot retry fallback. Prevents the
     * primary-URL-failed-fallback-different-country bug class
     * (2026-05-28: MA slot, fallback content South African — provenance
     * fused into one Frankenstein catalog row).
     *
     * Called from orchestrator-cloud-complete.ts immediately after
     * extractPainWithRetry returns. If attempts > 1 (fallback won) and
     * the winning content is dominantly about a non-target country, the
     * orchestrator must abort the slot rather than write a misflagged row.
     *
     * Re-uses mentionsTargetCountry from pain-discovery-nodes (the same
     * rule set the gate runs in the discovery layer) so the contract stays
     * consistent across both gating points.
     */
    import { mentionsTargetCountry } from '@/lib/graphs/pain-discovery-nodes';
    import type { Region, PainSignal } from '@/lib/auto-catalog/types';

    export interface AssertionResult {
      ok: boolean;
      /** Filled in when ok=false — short reason for logs/dashboards. */
      reason?: string;
    }

    export interface AssertionInput {
      /** How many candidate URLs extractPainWithRetry tried. 1 = primary won. */
      attempts: number;
      /** The original primary URL from pain discovery. */
      primaryUrl: string;
      /** The URL that actually produced the successful extraction. */
      winningUrl: string;
      /** Slot's target country code. Can be null when the trigger has no region. */

Deliverable
Journal entry (or PR comment) naming one new edge-case class with three concrete failing signals and proposed test names
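To keep the entry uniform, it can help to fill in a small record and render it as text. The sketch below is only an illustration of that shape: FailingSignal, EdgeCaseClass, and renderJournalEntry are hypothetical names, not part of the Tribunal codebase. The sample titles and test names are invented placeholders built on the known "Lagos, Portugal" overlap from the gate's own comment, not a newly discovered class.

```typescript
// Hypothetical shapes for the deliverable; none of these names exist in the codebase.
interface FailingSignal {
  title: string;        // signal title as returned by search
  targetRegion: string; // slot's target country code, e.g. "NG"
  expected: boolean;    // what the gate should return
  actual: boolean;      // what the gate returned
}

interface EdgeCaseClass {
  name: string;
  description: string;  // one line, no newlines
  failingSignals: FailingSignal[];
  proposedTests: string[];
}

// Render the record as a plain-text journal entry.
// Throws when fewer than three failing signals are supplied.
function renderJournalEntry(c: EdgeCaseClass): string {
  if (c.failingSignals.length < 3) {
    throw new Error(`"${c.name}" needs at least three failing signals`);
  }
  const lines: string[] = [
    `Edge-case class: ${c.name}`,
    c.description,
    "Failing signals:",
    ...c.failingSignals.map(
      (s, i) =>
        `  ${i + 1}. [${s.targetRegion}] "${s.title}" (expected ${s.expected}, got ${s.actual})`,
    ),
    "Proposed tests:",
    ...c.proposedTests.map((t) => `  - ${t}`),
  ];
  return lines.join("\n");
}

// Placeholder values only; replace with signals observed in the eval harness.
const example: EdgeCaseClass = {
  name: "city-name overlap in title",
  description: "Title names a city that shares its name with a target-country keyword.",
  failingSignals: [
    { title: "Surf schools in Lagos, Portugal face off-season slump", targetRegion: "NG", expected: false, actual: true },
    { title: "Lagos, Portugal landlords complain about permit delays", targetRegion: "NG", expected: false, actual: true },
    { title: "Why cafes in Lagos (Algarve) cannot find staff", targetRegion: "NG", expected: false, actual: true },
  ],
  proposedTests: [
    "rejects_title_with_lagos_portugal_for_NG",
    "accepts_title_with_lagos_nigeria_for_NG",
  ],
};

console.log(renderJournalEntry(example));
```

Requiring at least three signals in the renderer is a design choice for this sketch: it makes an under-evidenced class fail loudly instead of producing a thin entry.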
Quiz · 3 questions
1. When mentionsTargetCountry returns false for a signal whose title contains the target country only in a quoted source attribution, the correct next action is:
2. Name the failure class you would add to the eval surface after observing three signals that pass mentionsTargetCountry yet are rejected downstream by isPureAnnouncement.
3. Why does spending one full day simply naming the next bug class accelerate the overall Phase 1 timeline more than immediately writing a patch?