Writing Effective Prompt Tests
Rule 1: Write the test before changing the prompt
Section titled “Rule 1: Write the test before changing the prompt”Like TDD for code, define what “passing” means before you make changes. This prevents you from unconsciously writing tests that match whatever your new version happens to produce.
Rule 2: Use realistic input variables
Section titled “Rule 2: Use realistic input variables”Test with inputs that represent your actual production traffic:
{ "topic": "the effects of monetary policy on inflation", "audience": "undergraduate economics students", "length": "3"}Avoid trivial or toy inputs — they don’t catch edge cases.
Rule 3: Score multiple dimensions
Section titled “Rule 3: Score multiple dimensions”A single “quality” score hides regressions. Use separate dimensions:
- relevance — does the output address the prompt?
- clarity — is the output easy to understand?
- accuracy — are the facts correct?
- tone — does the style match expectations?
Rule 4: Write crisp expectedCriteria
Section titled “Rule 4: Write crisp expectedCriteria”The expectedCriteria field is the source of truth for human (or LLM judge) evaluation:
“The summary should be 2-3 sentences, use plain language, and focus on practical impact rather than technical details.”
Vague criteria (“should be good”) produce inconsistent scores across evaluators.
Rule 5: Create test suites, not one-off tests
Section titled “Rule 5: Create test suites, not one-off tests”Use runTestSuite() to batch-evaluate a prompt across multiple diverse inputs. Track average scores by dimension over time.