Test-Driven Prompt Development

Most prompt changes are evaluated manually: someone runs the prompt, reads the output, and decides if it looks good. This is:

  • Unreproducible — different testers, different judgments
  • Slow — manual review bottlenecks iteration
  • Lossy — no record of what was tested and why it passed

A prompt-test minion defines:

  • Input variables — the values to inject into the prompt
  • Expected criteria — natural language description of what good output looks like
  • Scoring dimensions — the axes along which to evaluate (e.g., relevance, clarity, accuracy)
For example, a test for a technical-audience explanation prompt:

```ts
import { createMinion } from 'minions-sdk';
import { promptTestType } from 'minions-prompts';

const { minion: test } = createMinion(
  {
    title: 'Tech audience test',
    fields: {
      // Values injected into the prompt under test
      inputVariables: {
        topic: 'transformer architecture',
        audience: 'software engineers',
      },
      // Natural-language description of what good output looks like
      expectedCriteria:
        'Should explain attention mechanism clearly without oversimplifying.',
      // Axes along which the output is evaluated
      scoringDimensions: ['relevance', 'clarity', 'accuracy'],
    },
  },
  promptTestType,
);

// Persist the test so it can be run against prompt versions later
await storage.saveMinion(test);
```
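Before a run, the input variables are substituted into the prompt template. As a minimal sketch of that injection, assuming `{{name}}`-style placeholders (`renderPrompt` is a hypothetical helper; the SDK's actual templating may differ):

```typescript
// Hypothetical {{variable}} substitution; the SDK's real templating may differ.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in vars ? vars[name] : match, // leave unknown placeholders intact
  );
}

const prompt = renderPrompt(
  'Explain {{topic}} to {{audience}}.',
  { topic: 'transformer architecture', audience: 'software engineers' },
);
console.log(prompt); // Explain transformer architecture to software engineers.
```

Leaving unknown placeholders intact (rather than substituting an empty string) makes missing variables visible in the rendered prompt, which is usually what you want when debugging a failing test.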
To run the test against a prompt and record the outcome, use a `PromptScorer`:

```ts
import { PromptScorer } from 'minions-prompts';

const scorer = new PromptScorer(storage);

// Record per-dimension scores and the overall pass/fail verdict
const result = await scorer.runTest(promptId, test.id, {
  scores: { relevance: 88, clarity: 82, accuracy: 90 },
  passed: true,
});

console.log('Passed:', result.passed);
console.log('Scores:', result.scores);
```
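In this flow the caller supplies `passed` directly. One way to derive it from the dimension scores is a simple per-dimension threshold gate; the helper below is a hypothetical sketch, not part of the SDK:

```typescript
// Illustrative only: derive pass/fail from dimension scores.
// Names and thresholds here are hypothetical, not part of minions-prompts.
type Scores = Record<string, number>;

function passesThresholds(scores: Scores, thresholds: Scores): boolean {
  // Every dimension must meet or exceed its threshold to pass.
  return Object.entries(thresholds).every(
    ([dim, min]) => (scores[dim] ?? 0) >= min,
  );
}

const verdict = passesThresholds(
  { relevance: 88, clarity: 82, accuracy: 90 },
  { relevance: 80, clarity: 80, accuracy: 80 },
);
console.log(verdict); // true
```

Requiring every dimension to clear its threshold, rather than averaging, prevents one strong score (say, relevance) from masking a failing one (say, accuracy).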