Test-Driven Prompt Development
The Problem
Most prompt changes are evaluated manually: someone runs the prompt, reads the output, and decides whether it looks good. This approach is:
- Unreproducible — different testers, different judgments
- Slow — manual review bottlenecks iteration
- Lossy — no record of what was tested and why it passed
The Solution: Prompt Test Cases
A prompt-test minion defines:
- Input variables — the values to inject into the prompt
- Expected criteria — natural language description of what good output looks like
- Scoring dimensions — the axes along which to evaluate (e.g., relevance, clarity, accuracy)
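Conceptually, a test case is just structured data. A minimal sketch of the three fields above as plain Python (using a stdlib dataclass rather than the SDK's actual types, which may differ):

```python
from dataclasses import dataclass


@dataclass
class PromptTestCase:
    title: str
    input_variables: dict      # values to inject into the prompt
    expected_criteria: str     # natural-language description of good output
    scoring_dimensions: list   # axes to evaluate, e.g. ["relevance", ...]


test_case = PromptTestCase(
    title="Tech audience test",
    input_variables={
        "topic": "transformer architecture",
        "audience": "software engineers",
    },
    expected_criteria=(
        "Should explain attention mechanism clearly without oversimplifying."
    ),
    scoring_dimensions=["relevance", "clarity", "accuracy"],
)
```

Because the test is data rather than code, it can be versioned, diffed, and re-run against any revision of the prompt.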
Creating a Test
```typescript
import { createMinion } from 'minions-sdk';
import { promptTestType } from 'minions-prompts';

const { minion: test } = createMinion(
  {
    title: 'Tech audience test',
    fields: {
      inputVariables: {
        topic: 'transformer architecture',
        audience: 'software engineers',
      },
      expectedCriteria:
        'Should explain attention mechanism clearly without oversimplifying.',
      scoringDimensions: ['relevance', 'clarity', 'accuracy'],
    },
  },
  promptTestType,
);

await storage.saveMinion(test);
```

```python
from minions import create_minion
from minions_prompts import prompt_test_type

test, _ = create_minion(
    {
        "title": "Tech audience test",
        "fields": {
            "inputVariables": {
                "topic": "transformer architecture",
                "audience": "software engineers",
            },
            "expectedCriteria": "Should explain attention mechanism clearly without oversimplifying.",
            "scoringDimensions": ["relevance", "clarity", "accuracy"],
        },
    },
    prompt_test_type,
)
storage.save_minion(test)
```

Running a Test
```typescript
import { PromptScorer } from 'minions-prompts';

const scorer = new PromptScorer(storage);

const result = await scorer.runTest(promptId, test.id, {
  scores: { relevance: 88, clarity: 82, accuracy: 90 },
  passed: true,
});

console.log('Passed:', result.passed);
console.log('Scores:', result.scores);
```

```python
from minions_prompts import PromptScorer

scorer = PromptScorer(storage)

result = scorer.run_test(
    prompt_id,
    test.id,
    scores={"relevance": 88, "clarity": 82, "accuracy": 90},
    passed=True,
)

print("Passed:", result.passed)
print("Scores:", result.scores)
```
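In the examples above, `scores` and `passed` are supplied to the scorer by the caller (typically an LLM judge or a human reviewer). One common way to derive `passed` from the scores is to require every dimension to clear a fixed threshold; a sketch of that rule as a plain, hypothetical helper (not part of minions-prompts):

```python
def all_dimensions_pass(scores: dict, threshold: int = 80) -> bool:
    """Return True only if every scoring dimension meets the threshold."""
    return all(value >= threshold for value in scores.values())


# All three dimensions clear the threshold, so the test passes:
ok = all_dimensions_pass({"relevance": 88, "clarity": 82, "accuracy": 90})

# One weak dimension fails the whole test:
weak = all_dimensions_pass({"relevance": 88, "clarity": 62, "accuracy": 90})
```

An all-dimensions rule is deliberately strict: averaging can hide a single bad axis (e.g. high relevance masking low accuracy), which is usually the failure you most want to catch.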