# A/B Testing Prompts

## Why A/B Test Prompts?

A/B testing lets you make evidence-based decisions about prompt changes. Instead of "version 2 feels better," you get concrete score deltas across multiple test cases.
You need:

- Two prompt versions (`v1` and `v2`)
- A set of test cases with input variables
- Evaluations (scores) for both versions
```typescript
import { PromptScorer } from 'minions-prompts';

const scorer = new PromptScorer(storage);

const comparisons = await scorer.compareVersions(
  v1.id,
  v2.id,
  [test1.id, test2.id, test3.id],
  // v1 evaluations
  [
    { scores: { relevance: 78, clarity: 80 }, passed: true },
    { scores: { relevance: 72, clarity: 75 }, passed: true },
    { scores: { relevance: 85, clarity: 88 }, passed: true },
  ],
  // v2 evaluations
  [
    { scores: { relevance: 85, clarity: 87 }, passed: true },
    { scores: { relevance: 80, clarity: 82 }, passed: true },
    { scores: { relevance: 88, clarity: 91 }, passed: true },
  ],
);

for (const cmp of comparisons) {
  console.log(`Test ${cmp.testId}: winner = ${cmp.winner}`);
  console.log('Deltas:', cmp.deltas);
}
```

```python
from minions_prompts import PromptScorer

scorer = PromptScorer(storage)

comparisons = scorer.compare_versions(
    v1.id,
    v2.id,
    [test1.id, test2.id, test3.id],
    # v1 evaluations
    [
        {"scores": {"relevance": 78, "clarity": 80}, "passed": True},
        {"scores": {"relevance": 72, "clarity": 75}, "passed": True},
        {"scores": {"relevance": 85, "clarity": 88}, "passed": True},
    ],
    # v2 evaluations
    [
        {"scores": {"relevance": 85, "clarity": 87}, "passed": True},
        {"scores": {"relevance": 80, "clarity": 82}, "passed": True},
        {"scores": {"relevance": 88, "clarity": 91}, "passed": True},
    ],
)

for cmp in comparisons:
    print(f"Test {cmp.test_id}: winner = {cmp.winner}")
    print("Deltas:", cmp.deltas)
```

## Interpreting Results
The `deltas` field shows the score difference per dimension (positive = v2 is better):

```json
{
  "relevance": 7,
  "clarity": 7
}
```

The `winner` field is `"v1"`, `"v2"`, or `"tie"`, based on the total delta.
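To make the comparison logic concrete, here is a minimal Python sketch of how per-dimension deltas and a winner could be derived from two score sets. The `compare_scores` function is hypothetical, written for illustration; it is not the library's internal implementation.

```python
def compare_scores(v1_scores: dict, v2_scores: dict):
    """Hypothetical sketch: per-dimension deltas (v2 - v1), winner by total delta."""
    deltas = {dim: v2_scores[dim] - v1_scores[dim] for dim in v1_scores}
    total = sum(deltas.values())
    winner = "v2" if total > 0 else "v1" if total < 0 else "tie"
    return deltas, winner

# Using the first test case's scores from the example above:
deltas, winner = compare_scores(
    {"relevance": 78, "clarity": 80},
    {"relevance": 85, "clarity": 87},
)
print(deltas)   # {'relevance': 7, 'clarity': 7}
print(winner)   # v2
```

A positive total delta means v2 scored higher overall, matching the interpretation of the `deltas` and `winner` fields described above.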