
Examples

Real model output, copied verbatim from benchmark runs. Each example shows the same task answered by the same model twice: once with no skill (the `## Without Ponytail` section) and once with Ponytail (the `## With Ponytail` section), so you can compare the two side by side. Model: Claude Haiku 4.5, temperature 1; source: `benchmarks/output.json`.

These outputs are not hand-written. To reproduce them yourself, run `npx promptfoo@latest eval -c benchmarks/promptfooconfig.yaml`. For the method, results for all three models, and median-of-10 numbers, see `../benchmarks/`.

| Example | Without Ponytail (LOC) | With Ponytail (LOC) |
|---|---|---|
| Email Validation | 75 | 3 |
| Debounce | 116 | 10 |
| CSV Sum | 20 | 3 |
| Countdown Timer | 267 | 9 |
| Rate Limiting | 128 | 10 |
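To turn the line counts into "how many times shorter" figures, you can divide each pair. The sketch below does only that arithmetic on the numbers in the table. The ratios are derived here, not reported by the benchmark, and they describe these particular example outputs rather than the median-of-10 results in `../benchmarks/`. The names `LOC`, `reduction_factor`, and `summarize` are illustrative, not part of the repository.

```python
# LOC figures copied from the table: (without Ponytail, with Ponytail).
LOC = {
    "Email Validation": (75, 3),
    "Debounce": (116, 10),
    "CSV Sum": (20, 3),
    "Countdown Timer": (267, 9),
    "Rate Limiting": (128, 10),
}


def reduction_factor(without: int, with_ponytail: int) -> float:
    """How many times shorter the Ponytail answer is than the no-skill answer."""
    return without / with_ponytail


def summarize(loc: dict[str, tuple[int, int]]) -> tuple[dict[str, float], int, int]:
    """Return per-example factors (rounded to 1 decimal) plus total LOC for each side."""
    factors = {name: round(reduction_factor(w, p), 1) for name, (w, p) in loc.items()}
    total_without = sum(w for w, _ in loc.values())
    total_with = sum(p for _, p in loc.values())
    return factors, total_without, total_with


if __name__ == "__main__":
    factors, total_without, total_with = summarize(LOC)
    for name, factor in factors.items():
        print(f"{name}: {factor}x fewer lines")
    print(f"Total: {total_without} -> {total_with} LOC ({total_without / total_with:.1f}x)")
```

Across the five examples this gives 606 lines without the skill versus 35 with it, roughly 17.3x fewer; per example the factor ranges from about 6.7x (CSV Sum) to about 29.7x (Countdown Timer).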