promptdiff — lint, diff, and score LLM system prompts (works with any provider) #3018
HadiFrt20 started this conversation in Show and tell
Replies: 1 comment
This tool is badly needed! Prompt engineering is basically dark arts. Our own record of pitfalls:
Issue we found:
Another suggestion for your tool: add a rule that detects "recursive instructions", the kind that tells the AI to do one thing first, then another, and ends up tying itself in knots. Full war story: https://miaoquai.com/stories/ai-marketing-fails.html Already starred, looking forward to more features!
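The "recursive instruction" rule suggested above could be prototyped as a simple pattern heuristic. Everything below is hypothetical: promptdiff has no such rule yet, and `find_recursive_instructions` and its phrase patterns are illustrative only.

```python
import re

def find_recursive_instructions(prompt: str) -> list[str]:
    """Flag sentences where a prompt tells the model to re-run or
    re-apply its own instructions, a common self-referential trap.
    Heuristic sketch only, not promptdiff's actual rule set."""
    # Phrases like "repeat these steps", "apply this prompt again",
    # "go back to step 1" often signal a loop the model can't escape.
    pattern = re.compile(
        r"(repeat (these|the above) (steps|instructions)"
        r"|apply (this|the) prompt again"
        r"|start over from the beginning"
        r"|go back to step \d+)",
        re.IGNORECASE,
    )
    hits = []
    # Naive sentence split on terminal punctuation followed by a space.
    for sentence in re.split(r"(?<=[.!?])\s+", prompt):
        if pattern.search(sentence):
            hits.append(sentence.strip())
    return hits
```

For example, `find_recursive_instructions("Summarize the input. Then go back to step 1 and repeat.")` flags the second sentence, while a prompt with no self-referential phrasing returns an empty list.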
Built a CLI that applies static analysis to LLM system prompts.
If you manage system prompts for OpenAI models, promptdiff catches issues that silently degrade output.

What it catches:
Semantic diff — not line-by-line. Tells you "word limit tightened 150→100, high impact" with behavioral annotations.
Quality score — 0-100 across structure, specificity, examples, safety, completeness. Usable as a CI gate.
A/B compare — run two prompt versions through GPT-4o (or Claude, Ollama) and score both outputs:
promptdiff compare v1.prompt v2.prompt --input "test query" --model gpt4o

Runs locally, 3 deps, 217 tests.
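The "semantic diff" idea, e.g. reporting "word limit tightened 150→100, high impact", can be sketched as: extract numeric constraints from both prompt versions, compare them, and annotate the behavioral impact. This is a hypothetical simplification; `diff_word_limits` is not the tool's actual code, and real semantic diffing covers far more than word limits.

```python
import re

def diff_word_limits(old: str, new: str) -> list[str]:
    """Compare numeric word limits (e.g. 'at most 150 words') between
    two prompt versions and annotate the change. Illustrative sketch,
    not promptdiff's real implementation."""
    rx = re.compile(r"(?:at most|no more than|under|limit of)\s+(\d+)\s+words",
                    re.IGNORECASE)
    old_limits = [int(m.group(1)) for m in rx.finditer(old)]
    new_limits = [int(m.group(1)) for m in rx.finditer(new)]
    notes = []
    # Pair limits positionally; a real tool would align them semantically.
    for o, n in zip(old_limits, new_limits):
        if n < o:
            # Call a cut of a third or more "high impact" (arbitrary threshold).
            impact = "high" if n <= o * 2 // 3 else "moderate"
            notes.append(f"word limit tightened {o}->{n}, {impact} impact")
        elif n > o:
            notes.append(f"word limit relaxed {o}->{n}")
    return notes
```

So `diff_word_limits("Answer in at most 150 words.", "Answer in at most 100 words.")` reports a high-impact tightening, which is exactly the kind of change a plain line diff would show without explaining why it matters.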
GitHub: https://github.com/HadiFrt20/promptdiff