promptdiff — lint, diff, and score LLM system prompts (works with any provider) #3018
HadiFrt20 started this conversation in Show and tell
Replies: 1 comment
This tool is badly needed! Prompt engineering is basically dark arts. Our own record of pitfalls:
Issue we found:
Another suggestion for your tool: add a rule that detects "recursive instructions", the kind that tells the AI to do one thing first, then another, and ends up tying itself in knots. Full war story: https://miaoquai.com/stories/ai-marketing-fails.html Already starred, looking forward to more features!
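The "recursive instruction" rule suggested above could be prototyped as a simple pattern heuristic. Everything below is hypothetical: promptdiff has no such rule yet, and `find_recursive_instructions` and its phrase patterns are illustrative only.

```python
import re

def find_recursive_instructions(prompt: str) -> list[str]:
    """Flag sentences where a prompt tells the model to re-run or
    re-apply its own instructions, a common self-referential trap.
    Heuristic sketch only, not promptdiff's actual rule set."""
    # Phrases like "repeat these steps", "apply this prompt again",
    # "go back to step 1" often signal a loop the model can't escape.
    pattern = re.compile(
        r"(repeat (these|the above) (steps|instructions)"
        r"|apply (this|the) prompt again"
        r"|start over from the beginning"
        r"|go back to step \d+)",
        re.IGNORECASE,
    )
    hits = []
    # Naive sentence split on terminal punctuation followed by a space.
    for sentence in re.split(r"(?<=[.!?])\s+", prompt):
        if pattern.search(sentence):
            hits.append(sentence.strip())
    return hits
```

For example, `find_recursive_instructions("Summarize the input. Then go back to step 1 and repeat.")` flags the second sentence, while a prompt with no self-referential phrasing returns an empty list.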
Built a CLI that applies static analysis to LLM system prompts.
If you manage system prompts for OpenAI models, promptdiff catches issues that silently degrade output.

What it catches:
Semantic diff — not line-by-line. Tells you "word limit tightened 150→100, high impact" with behavioral annotations.
Quality score — 0-100 across structure, specificity, examples, safety, completeness. Usable as a CI gate.
A/B compare — run two prompt versions through GPT-4o (or Claude, Ollama) and score both outputs:
promptdiff compare v1.prompt v2.prompt --input "test query" --model gpt4o

Runs locally, 3 deps, 217 tests.
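The "semantic diff" idea, e.g. reporting "word limit tightened 150→100, high impact", can be sketched as: extract numeric constraints from both prompt versions, compare them, and annotate the behavioral impact. This is a hypothetical simplification; `diff_word_limits` is not the tool's actual code, and real semantic diffing covers far more than word limits.

```python
import re

def diff_word_limits(old: str, new: str) -> list[str]:
    """Compare numeric word limits (e.g. 'at most 150 words') between
    two prompt versions and annotate the change. Illustrative sketch,
    not promptdiff's real implementation."""
    rx = re.compile(r"(?:at most|no more than|under|limit of)\s+(\d+)\s+words",
                    re.IGNORECASE)
    old_limits = [int(m.group(1)) for m in rx.finditer(old)]
    new_limits = [int(m.group(1)) for m in rx.finditer(new)]
    notes = []
    # Pair limits positionally; a real tool would align them semantically.
    for o, n in zip(old_limits, new_limits):
        if n < o:
            # Call a cut of a third or more "high impact" (arbitrary threshold).
            impact = "high" if n <= o * 2 // 3 else "moderate"
            notes.append(f"word limit tightened {o}->{n}, {impact} impact")
        elif n > o:
            notes.append(f"word limit relaxed {o}->{n}")
    return notes
```

So `diff_word_limits("Answer in at most 150 words.", "Answer in at most 100 words.")` reports a high-impact tightening, which is exactly the kind of change a plain line diff would show without explaining why it matters.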
GitHub: https://github.com/HadiFrt20/promptdiff