Feat/agent test run eval @W-21482725@ #350
Merged
WillieRuemmele merged 12 commits into main on Mar 6, 2026
Add a new command that runs evaluation tests against Agentforce agents using the Einstein Eval Labs API. Complements `sf agent test run` by supporting direct JSON payloads with no org metadata deployment step.

Features:
- 8+ evaluator types (string/JSON assertions, text alignment, etc.)
- Smart payload normalization (field correction, shorthand refs, defaults)
- Agent ID resolution from DeveloperName via --agent-api-name
- Batch execution (max 5 tests per API request)
- CI/CD output formats (human, JSON, JUnit XML, TAP)
- Exit code 1 on failures
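The batch limit mentioned above (at most 5 tests per API request) could be enforced with a small chunking helper. This is a minimal sketch, not the command's actual implementation: the names `chunkTests` and `MAX_TESTS_PER_REQUEST` are hypothetical, and only the limit of 5 comes from the description.

```typescript
// Hypothetical constant: the value 5 is taken from the PR description,
// the name is an assumption made for this sketch.
const MAX_TESTS_PER_REQUEST = 5;

// Split a list of tests into consecutive batches of at most `size` items.
// Order is preserved; the last batch may be smaller than `size`.
function chunkTests<T>(tests: T[], size: number = MAX_TESTS_PER_REQUEST): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error(`batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < tests.length; i += size) {
    batches.push(tests.slice(i, i + size));
  }
  return batches;
}
```

With this sketch, 12 tests would be sent as three requests containing 5, 5, and 2 tests.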
Accept YAML test specs (same format as `sf agent generate test-spec`) and translate them to Einstein Eval Labs API calls. The same YAML spec that works with `sf agent test run` now also works with `run-eval`, gaining access to richer evaluators (topic/action assertions, bot_response_rating, string/numeric assertions, text alignment).

- New `--spec` flag replaces `--payload` (accepts YAML or JSON)
- Auto-detects format by content (testCases+subjectName = YAML)
- Auto-infers `--agent-api-name` from YAML spec's `subjectName`
- Smart `get_state` optimization (only when evaluators need it)
- Translates `$.generatedData.*` JSONPaths to Eval API refs
- Maps customEvaluations (string_comparison, numeric_comparison)
- 39 new unit tests for the translator (197 total passing)
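The content-based detection and the `subjectName` inference described above might look roughly like the following. This is a sketch under stated assumptions: it works on an already-parsed object (the parsing step is omitted), the names `detectSpecFormat` and `inferAgentApiName` are hypothetical, and letting an explicit flag value win over `subjectName` is an assumption, not something the description confirms.

```typescript
type SpecFormat = 'yaml-test-spec' | 'eval-payload';

// Narrow an unknown value to a plain object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Treat the input as a YAML test spec when it has both `testCases` and
// `subjectName`; anything else is assumed to be a raw Eval API payload.
function detectSpecFormat(parsed: unknown): SpecFormat {
  if (
    isRecord(parsed) &&
    Array.isArray(parsed['testCases']) &&
    typeof parsed['subjectName'] === 'string'
  ) {
    return 'yaml-test-spec';
  }
  return 'eval-payload';
}

// Assumption: an explicitly passed flag value takes precedence; otherwise
// fall back to the spec's `subjectName` when present.
function inferAgentApiName(parsed: unknown, flagValue?: string): string | undefined {
  if (flagValue) {
    return flagValue;
  }
  if (isRecord(parsed) && typeof parsed['subjectName'] === 'string') {
    return parsed['subjectName'];
  }
  return undefined;
}
```

Checking the parsed shape rather than the file extension matches the "by content" wording, since a YAML parser will also accept JSON input.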
The scenario agent and MCP tools generate test payloads using MCP
shorthand format (`type: "evaluator"` + `evaluator_type`) instead of
the raw Eval API format (`type: "evaluator.planner_topic_assertion"`).
Add `normalizeMcpShorthand` as the first normalization pass:
- Merges `type: "evaluator"` + `evaluator_type: "xxx"` → `type: "evaluator.xxx"`
- Converts `field: "gs1.planner_state.topic"` → `actual: "{gs1.response...}"`
- Maps MCP field paths to Eval API JSONPaths
- Auto-generates missing `id` fields on evaluator steps
- 11 new unit tests (208 total passing)
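The type merge and `id` generation listed above can be sketched as a single step-level function. This is an illustration, not the PR's `normalizeMcpShorthand`: the function name `normalizeStep` and the `eval_<n>` id scheme are hypothetical, and the field-path mapping is left out because the target JSONPaths are not spelled out here.

```typescript
type Step = Record<string, unknown>;

// Normalize one step: merge MCP shorthand into the raw Eval API type and
// give evaluator steps an id when they lack one. The input is not mutated.
function normalizeStep(step: Step, index: number): Step {
  let result: Step = step;

  // `type: "evaluator"` + `evaluator_type: "xxx"` becomes `type: "evaluator.xxx"`.
  const evaluatorType = step['evaluator_type'];
  if (step['type'] === 'evaluator' && typeof evaluatorType === 'string') {
    const merged: Step = { ...step, type: `evaluator.${evaluatorType}` };
    delete merged['evaluator_type'];
    result = merged;
  }

  // Hypothetical id scheme: position-based, only for evaluator steps.
  const type = result['type'];
  const isEvaluator = typeof type === 'string' && type.startsWith('evaluator.');
  const id = result['id'];
  if (isEvaluator && (typeof id !== 'string' || id === '')) {
    result = { ...result, id: `eval_${index + 1}` };
  }
  return result;
}
```

Applied over a list with `steps.map(normalizeStep)`, steps already in raw Eval API format pass through with their `type` and existing `id` unchanged.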
Thanks for the contribution! Unfortunately we can't verify the commit author(s): Tanner McGrath <t***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.
shetzel (Contributor) approved these changes on Mar 6, 2026 and left a comment:
We'll definitely refactor some things before this can go GA, but very nice beta addition!
    // Set exit code to 1 if any tests failed
    if (summary.failed > 0 || summary.errors > 0) {
      process.exitCode = 1;
We'll want to match `apex test run` and the other agent test command for exit codes.
What does this PR do?
- Brings along Tanner's fork PR and new command
- Adds NUTs
- Upgrades to use standard libraries
- Uses OCLIF stdin
What issues does this PR fix or reference?
@W-21482725@