Feat/agent test run eval @W-21482725@ by WillieRuemmele · Pull Request #350 · salesforcecli/plugin-agent

WillieRuemmele · 2026-03-06T18:22:36Z

What does this PR do?

- Brings in Tanner's fork PR and the new command
- Adds NUTs
- Upgrades to use standard libraries
- Uses OCLIF stdin

What issues does this PR fix or reference?

@W-21482725@

Add a new command that runs evaluation tests against Agentforce agents using the Einstein Eval Labs API. It complements `sf agent test run` by supporting direct JSON payloads, with no org metadata deployment step.

Features:
- 8+ evaluator types (string/JSON assertions, text alignment, etc.)
- Smart payload normalization (field correction, shorthand refs, defaults)
- Agent ID resolution from DeveloperName via --agent-api-name
- Batch execution (max 5 tests per API request)
- CI/CD output formats (human, JSON, JUnit XML, TAP)
- Exit code 1 on failures
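
The batching rule above (at most 5 tests per API request) can be sketched as follows. This is a minimal illustration under stated assumptions, not the command's code: `EvalTest`, `chunk`, `runInBatches`, and the `sendBatch` callback are hypothetical names, and `sendBatch` stands in for whatever call actually reaches the Einstein Eval Labs API.

```typescript
// Hypothetical shape of a single evaluation test; the real payload type is not shown in this PR.
interface EvalTest {
  id: string;
}

// Limit stated in the PR description: at most 5 tests per API request.
const MAX_TESTS_PER_REQUEST = 5;

// Split `items` into consecutive batches of at most `size` elements, preserving order.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError(`batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Send each batch through `sendBatch` (a stand-in for the real API call), one batch
// at a time, and concatenate the per-batch results in input order.
async function runInBatches<R>(
  tests: readonly EvalTest[],
  sendBatch: (batch: EvalTest[]) => Promise<R[]>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(tests, MAX_TESTS_PER_REQUEST)) {
    results.push(...(await sendBatch(batch)));
  }
  return results;
}
```

This sketch sends batches one after another. A later commit in this PR mentions fixing a parallel process, so the real command may dispatch batches concurrently; mapping `chunk(...)` through `sendBatch` under `Promise.all` would be the equivalent change here and still preserves result order.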

Accept YAML test specs (same format as `sf agent generate test-spec`) and translate them to Einstein Eval Labs API calls. The same YAML spec that works with `sf agent test run` now also works with `run-eval`, gaining access to richer evaluators (topic/action assertions, bot_response_rating, string/numeric assertions, text alignment).
- New `--spec` flag replaces `--payload` (accepts YAML or JSON)
- Auto-detects format by content (testCases+subjectName = YAML)
- Auto-infers `--agent-api-name` from YAML spec's `subjectName`
- Smart `get_state` optimization (only when evaluators need it)
- Translates `$.generatedData.*` JSONPaths to Eval API refs
- Maps customEvaluations (string_comparison, numeric_comparison)
- 39 new unit tests for the translator (197 total passing)
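
A minimal sketch of the content-based detection and API-name inference described above, assuming the YAML or JSON has already been parsed into a plain object (parsing itself is out of scope). `YamlTestSpec`, `ClassifiedSpec`, `isYamlTestSpec`, and `classifySpec` are hypothetical names, and giving an explicitly passed API name precedence over `subjectName` is an assumption the description does not state. The flag is called `--agent-api-name` here; a later commit in this PR renames it to `--api-name`.

```typescript
// Hypothetical minimal view of a parsed YAML test spec: only the two fields the
// PR description uses for detection are modeled.
interface YamlTestSpec {
  subjectName: string;
  testCases: unknown[];
}

type ClassifiedSpec =
  | { kind: 'yaml-spec'; spec: YamlTestSpec; agentApiName: string }
  | { kind: 'eval-payload'; payload: unknown; agentApiName: string | undefined };

// Content-based detection: an object with a `testCases` array and a string
// `subjectName` is treated as a YAML test spec; anything else as a raw payload.
function isYamlTestSpec(value: unknown): value is YamlTestSpec {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return Array.isArray(v.testCases) && typeof v.subjectName === 'string';
}

// Assumption: an explicitly passed API name wins; otherwise a YAML spec's
// `subjectName` is used. A raw payload gets no inferred name.
function classifySpec(parsed: unknown, flagApiName?: string): ClassifiedSpec {
  if (isYamlTestSpec(parsed)) {
    return { kind: 'yaml-spec', spec: parsed, agentApiName: flagApiName ?? parsed.subjectName };
  }
  return { kind: 'eval-payload', payload: parsed, agentApiName: flagApiName };
}
```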

The scenario agent and MCP tools generate test payloads using MCP shorthand format (`type: "evaluator"` + `evaluator_type`) instead of the raw Eval API format (`type: "evaluator.planner_topic_assertion"`). Add `normalizeMcpShorthand` as the first normalization pass:
- Merges `type: "evaluator"` + `evaluator_type: "xxx"` → `type: "evaluator.xxx"`
- Converts `field: "gs1.planner_state.topic"` → `actual: "{gs1.response...}"`
- Maps MCP field paths to Eval API JSONPaths
- Auto-generates missing `id` fields on evaluator steps
- 11 new unit tests (208 total passing)
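
The type-merging and id-generation passes above can be sketched as a pure function over a list of steps. This is hypothetical, not `normalizeMcpShorthand` itself: `Step` and `normalizeShorthandSketch` are illustrative names, the `eval_<index>` id scheme is invented, and the field-path conversion is left out because its exact output format is elided in the description.

```typescript
// Hypothetical, loosely typed step; the real Eval API step type is not shown in this PR.
interface Step {
  type: string;
  id?: string;
  evaluator_type?: string;
  [key: string]: unknown;
}

// Sketch of two passes:
//  1. merge `type: "evaluator"` + `evaluator_type: "xxx"` into `type: "evaluator.xxx"`
//  2. give evaluator steps without an `id` a generated one (hypothetical scheme).
// Inputs are not mutated; each step is shallow-copied.
function normalizeShorthandSketch(steps: readonly Step[]): Step[] {
  return steps.map((step, index) => {
    let out: Step = { ...step };
    if (out.type === 'evaluator' && typeof out.evaluator_type === 'string') {
      // Drop the shorthand field and fold it into the dotted type.
      const { evaluator_type, ...rest } = out;
      out = { ...rest, type: `evaluator.${evaluator_type}` };
    }
    if (out.type.startsWith('evaluator.') && !out.id) {
      out.id = `eval_${index}`;
    }
    return out;
  });
}
```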

salesforce-cla · 2026-03-06T18:22:43Z

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Tanner McGrath <t***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

messages/agent.test.run-eval.md

shetzel

We'll definitely refactor some things before this can go GA, but very nice beta addition!

shetzel · 2026-03-06T22:24:31Z

src/commands/agent/test/run-eval.ts

+
+    // Set exit code to 1 if any tests failed
+    if (summary.failed > 0 || summary.errors > 0) {
+      process.exitCode = 1;


We'll want to match apex test run and the other agent test command for exit codes.
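
One way to make that alignment easy is to pull the exit-code decision into a pure function, as in this sketch. `RunSummary` and `exitCodeFor` are hypothetical names, and the failure code is a parameter rather than a constant because the values used by the Apex and agent test commands are not given in this thread. The command would then assign a non-zero result to `process.exitCode`, as the current diff does with `1`.

```typescript
// The two counts the diff above already checks.
interface RunSummary {
  failed: number;
  errors: number;
}

// Hypothetical exit-code policy: 0 when nothing failed or errored, otherwise
// `failureExitCode`, which callers set to match the sibling test commands.
function exitCodeFor(summary: RunSummary, failureExitCode: number): number {
  return summary.failed > 0 || summary.errors > 0 ? failureExitCode : 0;
}
```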

Tanner McGrath and others added 6 commits March 3, 2026 17:22

fix: initial updates, use Org/Connection, make beta/hidden

2a0f283

chore: update to use OCLIF stdin

531e427

test: add NUTs

1f9577e

WillieRuemmele requested a review from a team as a code owner March 6, 2026 18:22

salesforce-cla bot added the cla:missing label Mar 6, 2026

jshackell-sfdc reviewed Mar 6, 2026


WillieRuemmele added 3 commits March 6, 2026 11:57

docs: bring Juliet's suggestions, change to --api-name

2d173b2

test: claude try fixing NUTs

9d9f8e3

test: remove stdin NUTs - hard in CI

29db3b4

WillieRuemmele changed the title ~~Feat/agent test run eval~~ Feat/agent test run eval @W-21482725@ Mar 6, 2026

WillieRuemmele added 3 commits March 6, 2026 14:44

chore: gpt review and gpt fix

8abe669

chore: remove invalid field from soql

e6414e8

chore: fix parallel process, calculating headers

fc1c1db

shetzel approved these changes Mar 6, 2026


WillieRuemmele merged commit 0eb4a6d into main Mar 6, 2026
14 of 15 checks passed

WillieRuemmele deleted the feat/agent-test-run-eval branch March 6, 2026 22:39

WillieRuemmele merged 12 commits into main from feat/agent-test-run-eval

3 participants
