Python: Emit TOOL_CALL_RESULT events when resuming after tool approval by moonbox3 · Pull Request #4758 · microsoft/agent-framework

moonbox3 · 2026-03-18T08:30:32Z

Motivation and Context

When a user approved a pending tool call and the agent resumed, the executed tool's result was never emitted as a TOOL_CALL_RESULT event in the AG-UI stream. This meant downstream consumers (e.g., CopilotKit) had no visibility into the tool execution outcome, breaking the approval-resume flow.

Fixes #4589

Description

The root cause was that _resolve_approval_responses executed the approved tools and inserted results into the message history, but run_agent_stream never yielded corresponding ToolCallResultEvents to the event stream. The fix makes _resolve_approval_responses return the list of resolved Content objects, and run_agent_stream iterates over them right after RUN_STARTED to emit a ToolCallResultEvent for each resolved approval. Three new tests verify that the event is emitted with the correct tool_call_id and content, and that no spurious events appear when no approval is present.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

When a tool call is approved via the interrupt/resume flow, _resolve_approval_responses executes the tool and injects the result into the messages array, but no TOOL_CALL_RESULT SSE event was yielded to the client. Changes: - _resolve_approval_responses now returns the list of resolved function_result Content objects instead of None - run_agent_stream yields ToolCallResultEvent for each resolved approval result after RunStartedEvent is emitted - Add ToolCallResultEvent to ag_ui.core imports in _agent_run.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 97% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach

Automated review by moonbox3's agents

markwallace-microsoft · 2026-03-18T08:33:03Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
packages/ag-ui/agent_framework_ag_ui
_agent_run.py	484	50	89%	149–156, 195–196, 203, 300, 312, 316, 318–319, 335, 362–363, 399–403, 517–519, 531–533, 631, 639, 748, 750–751, 804, 821–822, 829, 897, 920, 928, 930, 933, 939, 992, 995, 1005–1006, 1013
TOTAL	24013	2638	89%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5260	20 💤	0 ❌	0 🔥	1m 24s ⏱️

Copilot

Pull request overview

This PR fixes AG-UI streaming behavior for HITL approval-resume flows by ensuring tool execution results are emitted as TOOL_CALL_RESULT events when the agent resumes after a user approves a pending tool call (fixes #4589).

Changes:

Update _resolve_approval_responses to return resolved function_result Content items.
Emit ToolCallResultEvent entries in run_agent_stream immediately after RUN_STARTED for any tools executed during approval resolution.
Add new unit tests covering TOOL_CALL_RESULT emission on approval resume and absence when no approval is present.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py`	Returns resolved approval results and emits `TOOL_CALL_RESULT` events on resume.
`python/packages/ag-ui/tests/ag_ui/test_approval_result_event.py`	Adds tests for TOOL_CALL_RESULT emission in approval-resume flows.

You can also share your feedback on Copilot code review. Take the survey.

python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py

python/packages/ag-ui/tests/ag_ui/test_approval_result_event.py

1. _resolve_approval_responses now returns only approved results (not rejections) so TOOL_CALL_RESULT events are emitted only for executed tools. Rejection results are still written into message history. 2. Emit resolved TOOL_CALL_RESULT events in the no-updates fallback RUN_STARTED path so approval results are never lost. 3. Rewrite tests to use real FunctionTool with func and approval_mode='always_require' via StubAgent default_options, verifying actual tool execution output in TOOL_CALL_RESULT content. Added test for rejection not emitting TOOL_CALL_RESULT. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 90%

✓ Correctness

The change correctly separates approved and rejected tool call results so that only approved results are returned from _resolve_approval_responses and emitted as TOOL_CALL_RESULT events, while both approved and rejected results remain in message history. The _replace_approval_contents_with_results function receives all_results (approved + rejected) but only consumes approved entries via result_idx — the rejection entries are unused since that function already creates inline rejection results. The TOOL_CALL_RESULT emission is correctly placed in both the 'stream has updates' and 'no updates' paths. Tests are strengthened by using real executable FunctionTool objects with concrete result assertions, and a new rejection test verifies no events are emitted for rejected calls.

✓ Security Reliability

This change splits the return value of _resolve_approval_responses so that only approved tool-call results are emitted as TOOL_CALL_RESULT events while rejected results are still written into message history. The security-sensitive validation logic (pending_approvals registry, replay prevention, function-name spoofing checks) is unchanged. The two emission sites for ToolCallResultEvent (lines 867-877 inside the first-update branch and lines 933-943 inside the no-updates fallback) are mutually exclusive via the run_started_emitted flag. The _replace_approval_contents_with_results helper independently generates rejection results via its own else branch (line 1790-1797 in _tools.py), so the rejection items appended to all_results are never consumed from the list—this is harmless but slightly wasteful. Tests are updated to use a real FunctionTool with approval_mode='always_require' and now assert exact tool output, plus a new test validates rejections produce no events. No injection, deserialization, resource-leak, or unhandled-failure issues found.

✗ Test Coverage

The PR correctly separates approved from rejected results in _resolve_approval_responses and adds a new test_rejection_does_not_emit_tool_call_result test. Existing tests are upgraded with real tool execution and exact content assertions, which is a quality improvement. However, there is no test covering the mixed approved+rejected scenario (some calls approved, some rejected in the same stream), and the duplicated TOOL_CALL_RESULT emission code in the no-updates fallback path (lines 933-943) has zero test coverage. Both gaps exercise distinct code paths introduced or modified by this PR.

✓ Design Approach

The design approach is sound: separating approved results (returned for event emission) from rejection results (written to message history only) correctly models the AG-UI protocol requirement that TOOL_CALL_RESULT events should only fire for actually-executed tool calls. The split in _resolve_approval_responses between approved_results / rejection_results / all_results is clean and the docstring accurately reflects the contract. One structural concern is that the TOOL_CALL_RESULT emission block is copy-pasted verbatim into two separate code paths (lines 867-877 for the normal stream case, lines 933-943 for the zero-updates fallback). This duplication means any future change to serialization logic or the ToolCallResultEvent shape must be applied in both places, which is a maintenance hazard. Extracting a small helper (e.g., _emit_approval_tool_results) would eliminate the duplication without any behavioral change.

Flagged Issues

No test for a mixed approve/reject scenario (e.g., two tool calls where one is approved and one is rejected). This is the main behavioral change—_resolve_approval_responses now splits results into two lists—and it is only tested with all-approved or all-rejected inputs, never a mix.

Suggestions

The rejection_results list built in _resolve_approval_responses (lines 519-526 of _agent_run.py) is appended to all_results and passed to _replace_approval_contents_with_results, but that function never consumes those items—its else branch at _tools.py:1790 creates its own inline rejection results. Consider passing only approved_results to _replace_approval_contents_with_results (or removing the rejection_results construction entirely) to eliminate dead data and clarify the contract.
The TOOL_CALL_RESULT emission block is duplicated verbatim in the first-update path (lines 867-877) and the no-updates fallback path (lines 933-943). Extract it into a small async generator helper (e.g., _emit_approval_tool_results(resolved_approval_results)) to keep serialization logic in one place and prevent the two copies from drifting apart.
All existing tests use a StubAgent that always yields updates, so only the first TOOL_CALL_RESULT emission site (lines 867-877) is exercised. Add a test where the agent produces zero updates to cover the no-updates fallback path (lines 933-943).
Consider adding a direct unit test for _resolve_approval_responses that verifies the return value contains only approved results while rejection results are still written into messages. The current tests verify this indirectly via event emission, which makes it harder to pinpoint regressions in the splitting logic.

Automated review by moonbox3's agents

python/packages/ag-ui/tests/ag_ui/test_approval_result_event.py

python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py

- Extract duplicated TOOL_CALL_RESULT emission block into _make_approval_tool_result_events helper to prevent drift - Remove dead rejection_results construction in _resolve_approval_responses; _replace_approval_contents_with_results already handles rejections inline - Pass only approved_results (not all_results) to clarify the contract - Add mixed approve/reject test validating the core splitting logic - Add zero-updates test covering the no-updates fallback emission path - Add direct unit test for _resolve_approval_responses return value Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add blank line between first-party and third-party import groups to satisfy ruff I001 rule. Fixes microsoft#4589 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 97% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach

Automated review by moonbox3's agents

Copilot and others added 2 commits March 18, 2026 08:26

Apply pre-commit auto-fixes

d727c12

Copilot AI review requested due to automatic review settings March 18, 2026 08:30

moonbox3 self-assigned this Mar 18, 2026

markwallace-microsoft added the python label Mar 18, 2026

moonbox3 commented Mar 18, 2026

View reviewed changes

Copilot started reviewing on behalf of moonbox3 March 18, 2026 08:31 View session

eavanvalkenburg approved these changes Mar 18, 2026

View reviewed changes

Copilot AI reviewed Mar 18, 2026

View reviewed changes

python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py Show resolved Hide resolved

python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py Outdated Show resolved Hide resolved

python/packages/ag-ui/tests/ag_ui/test_approval_result_event.py Show resolved Hide resolved

moonbox3 commented Mar 18, 2026

View reviewed changes

python/packages/ag-ui/tests/ag_ui/test_approval_result_event.py Show resolved Hide resolved

python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py Outdated Show resolved Hide resolved

Copilot and others added 3 commits March 18, 2026 08:50

Apply pre-commit auto-fixes

b3605ef

Fix import sorting lint error in test_approval_result_event.py

f92ae1d

Add blank line between first-party and third-party import groups to satisfy ruff I001 rule. Fixes microsoft#4589 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3 commented Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Emit TOOL_CALL_RESULT events when resuming after tool approval#4758

Python: Emit TOOL_CALL_RESULT events when resuming after tool approval#4758
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4589-1

moonbox3 commented Mar 18, 2026

Uh oh!

moonbox3 left a comment

Uh oh!

markwallace-microsoft commented Mar 18, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

moonbox3 commented Mar 18, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

Uh oh!

markwallace-microsoft commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python Unit Test Overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✗ Test Coverage

✓ Design Approach

Flagged Issues

Suggestions

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

markwallace-microsoft commented Mar 18, 2026 •

edited

Loading