Skip to content

Python: Emit TOOL_CALL_RESULT events when resuming after tool approval#4758

Open
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4589-1
Open

Python: Emit TOOL_CALL_RESULT events when resuming after tool approval#4758
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4589-1

Conversation

@moonbox3
Copy link
Contributor

Motivation and Context

When a user approved a pending tool call and the agent resumed, the executed tool's result was never emitted as a TOOL_CALL_RESULT event in the AG-UI stream. This meant downstream consumers (e.g., CopilotKit) had no visibility into the tool execution outcome, breaking the approval-resume flow.

Fixes #4589

Description

The root cause was that _resolve_approval_responses executed the approved tools and inserted results into the message history, but run_agent_stream never yielded corresponding ToolCallResultEvents to the event stream. The fix makes _resolve_approval_responses return the list of resolved Content objects, and run_agent_stream iterates over them right after RUN_STARTED to emit a ToolCallResultEvent for each resolved approval. Three new tests verify that the event is emitted with the correct tool_call_id and content, and that no spurious events appear when no approval is present.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

Copilot and others added 2 commits March 18, 2026 08:26
When a tool call is approved via the interrupt/resume flow,
_resolve_approval_responses executes the tool and injects the result
into the messages array, but no TOOL_CALL_RESULT SSE event was yielded
to the client.

Changes:
- _resolve_approval_responses now returns the list of resolved
  function_result Content objects instead of None
- run_agent_stream yields ToolCallResultEvent for each resolved
  approval result after RunStartedEvent is emitted
- Add ToolCallResultEvent to ag_ui.core imports in _agent_run.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 18, 2026 08:30
@moonbox3 moonbox3 self-assigned this Mar 18, 2026
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 97% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by moonbox3's agents

@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Mar 18, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/ag-ui/agent_framework_ag_ui
   _agent_run.py4845089%149–156, 195–196, 203, 300, 312, 316, 318–319, 335, 362–363, 399–403, 517–519, 531–533, 631, 639, 748, 750–751, 804, 821–822, 829, 897, 920, 928, 930, 933, 939, 992, 995, 1005–1006, 1013
TOTAL24013263889% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
5260 20 💤 0 ❌ 0 🔥 1m 24s ⏱️

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes AG-UI streaming behavior for HITL approval-resume flows by ensuring tool execution results are emitted as TOOL_CALL_RESULT events when the agent resumes after a user approves a pending tool call (fixes #4589).

Changes:

  • Update _resolve_approval_responses to return resolved function_result Content items.
  • Emit ToolCallResultEvent entries in run_agent_stream immediately after RUN_STARTED for any tools executed during approval resolution.
  • Add new unit tests covering TOOL_CALL_RESULT emission on approval resume and absence when no approval is present.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py Returns resolved approval results and emits TOOL_CALL_RESULT events on resume.
python/packages/ag-ui/tests/ag_ui/test_approval_result_event.py Adds tests for TOOL_CALL_RESULT emission in approval-resume flows.

You can also share your feedback on Copilot code review. Take the survey.

1. _resolve_approval_responses now returns only approved results (not
   rejections) so TOOL_CALL_RESULT events are emitted only for executed
   tools. Rejection results are still written into message history.

2. Emit resolved TOOL_CALL_RESULT events in the no-updates fallback
   RUN_STARTED path so approval results are never lost.

3. Rewrite tests to use real FunctionTool with func and
   approval_mode='always_require' via StubAgent default_options,
   verifying actual tool execution output in TOOL_CALL_RESULT content.
   Added test for rejection not emitting TOOL_CALL_RESULT.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 90%

✓ Correctness

The change correctly separates approved and rejected tool call results so that only approved results are returned from _resolve_approval_responses and emitted as TOOL_CALL_RESULT events, while both approved and rejected results remain in message history. The _replace_approval_contents_with_results function receives all_results (approved + rejected) but only consumes approved entries via result_idx — the rejection entries are unused since that function already creates inline rejection results. The TOOL_CALL_RESULT emission is correctly placed in both the 'stream has updates' and 'no updates' paths. Tests are strengthened by using real executable FunctionTool objects with concrete result assertions, and a new rejection test verifies no events are emitted for rejected calls.

✓ Security Reliability

This change splits the return value of _resolve_approval_responses so that only approved tool-call results are emitted as TOOL_CALL_RESULT events while rejected results are still written into message history. The security-sensitive validation logic (pending_approvals registry, replay prevention, function-name spoofing checks) is unchanged. The two emission sites for ToolCallResultEvent (lines 867-877 inside the first-update branch and lines 933-943 inside the no-updates fallback) are mutually exclusive via the run_started_emitted flag. The _replace_approval_contents_with_results helper independently generates rejection results via its own else branch (line 1790-1797 in _tools.py), so the rejection items appended to all_results are never consumed from the list—this is harmless but slightly wasteful. Tests are updated to use a real FunctionTool with approval_mode='always_require' and now assert exact tool output, plus a new test validates rejections produce no events. No injection, deserialization, resource-leak, or unhandled-failure issues found.

✗ Test Coverage

The PR correctly separates approved from rejected results in _resolve_approval_responses and adds a new test_rejection_does_not_emit_tool_call_result test. Existing tests are upgraded with real tool execution and exact content assertions, which is a quality improvement. However, there is no test covering the mixed approved+rejected scenario (some calls approved, some rejected in the same stream), and the duplicated TOOL_CALL_RESULT emission code in the no-updates fallback path (lines 933-943) has zero test coverage. Both gaps exercise distinct code paths introduced or modified by this PR.

✓ Design Approach

The design approach is sound: separating approved results (returned for event emission) from rejection results (written to message history only) correctly models the AG-UI protocol requirement that TOOL_CALL_RESULT events should only fire for actually-executed tool calls. The split in _resolve_approval_responses between approved_results / rejection_results / all_results is clean and the docstring accurately reflects the contract. One structural concern is that the TOOL_CALL_RESULT emission block is copy-pasted verbatim into two separate code paths (lines 867-877 for the normal stream case, lines 933-943 for the zero-updates fallback). This duplication means any future change to serialization logic or the ToolCallResultEvent shape must be applied in both places, which is a maintenance hazard. Extracting a small helper (e.g., _emit_approval_tool_results) would eliminate the duplication without any behavioral change.

Flagged Issues

  • No test for a mixed approve/reject scenario (e.g., two tool calls where one is approved and one is rejected). This is the main behavioral change—_resolve_approval_responses now splits results into two lists—and it is only tested with all-approved or all-rejected inputs, never a mix.

Suggestions

  • The rejection_results list built in _resolve_approval_responses (lines 519-526 of _agent_run.py) is appended to all_results and passed to _replace_approval_contents_with_results, but that function never consumes those items—its else branch at _tools.py:1790 creates its own inline rejection results. Consider passing only approved_results to _replace_approval_contents_with_results (or removing the rejection_results construction entirely) to eliminate dead data and clarify the contract.
  • The TOOL_CALL_RESULT emission block is duplicated verbatim in the first-update path (lines 867-877) and the no-updates fallback path (lines 933-943). Extract it into a small async generator helper (e.g., _emit_approval_tool_results(resolved_approval_results)) to keep serialization logic in one place and prevent the two copies from drifting apart.
  • All existing tests use a StubAgent that always yields updates, so only the first TOOL_CALL_RESULT emission site (lines 867-877) is exercised. Add a test where the agent produces zero updates to cover the no-updates fallback path (lines 933-943).
  • Consider adding a direct unit test for _resolve_approval_responses that verifies the return value contains only approved results while rejection results are still written into messages. The current tests verify this indirectly via event emission, which makes it harder to pinpoint regressions in the splitting logic.

Automated review by moonbox3's agents

Copilot and others added 3 commits March 18, 2026 08:50
- Extract duplicated TOOL_CALL_RESULT emission block into
  _make_approval_tool_result_events helper to prevent drift
- Remove dead rejection_results construction in _resolve_approval_responses;
  _replace_approval_contents_with_results already handles rejections inline
- Pass only approved_results (not all_results) to clarify the contract
- Add mixed approve/reject test validating the core splitting logic
- Add zero-updates test covering the no-updates fallback emission path
- Add direct unit test for _resolve_approval_responses return value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add blank line between first-party and third-party import groups
to satisfy ruff I001 rule.

Fixes microsoft#4589

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 97% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by moonbox3's agents

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Feature]: TOOL_CALL_RESULT Events Not Emitted on Approval Resume

4 participants