Python: Emit TOOL_CALL_RESULT events when resuming after tool approval#4758
Python: Emit TOOL_CALL_RESULT events when resuming after tool approval#4758moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
Conversation
When a tool call is approved via the interrupt/resume flow, _resolve_approval_responses executes the tool and injects the result into the messages array, but no TOOL_CALL_RESULT SSE event was yielded to the client. Changes: - _resolve_approval_responses now returns the list of resolved function_result Content objects instead of None - run_agent_stream yields ToolCallResultEvent for each resolved approval result after RunStartedEvent is emitted - Add ToolCallResultEvent to ag_ui.core imports in _agent_run.py Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 97% | Result: All clear
Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach
Automated review by moonbox3's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
This PR fixes AG-UI streaming behavior for HITL approval-resume flows by ensuring tool execution results are emitted as TOOL_CALL_RESULT events when the agent resumes after a user approves a pending tool call (fixes #4589).
Changes:
- Update
_resolve_approval_responsesto return resolvedfunction_resultContentitems. - Emit
ToolCallResultEvententries inrun_agent_streamimmediately afterRUN_STARTEDfor any tools executed during approval resolution. - Add new unit tests covering TOOL_CALL_RESULT emission on approval resume and absence when no approval is present.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
python/packages/ag-ui/agent_framework_ag_ui/_agent_run.py |
Returns resolved approval results and emits TOOL_CALL_RESULT events on resume. |
python/packages/ag-ui/tests/ag_ui/test_approval_result_event.py |
Adds tests for TOOL_CALL_RESULT emission in approval-resume flows. |
You can also share your feedback on Copilot code review. Take the survey.
1. _resolve_approval_responses now returns only approved results (not rejections) so TOOL_CALL_RESULT events are emitted only for executed tools. Rejection results are still written into message history. 2. Emit resolved TOOL_CALL_RESULT events in the no-updates fallback RUN_STARTED path so approval results are never lost. 3. Rewrite tests to use real FunctionTool with func and approval_mode='always_require' via StubAgent default_options, verifying actual tool execution output in TOOL_CALL_RESULT content. Added test for rejection not emitting TOOL_CALL_RESULT. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 90%
✓ Correctness
The change correctly separates approved and rejected tool call results so that only approved results are returned from
_resolve_approval_responsesand emitted asTOOL_CALL_RESULTevents, while both approved and rejected results remain in message history. The_replace_approval_contents_with_resultsfunction receivesall_results(approved + rejected) but only consumes approved entries viaresult_idx— the rejection entries are unused since that function already creates inline rejection results. The TOOL_CALL_RESULT emission is correctly placed in both the 'stream has updates' and 'no updates' paths. Tests are strengthened by using real executableFunctionToolobjects with concrete result assertions, and a new rejection test verifies no events are emitted for rejected calls.
✓ Security Reliability
This change splits the return value of
_resolve_approval_responsesso that only approved tool-call results are emitted asTOOL_CALL_RESULTevents while rejected results are still written into message history. The security-sensitive validation logic (pending_approvals registry, replay prevention, function-name spoofing checks) is unchanged. The two emission sites forToolCallResultEvent(lines 867-877 inside the first-update branch and lines 933-943 inside the no-updates fallback) are mutually exclusive via therun_started_emittedflag. The_replace_approval_contents_with_resultshelper independently generates rejection results via its ownelsebranch (line 1790-1797 in_tools.py), so the rejection items appended toall_resultsare never consumed from the list—this is harmless but slightly wasteful. Tests are updated to use a realFunctionToolwithapproval_mode='always_require'and now assert exact tool output, plus a new test validates rejections produce no events. No injection, deserialization, resource-leak, or unhandled-failure issues found.
✗ Test Coverage
The PR correctly separates approved from rejected results in
_resolve_approval_responsesand adds a newtest_rejection_does_not_emit_tool_call_resulttest. Existing tests are upgraded with real tool execution and exact content assertions, which is a quality improvement. However, there is no test covering the mixed approved+rejected scenario (some calls approved, some rejected in the same stream), and the duplicated TOOL_CALL_RESULT emission code in the no-updates fallback path (lines 933-943) has zero test coverage. Both gaps exercise distinct code paths introduced or modified by this PR.
✓ Design Approach
The design approach is sound: separating approved results (returned for event emission) from rejection results (written to message history only) correctly models the AG-UI protocol requirement that TOOL_CALL_RESULT events should only fire for actually-executed tool calls. The split in
_resolve_approval_responsesbetweenapproved_results/rejection_results/all_resultsis clean and the docstring accurately reflects the contract. One structural concern is that the TOOL_CALL_RESULT emission block is copy-pasted verbatim into two separate code paths (lines 867-877 for the normal stream case, lines 933-943 for the zero-updates fallback). This duplication means any future change to serialization logic or the ToolCallResultEvent shape must be applied in both places, which is a maintenance hazard. Extracting a small helper (e.g.,_emit_approval_tool_results) would eliminate the duplication without any behavioral change.
Flagged Issues
- No test for a mixed approve/reject scenario (e.g., two tool calls where one is approved and one is rejected). This is the main behavioral change—
_resolve_approval_responsesnow splits results into two lists—and it is only tested with all-approved or all-rejected inputs, never a mix.
Suggestions
- The
rejection_resultslist built in_resolve_approval_responses(lines 519-526 of_agent_run.py) is appended toall_resultsand passed to_replace_approval_contents_with_results, but that function never consumes those items—itselsebranch at_tools.py:1790creates its own inline rejection results. Consider passing onlyapproved_resultsto_replace_approval_contents_with_results(or removing therejection_resultsconstruction entirely) to eliminate dead data and clarify the contract. - The TOOL_CALL_RESULT emission block is duplicated verbatim in the first-update path (lines 867-877) and the no-updates fallback path (lines 933-943). Extract it into a small async generator helper (e.g.,
_emit_approval_tool_results(resolved_approval_results)) to keep serialization logic in one place and prevent the two copies from drifting apart. - All existing tests use a StubAgent that always yields updates, so only the first TOOL_CALL_RESULT emission site (lines 867-877) is exercised. Add a test where the agent produces zero updates to cover the no-updates fallback path (lines 933-943).
- Consider adding a direct unit test for
_resolve_approval_responsesthat verifies the return value contains only approved results while rejection results are still written intomessages. The current tests verify this indirectly via event emission, which makes it harder to pinpoint regressions in the splitting logic.
Automated review by moonbox3's agents
- Extract duplicated TOOL_CALL_RESULT emission block into _make_approval_tool_result_events helper to prevent drift - Remove dead rejection_results construction in _resolve_approval_responses; _replace_approval_contents_with_results already handles rejections inline - Pass only approved_results (not all_results) to clarify the contract - Add mixed approve/reject test validating the core splitting logic - Add zero-updates test covering the no-updates fallback emission path - Add direct unit test for _resolve_approval_responses return value Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add blank line between first-party and third-party import groups to satisfy ruff I001 rule. Fixes microsoft#4589 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 97% | Result: All clear
Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach
Automated review by moonbox3's agents
Motivation and Context
When a user approved a pending tool call and the agent resumed, the executed tool's result was never emitted as a
TOOL_CALL_RESULTevent in the AG-UI stream. This meant downstream consumers (e.g., CopilotKit) had no visibility into the tool execution outcome, breaking the approval-resume flow.Fixes #4589
Description
The root cause was that
_resolve_approval_responsesexecuted the approved tools and inserted results into the message history, butrun_agent_streamnever yielded correspondingToolCallResultEvents to the event stream. The fix makes_resolve_approval_responsesreturn the list of resolvedContentobjects, andrun_agent_streamiterates over them right afterRUN_STARTEDto emit aToolCallResultEventfor each resolved approval. Three new tests verify that the event is emitted with the correcttool_call_idand content, and that no spurious events appear when no approval is present.Contribution Checklist