Python: Emit AG-UI events for MCP tool calls, results, and text reasoning#4760
Python: Emit AG-UI events for MCP tool calls, results, and text reasoning#4760moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
Conversation
…ning Fixes microsoft#4213 — `_emit_content()` in the AG-UI layer only handled `text`, `function_call`, `function_result`, `function_approval_request`, `usage`, and `oauth_consent_request` content types. Foundry MCP content types (`mcp_server_tool_call`, `mcp_server_tool_result`) and `text_reasoning` fell through unhandled, producing no SSE events for AG-UI consumers. Added three new handler functions wired into `_emit_content()`: - `_emit_mcp_tool_call`: emits TOOL_CALL_START + TOOL_CALL_ARGS and tracks in FlowState for MESSAGES_SNAPSHOT inclusion - `_emit_mcp_tool_result`: emits TOOL_CALL_END + TOOL_CALL_RESULT with full FlowState cleanup mirroring `_emit_tool_result` - `_emit_text_reasoning`: emits the protocol-defined reasoning event sequence (ReasoningStart → MessageStart → MessageContent → MessageEnd → ReasoningEnd) with ReasoningEncryptedValueEvent for protected_data
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
Adds missing AG-UI SSE event emission for Foundry/Responses streaming content items that previously fell through _emit_content() unhandled, ensuring MCP tool execution and text_reasoning are visible to AG-UI consumers.
Changes:
- Add
_emit_mcp_tool_call()and_emit_mcp_tool_result()to map MCP tool call/result content into standardTOOL_CALL_*AG-UI events and snapshot-tracked FlowState entries. - Add
_emit_text_reasoning()to emit protocol-defined reasoning events, including encrypted reasoning payload emission whenprotected_datais present. - Add/expand unit tests to cover MCP tool call/result emission and
_emit_content()routing for the new content types.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| python/packages/ag-ui/agent_framework_ag_ui/_run_common.py | Adds new emitters for MCP tool call/result and text_reasoning, and wires them into _emit_content() dispatch. |
| python/packages/ag-ui/tests/ag_ui/test_run.py | Adds test coverage for MCP tool call/result events, reasoning events, and _emit_content() routing for the new content types. |
You can also share your feedback on Copilot code review. Take the survey.
Exercises the full POST → SSE bytes → parse → validate pipeline for mcp_server_tool_call, mcp_server_tool_result, text_reasoning, and ReasoningEncryptedValueEvent content through FastAPI TestClient.
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 92%
✓ Correctness
The diff adds three new content-type handlers (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and corresponding tests. The implementation correctly mirrors existing patterns for tool call/result and reasoning event emission. All ag_ui.core event types used (ReasoningStartEvent, etc.) exist in the installed ag-ui-protocol package with matching field signatures. FlowState tracking for MCP tool calls follows the same structure as regular tool calls. The Pydantic min_length=1 constraint on ReasoningMessageContentEvent.delta is respected by guarding with
if text:. Tests are comprehensive and cover edge cases (missing IDs, empty content, protected_data only). No correctness issues found.
✓ Security Reliability
The diff adds three new event emitter functions (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and corresponding routing in _emit_content, along with comprehensive tests. The new code faithfully mirrors existing patterns: argument serialization via make_json_safe, FlowState cleanup in _emit_mcp_tool_result matching _emit_tool_result, proper fallback ID generation, and early-return guards for missing call_id. No injection risks, resource leaks, unsafe deserialization, or secrets were found. The protected_data field is an opaque passthrough by design. All new code paths are well-tested including edge cases.
✓ Test Coverage
Test coverage for the new MCP tool call, MCP tool result, and text reasoning features is thorough. Unit tests cover the main happy paths, edge cases (missing call_id, missing server_name, missing arguments, generated IDs), flow state tracking, and FlowState cleanup symmetry with the existing _emit_tool_result. SSE round-trip integration tests verify events survive encoding/parsing. Two minor gaps exist: (1) no test for _emit_mcp_tool_result when output is None (the implementation defaults to empty string, but this path is not exercised), and (2) no test for the fallback tool_name="mcp_tool" when tool_name is None in _emit_mcp_tool_call. Neither gap is blocking since the implementation handles both cases and they are unlikely failure paths via normal factory usage.
✗ Design Approach
The MCP tool call/result and text reasoning handlers are well-structured and the test coverage is thorough. One functional omission stands out:
_emit_mcp_tool_resultdoes not accept or forward thepredictive_handlerparameter, so MCP tool results will never trigger theapply_pending_updates()call or emit aStateSnapshotEvent, unlike the equivalent_emit_tool_resultpath. This silently breaks state continuity for any agent that uses predictive state updates. Additionally,_emit_mcp_tool_calldoes not populateflow.tool_call_id/flow.tool_call_name, yet_emit_mcp_tool_resultunconditionally clears them — if a standardfunction_callis being tracked in those fields when an MCP result arrives, the FlowState is corrupted, though this edge case may not occur in practice given how tools are sequenced.
Flagged Issues
-
_emit_mcp_tool_resultsilently drops thepredictive_handler: it has no parameter for it and_emit_contentdoes not pass it (unlike_emit_tool_resultand_emit_approval_request). Any agent using predictive state will never receive aStateSnapshotEventafter an MCP tool result, causing stale client-side state. Fix: addpredictive_handler: PredictiveStateHandler | None = Noneto_emit_mcp_tool_result, mirror theapply_pending_updates()+StateSnapshotEventblock from_emit_tool_result, and forward the handler in_emit_content.
Suggestions
_emit_mcp_tool_callnever setsflow.tool_call_idorflow.tool_call_name, but_emit_mcp_tool_resultunconditionally resets both toNone. If afunction_callpreviously wrote those fields and an MCP result arrives first, they are cleared prematurely. For consistency,_emit_mcp_tool_callshould setflow.tool_call_id = tool_call_idandflow.tool_call_name = display_name, mirroring_emit_tool_call.- Add a test for
_emit_mcp_tool_resultwhen output isNoneto verify the empty-string default:Content.from_mcp_server_tool_result(call_id="x")and assertresult_event.content == "". - Add a test for the fallback
tool_namewhentool_nameisNone(Content(type="mcp_server_tool_call", call_id="x")) to verify it falls back to"mcp_tool". - In
TestEmitMcpToolCall.test_tracks_in_flow_state, also assert the stored arguments value (flow.tool_calls_by_id["mcp_call_2"]["function"]["arguments"]) to verify string-typed arguments are stored verbatim.
Automated review by moonbox3's agents
…ft#4213) - Add predictive_handler parameter to _emit_mcp_tool_result and mirror the apply_pending_updates + StateSnapshotEvent block from _emit_tool_result - Forward predictive_handler from _emit_content to _emit_mcp_tool_result - Add assertion for stored arguments in MCP tool call test - Add test for predictive handler state snapshot after MCP tool result Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TaoChenOSU
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 90%
✓ Correctness
The diff adds three new content-type handlers (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and comprehensive tests. The implementations correctly mirror existing patterns (e.g., _emit_tool_call / _emit_tool_result), properly handle edge cases (missing call_id, missing arguments, empty text), and respect the ag-ui-protocol event type contracts. The ReasoningMessageContentEvent's min_length=1 constraint on delta is correctly guarded by the
if text:check. FlowState cleanup in _emit_mcp_tool_result faithfully mirrors _emit_tool_result. Tests cover all branches including routing through _emit_content. No correctness issues found.
✓ Security Reliability
The diff adds three new event emitters (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and their routing in _emit_content, plus comprehensive tests. The new code closely mirrors existing patterns (e.g., _emit_tool_call / _emit_tool_result) for argument serialization via make_json_safe, call_id validation, FlowState cleanup, and predictive handler integration. No injection risks, unsafe deserialization, secrets, or resource leaks were found. All externally-sourced values are passed as typed fields to Pydantic event models rather than interpolated into commands or queries. Input validation is present (missing call_id returns early in _emit_mcp_tool_result). The protected_data pass-through is by design—the field carries opaque encrypted data and the encryption responsibility lies with producers.
✓ Test Coverage
Test coverage for the three new emit functions (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) is thorough. Unit tests cover happy paths, edge cases (missing call_id, missing server_name, missing arguments, empty text, generated IDs), flow state mutations, JSON serialization of non-string values, predictive handler integration, and flow cleanup parity with _emit_tool_result. Routing through _emit_content is verified for all three new content types. Full HTTP SSE round-trip tests cover MCP tool calls, reasoning events, and encrypted reasoning. Two minor edge cases lack explicit tests: the tool_name fallback to 'mcp_tool' when None, and None output in _emit_mcp_tool_result defaulting to empty string. These are defensive fallbacks unlikely to occur through normal factory methods, so they are non-blocking suggestions only.
✗ Design Approach
The implementation is functionally correct and well-tested, but has two design concerns worth noting. The most significant is that
_emit_mcp_tool_resultis a near-verbatim copy of the existing_emit_tool_result, differing only incontent.outputvscontent.result. The code even comments that it "Mirrors the FlowState cleanup performed by_emit_tool_result", acknowledging the duplication. This creates a maintenance trap: future bug-fixes in_emit_tool_result's flow-state cleanup (like the #3568 fix already present) will not automatically propagate. The second concern is a leaky abstraction: embeddingserver_nameinto the AG-UItool_call_namestring (e.g."brave/search") bleeds MCP infrastructure details into the AG-UI protocol layer, forcing downstream consumers to parse a/-delimited string to recover the server origin.
Flagged Issues
- _emit_mcp_tool_result duplicates ~40 lines of _emit_tool_result's logic, differing only in reading content.output instead of content.result. Extract a shared private helper (e.g. _emit_tool_result_common(call_id, raw_result, flow, predictive_handler)) and call it from both functions, or normalize the Content field so both result types use the same attribute. As-is, any future fix to _emit_tool_result's flow-state cleanup (like the existing #3568 fix) must be manually propagated to _emit_mcp_tool_result.
Suggestions
- Embedding server_name into tool_call_name via f"{server_name}/{tool_name}" is a leaky abstraction: MCP-specific routing context bleeds into the AG-UI protocol layer, and '/' is not a reserved separator in tool names so this can be ambiguous (e.g. server_name containing '/'). Consider using just tool_name for tool_call_name and carrying server_name as structured data in the ToolCallArgsEvent payload or a companion CustomEvent. If the current encoding is kept, validate or sanitize server_name.
- _emit_mcp_tool_call does not include the duplicate-argument-replay guard that _emit_tool_call has (the #4194 fix). The docstring notes MCP calls arrive as complete items, but if a provider ever replays them, arguments could double in MESSAGES_SNAPSHOT. A brief comment noting this intentional omission would help future maintainers.
- Add unit tests for two untested defensive fallbacks: (1) _emit_mcp_tool_call when tool_name is None, verifying the fallback to 'mcp_tool' display name; (2) _emit_mcp_tool_result when output is None, verifying the result defaults to an empty string.
- Consider testing that _emit_mcp_tool_call correctly passes flow.message_id as parent_message_id on the ToolCallStartEvent, to verify the parent-child relationship is preserved.
Automated review by TaoChenOSU's agents
- Extract _emit_tool_result_common shared helper to eliminate duplication between _emit_tool_result and _emit_mcp_tool_result - Remove server_name prefix from tool_call_name in _emit_mcp_tool_call; display_name now equals tool_name directly - Add test for tool_name fallback to 'mcp_tool' when tool_name is None - Add test for output=None fallback to empty string in _emit_mcp_tool_result Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes #4213 — Supersedes #4233
_emit_content()in the AG-UI layer only handledtext,function_call,function_result,function_approval_request,usage, andoauth_consent_requestcontent types. Foundry MCP content types (mcp_server_tool_call,mcp_server_tool_result) andtext_reasoningfell through unhandled, producing no SSE events for AG-UI consumers.Changes
Added three new handler functions and wired them into
_emit_content():mcp_server_tool_call_emit_mcp_tool_callTOOL_CALL_START+TOOL_CALL_ARGSmcp_server_tool_result_emit_mcp_tool_resultTOOL_CALL_END+TOOL_CALL_RESULTtext_reasoning_emit_text_reasoningREASONING_START→REASONING_MESSAGE_START→REASONING_MESSAGE_CONTENT→REASONING_MESSAGE_END→REASONING_ENCRYPTED_VALUE(when protected_data present) →REASONING_ENDImprovements over #4233
This PR incorporates the work from #4233 by @LEDazzio01 and addresses all reviewer feedback:
CustomEventfallback (addresses @robinsk's feedback) — CopilotKit and other AG-UI consumers can render reasoning nativelyReasoningEncryptedValueEventforprotected_data, keeping encrypted reasoning separate from display text (addresses @robinsk'sReasoningEncryptedValueconcern)ReasoningStartEvent/ReasoningEndEventreceive requiredmessage_id,ReasoningMessageStartEventusesrole="assistant"per protocol specCustomEventassertions_emit_mcp_tool_resultmirrors_emit_tool_resultcleanup — resetstool_call_id/tool_call_name, closes open text messages, resetsaccumulated_text(addresses Copilot bot review)call_idin_emit_mcp_tool_result(addresses @moonbox3's feedback)Fixes #4213in docstrings (addresses @moonbox3's feedback)flowparam on_emit_text_reasoning(addresses @moonbox3's feedback)