Skip to content

Python: Emit AG-UI events for MCP tool calls, results, and text reasoning#4760

Open
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:4213-fix
Open

Python: Emit AG-UI events for MCP tool calls, results, and text reasoning#4760
moonbox3 wants to merge 6 commits intomicrosoft:mainfrom
moonbox3:4213-fix

Conversation

@moonbox3
Copy link
Contributor

@moonbox3 moonbox3 commented Mar 18, 2026

Summary

Fixes #4213 — Supersedes #4233

_emit_content() in the AG-UI layer only handled text, function_call, function_result, function_approval_request, usage, and oauth_consent_request content types. Foundry MCP content types (mcp_server_tool_call, mcp_server_tool_result) and text_reasoning fell through unhandled, producing no SSE events for AG-UI consumers.

Changes

Added three new handler functions and wired them into _emit_content():

Content Type Handler AG-UI Events
mcp_server_tool_call _emit_mcp_tool_call TOOL_CALL_START + TOOL_CALL_ARGS
mcp_server_tool_result _emit_mcp_tool_result TOOL_CALL_END + TOOL_CALL_RESULT
text_reasoning _emit_text_reasoning REASONING_STARTREASONING_MESSAGE_STARTREASONING_MESSAGE_CONTENTREASONING_MESSAGE_ENDREASONING_ENCRYPTED_VALUE (when protected_data present) → REASONING_END

Improvements over #4233

This PR incorporates the work from #4233 by @LEDazzio01 and addresses all reviewer feedback:

  • Uses protocol-defined reasoning events instead of CustomEvent fallback (addresses @robinsk's feedback) — CopilotKit and other AG-UI consumers can render reasoning natively
  • Emits ReasoningEncryptedValueEvent for protected_data, keeping encrypted reasoning separate from display text (addresses @robinsk's ReasoningEncryptedValue concern)
  • Correct event constructor argumentsReasoningStartEvent/ReasoningEndEvent receive required message_id, ReasoningMessageStartEvent uses role="assistant" per protocol spec
  • Tests match implementation — all test assertions verify the actual reasoning event types, not stale CustomEvent assertions
  • _emit_mcp_tool_result mirrors _emit_tool_result cleanup — resets tool_call_id/tool_call_name, closes open text messages, resets accumulated_text (addresses Copilot bot review)
  • Logs warning on missing call_id in _emit_mcp_tool_result (addresses @moonbox3's feedback)
  • No Fixes #4213 in docstrings (addresses @moonbox3's feedback)
  • No unused flow param on _emit_text_reasoning (addresses @moonbox3's feedback)

…ning

Fixes microsoft#4213 — `_emit_content()` in the AG-UI layer only handled `text`,
`function_call`, `function_result`, `function_approval_request`, `usage`,
and `oauth_consent_request` content types. Foundry MCP content types
(`mcp_server_tool_call`, `mcp_server_tool_result`) and `text_reasoning`
fell through unhandled, producing no SSE events for AG-UI consumers.

Added three new handler functions wired into `_emit_content()`:

- `_emit_mcp_tool_call`: emits TOOL_CALL_START + TOOL_CALL_ARGS and
  tracks in FlowState for MESSAGES_SNAPSHOT inclusion
- `_emit_mcp_tool_result`: emits TOOL_CALL_END + TOOL_CALL_RESULT with
  full FlowState cleanup mirroring `_emit_tool_result`
- `_emit_text_reasoning`: emits the protocol-defined reasoning event
  sequence (ReasoningStart → MessageStart → MessageContent → MessageEnd
  → ReasoningEnd) with ReasoningEncryptedValueEvent for protected_data
Copilot AI review requested due to automatic review settings March 18, 2026 10:32
@markwallace-microsoft
Copy link
Member

markwallace-microsoft commented Mar 18, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/ag-ui/agent_framework_ag_ui
   _run_common.py2441095%309–310, 316–321, 531–532
TOTAL27142321888% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
5289 20 💤 0 ❌ 0 🔥 1m 23s ⏱️

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds missing AG-UI SSE event emission for Foundry/Responses streaming content items that previously fell through _emit_content() unhandled, ensuring MCP tool execution and text_reasoning are visible to AG-UI consumers.

Changes:

  • Add _emit_mcp_tool_call() and _emit_mcp_tool_result() to map MCP tool call/result content into standard TOOL_CALL_* AG-UI events and snapshot-tracked FlowState entries.
  • Add _emit_text_reasoning() to emit protocol-defined reasoning events, including encrypted reasoning payload emission when protected_data is present.
  • Add/expand unit tests to cover MCP tool call/result emission and _emit_content() routing for the new content types.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
python/packages/ag-ui/agent_framework_ag_ui/_run_common.py Adds new emitters for MCP tool call/result and text_reasoning, and wires them into _emit_content() dispatch.
python/packages/ag-ui/tests/ag_ui/test_run.py Adds test coverage for MCP tool call/result events, reasoning events, and _emit_content() routing for the new content types.

You can also share your feedback on Copilot code review. Take the survey.

Exercises the full POST → SSE bytes → parse → validate pipeline for
mcp_server_tool_call, mcp_server_tool_result, text_reasoning, and
ReasoningEncryptedValueEvent content through FastAPI TestClient.
Copy link
Contributor Author

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 92%

✓ Correctness

The diff adds three new content-type handlers (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and corresponding tests. The implementation correctly mirrors existing patterns for tool call/result and reasoning event emission. All ag_ui.core event types used (ReasoningStartEvent, etc.) exist in the installed ag-ui-protocol package with matching field signatures. FlowState tracking for MCP tool calls follows the same structure as regular tool calls. The Pydantic min_length=1 constraint on ReasoningMessageContentEvent.delta is respected by guarding with if text:. Tests are comprehensive and cover edge cases (missing IDs, empty content, protected_data only). No correctness issues found.

✓ Security Reliability

The diff adds three new event emitter functions (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and corresponding routing in _emit_content, along with comprehensive tests. The new code faithfully mirrors existing patterns: argument serialization via make_json_safe, FlowState cleanup in _emit_mcp_tool_result matching _emit_tool_result, proper fallback ID generation, and early-return guards for missing call_id. No injection risks, resource leaks, unsafe deserialization, or secrets were found. The protected_data field is an opaque passthrough by design. All new code paths are well-tested including edge cases.

✓ Test Coverage

Test coverage for the new MCP tool call, MCP tool result, and text reasoning features is thorough. Unit tests cover the main happy paths, edge cases (missing call_id, missing server_name, missing arguments, generated IDs), flow state tracking, and FlowState cleanup symmetry with the existing _emit_tool_result. SSE round-trip integration tests verify events survive encoding/parsing. Two minor gaps exist: (1) no test for _emit_mcp_tool_result when output is None (the implementation defaults to empty string, but this path is not exercised), and (2) no test for the fallback tool_name="mcp_tool" when tool_name is None in _emit_mcp_tool_call. Neither gap is blocking since the implementation handles both cases and they are unlikely failure paths via normal factory usage.

✗ Design Approach

The MCP tool call/result and text reasoning handlers are well-structured and the test coverage is thorough. One functional omission stands out: _emit_mcp_tool_result does not accept or forward the predictive_handler parameter, so MCP tool results will never trigger the apply_pending_updates() call or emit a StateSnapshotEvent, unlike the equivalent _emit_tool_result path. This silently breaks state continuity for any agent that uses predictive state updates. Additionally, _emit_mcp_tool_call does not populate flow.tool_call_id/flow.tool_call_name, yet _emit_mcp_tool_result unconditionally clears them — if a standard function_call is being tracked in those fields when an MCP result arrives, the FlowState is corrupted, though this edge case may not occur in practice given how tools are sequenced.

Flagged Issues

  • _emit_mcp_tool_result silently drops the predictive_handler: it has no parameter for it and _emit_content does not pass it (unlike _emit_tool_result and _emit_approval_request). Any agent using predictive state will never receive a StateSnapshotEvent after an MCP tool result, causing stale client-side state. Fix: add predictive_handler: PredictiveStateHandler | None = None to _emit_mcp_tool_result, mirror the apply_pending_updates() + StateSnapshotEvent block from _emit_tool_result, and forward the handler in _emit_content.

Suggestions

  • _emit_mcp_tool_call never sets flow.tool_call_id or flow.tool_call_name, but _emit_mcp_tool_result unconditionally resets both to None. If a function_call previously wrote those fields and an MCP result arrives first, they are cleared prematurely. For consistency, _emit_mcp_tool_call should set flow.tool_call_id = tool_call_id and flow.tool_call_name = display_name, mirroring _emit_tool_call.
  • Add a test for _emit_mcp_tool_result when output is None to verify the empty-string default: Content.from_mcp_server_tool_result(call_id="x") and assert result_event.content == "".
  • Add a test for the fallback tool_name when tool_name is None (Content(type="mcp_server_tool_call", call_id="x")) to verify it falls back to "mcp_tool".
  • In TestEmitMcpToolCall.test_tracks_in_flow_state, also assert the stored arguments value (flow.tool_calls_by_id["mcp_call_2"]["function"]["arguments"]) to verify string-typed arguments are stored verbatim.

Automated review by moonbox3's agents

Copilot and others added 2 commits March 18, 2026 10:50
…ft#4213)

- Add predictive_handler parameter to _emit_mcp_tool_result and mirror
  the apply_pending_updates + StateSnapshotEvent block from _emit_tool_result
- Forward predictive_handler from _emit_content to _emit_mcp_tool_result
- Add assertion for stored arguments in MCP tool call test
- Add test for predictive handler state snapshot after MCP tool result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

@TaoChenOSU TaoChenOSU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 90%

✓ Correctness

The diff adds three new content-type handlers (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and comprehensive tests. The implementations correctly mirror existing patterns (e.g., _emit_tool_call / _emit_tool_result), properly handle edge cases (missing call_id, missing arguments, empty text), and respect the ag-ui-protocol event type contracts. The ReasoningMessageContentEvent's min_length=1 constraint on delta is correctly guarded by the if text: check. FlowState cleanup in _emit_mcp_tool_result faithfully mirrors _emit_tool_result. Tests cover all branches including routing through _emit_content. No correctness issues found.

✓ Security Reliability

The diff adds three new event emitters (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) and their routing in _emit_content, plus comprehensive tests. The new code closely mirrors existing patterns (e.g., _emit_tool_call / _emit_tool_result) for argument serialization via make_json_safe, call_id validation, FlowState cleanup, and predictive handler integration. No injection risks, unsafe deserialization, secrets, or resource leaks were found. All externally-sourced values are passed as typed fields to Pydantic event models rather than interpolated into commands or queries. Input validation is present (missing call_id returns early in _emit_mcp_tool_result). The protected_data pass-through is by design—the field carries opaque encrypted data and the encryption responsibility lies with producers.

✓ Test Coverage

Test coverage for the three new emit functions (_emit_mcp_tool_call, _emit_mcp_tool_result, _emit_text_reasoning) is thorough. Unit tests cover happy paths, edge cases (missing call_id, missing server_name, missing arguments, empty text, generated IDs), flow state mutations, JSON serialization of non-string values, predictive handler integration, and flow cleanup parity with _emit_tool_result. Routing through _emit_content is verified for all three new content types. Full HTTP SSE round-trip tests cover MCP tool calls, reasoning events, and encrypted reasoning. Two minor edge cases lack explicit tests: the tool_name fallback to 'mcp_tool' when None, and None output in _emit_mcp_tool_result defaulting to empty string. These are defensive fallbacks unlikely to occur through normal factory methods, so they are non-blocking suggestions only.

✗ Design Approach

The implementation is functionally correct and well-tested, but has two design concerns worth noting. The most significant is that _emit_mcp_tool_result is a near-verbatim copy of the existing _emit_tool_result, differing only in content.output vs content.result. The code even comments that it "Mirrors the FlowState cleanup performed by _emit_tool_result", acknowledging the duplication. This creates a maintenance trap: future bug-fixes in _emit_tool_result's flow-state cleanup (like the #3568 fix already present) will not automatically propagate. The second concern is a leaky abstraction: embedding server_name into the AG-UI tool_call_name string (e.g. "brave/search") bleeds MCP infrastructure details into the AG-UI protocol layer, forcing downstream consumers to parse a /-delimited string to recover the server origin.

Flagged Issues

  • _emit_mcp_tool_result duplicates ~40 lines of _emit_tool_result's logic, differing only in reading content.output instead of content.result. Extract a shared private helper (e.g. _emit_tool_result_common(call_id, raw_result, flow, predictive_handler)) and call it from both functions, or normalize the Content field so both result types use the same attribute. As-is, any future fix to _emit_tool_result's flow-state cleanup (like the existing #3568 fix) must be manually propagated to _emit_mcp_tool_result.

Suggestions

  • Embedding server_name into tool_call_name via f"{server_name}/{tool_name}" is a leaky abstraction: MCP-specific routing context bleeds into the AG-UI protocol layer, and '/' is not a reserved separator in tool names so this can be ambiguous (e.g. server_name containing '/'). Consider using just tool_name for tool_call_name and carrying server_name as structured data in the ToolCallArgsEvent payload or a companion CustomEvent. If the current encoding is kept, validate or sanitize server_name.
  • _emit_mcp_tool_call does not include the duplicate-argument-replay guard that _emit_tool_call has (the #4194 fix). The docstring notes MCP calls arrive as complete items, but if a provider ever replays them, arguments could double in MESSAGES_SNAPSHOT. A brief comment noting this intentional omission would help future maintainers.
  • Add unit tests for two untested defensive fallbacks: (1) _emit_mcp_tool_call when tool_name is None, verifying the fallback to 'mcp_tool' display name; (2) _emit_mcp_tool_result when output is None, verifying the result defaults to an empty string.
  • Consider testing that _emit_mcp_tool_call correctly passes flow.message_id as parent_message_id on the ToolCallStartEvent, to verify the parent-child relationship is preserved.

Automated review by TaoChenOSU's agents

Copilot and others added 2 commits March 18, 2026 22:22
- Extract _emit_tool_result_common shared helper to eliminate duplication
  between _emit_tool_result and _emit_mcp_tool_result
- Remove server_name prefix from tool_call_name in _emit_mcp_tool_call;
  display_name now equals tool_name directly
- Add test for tool_name fallback to 'mcp_tool' when tool_name is None
- Add test for output=None fallback to empty string in _emit_mcp_tool_result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@moonbox3 moonbox3 requested a review from TaoChenOSU March 18, 2026 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: MCP Tool Calls + Reasoning Not Emitted as AG-UI Events (Foundry)

5 participants