
Python: [BREAKING] Refactor middleware layering and split Anthropic raw client #4746

Open
eavanvalkenburg wants to merge 6 commits into microsoft:main from eavanvalkenburg:fix_4710

Conversation

@eavanvalkenburg
Member

Motivation and Context

Fixes #4710.

This change clarifies the separation of concerns between agent middleware, chat middleware, function invocation, and telemetry. Chat middleware now runs for each inner model call in the function loop while remaining outside telemetry so middleware latency does not skew per-call timings. It also covers the related per-call usage-tracking scenario discussed in #4671 and aligns Anthropic with the raw/public client pattern used elsewhere.

Description

  • reorder layered chat clients so the public stack is FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client
  • move chat pipeline construction/caching into ChatMiddlewareLayer and add matching cache reuse for agent and function middleware pipelines
  • standardize mixed per-call middleware routing on client_kwargs["middleware"] and remove stale function_middleware docstring/runtime references
  • add and update middleware/observability samples and docs to reflect the new behavior, including a usage-tracking sample that shows per-call middleware behavior in streaming and non-streaming runs
  • split Anthropic into RawAnthropicClient plus public AnthropicClient, and include the small ancillary typing cleanups already present on the branch
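The reordered public stack in the first bullet can be sketched with stand-in classes. The layer names follow the PR; the empty bodies and the composed PublicChatClient are illustrative only, not the framework's implementation:

```python
# Stand-in mixins for the layers named above (hypothetical empty bodies).
class RawChatClient: ...            # provider raw client, e.g. RawAnthropicClient's role
class ChatTelemetryLayer: ...       # innermost convenience layer: per-call spans
class ChatMiddlewareLayer: ...      # runs chat middleware per model call, outside telemetry
class FunctionInvocationLayer: ...  # outermost layer: drives the tool-call loop

class PublicChatClient(
    FunctionInvocationLayer, ChatMiddlewareLayer, ChatTelemetryLayer, RawChatClient
):
    """Public stack: function invocation -> chat middleware -> telemetry -> raw."""

# Relative MRO ordering check (robust to extra mixins, unlike hardcoded indices).
mro = PublicChatClient.__mro__
assert mro.index(FunctionInvocationLayer) < mro.index(ChatMiddlewareLayer)
assert mro.index(ChatMiddlewareLayer) < mro.index(ChatTelemetryLayer)
assert mro.index(ChatTelemetryLayer) < mro.index(RawChatClient)
```

Because Python's C3 linearization preserves the base-class order given in the class statement, the relative `mro.index()` assertions hold even if a provider inserts extra mixins between layers.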

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

eavanvalkenburg and others added 2 commits March 17, 2026 17:09
Reorder chat client layers so function invocation wraps chat middleware, and chat middleware stays outside telemetry while still running for each inner model call. Add middleware pipeline caching, refresh docs and samples, and split Anthropic into raw and public clients to match the standard layering model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add targeted typing ignores in workflow visualization and lab modules so pyright stays clean alongside the middleware refactor work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 17, 2026 16:14
@markwallace-microsoft added the documentation (Improvements or additions to documentation), python, and lab (Agent Framework Lab) labels Mar 17, 2026
The github-actions bot changed the title from "[BREAKING] Refactor middleware layering and split Anthropic raw client" to "Python: [BREAKING] Refactor middleware layering and split Anthropic raw client" Mar 17, 2026
@markwallace-microsoft
Member

markwallace-microsoft commented Mar 17, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | ---: | ---: | ---: | --- |
| packages/ag-ui/agent_framework_ag_ui/_client.py | 147 | 13 | 91% | 85–86, 90–94, 98–102, 262 |
| packages/anthropic/agent_framework_anthropic/_chat_client.py | 431 | 35 | 91% | 435, 438, 519, 606, 608, 758, 785–786, 864, 894–895, 940, 956–957, 964–966, 970–972, 976–979, 1093, 1103, 1155, 1276, 1303–1304, 1321, 1334, 1347, 1372–1373 |
| packages/azure-ai/agent_framework_azure_ai/_chat_client.py | 489 | 78 | 84% | 415–416, 418, 602, 607–608, 610–611, 614, 617, 619, 624, 885–886, 888, 891, 894, 897–902, 905, 907, 915, 927–929, 933, 936–937, 945–948, 958, 966–969, 971–972, 974–975, 982, 990–991, 999–1012, 1017, 1020, 1028, 1034, 1042–1044, 1047, 1067–1068, 1201, 1228, 1243, 1359, 1409, 1415 |
| packages/azure-ai/agent_framework_azure_ai/_client.py | 405 | 49 | 87% | 214, 392, 394, 442, 452–456, 458–464, 477, 538, 553–558, 600, 616–617, 643–644, 662, 697, 699, 720–721, 724, 769, 772, 774, 865, 896, 938, 1147, 1150, 1153–1154, 1156–1159 |
| packages/core/agent_framework/_clients.py | 145 | 7 | 95% | 339, 391, 554–557, 666 |
| packages/core/agent_framework/_middleware.py | 366 | 16 | 95% | 61, 64, 69, 794, 810, 812, 814, 947, 950, 977, 979, 1113, 1117, 1299, 1303, 1377 |
| packages/core/agent_framework/_tools.py | 942 | 85 | 90% | 190–191, 366, 368, 381, 410–412, 420, 438, 452, 459, 466, 483, 485, 492, 500, 535, 584, 588, 631–633, 635, 641, 686–688, 690, 713, 739, 781–783, 787, 809, 921–927, 963, 975, 977, 979, 982–985, 1006, 1010, 1014, 1028–1030, 1371, 1393, 1480–1486, 1615, 1619, 1665, 1784, 1814, 1834, 1836, 1892, 1955, 2138–2139, 2206–2207, 2260, 2331–2332, 2394, 2399, 2406 |
| packages/core/agent_framework/observability.py | 676 | 31 | 95% | 374–375, 402, 404–406, 409–411, 416–417, 423–424, 430–431, 721, 725–727, 729, 1072–1076, 1292–1293, 1539, 1721, 2105, 2107 |
| packages/core/agent_framework/_workflows/_viz.py | 175 | 14 | 92% | 102–103, 117–119, 122–123, 126, 133, 153, 166, 179, 357, 418 |
| packages/core/agent_framework/azure/_chat_client.py | 87 | 3 | 96% | 322–323, 334 |
| packages/core/agent_framework/azure/_responses_client.py | 54 | 6 | 88% | 220, 268, 274–277 |
| packages/core/agent_framework/openai/_assistants_client.py | 329 | 36 | 89% | 422, 424, 426, 429, 433–434, 437, 440, 445–446, 448, 451–453, 458, 469, 494, 496, 498, 500, 502, 507, 510, 513, 517, 528, 723, 809, 812, 831, 842, 879–882, 952 |
| packages/core/agent_framework/openai/_chat_client.py | 332 | 28 | 91% | 212, 293–294, 298, 423, 430, 506–513, 515–518, 528, 606, 608, 625, 646, 674, 687, 711, 727, 841 |
| packages/core/agent_framework/openai/_responses_client.py | 821 | 126 | 84% | 312–315, 319–320, 325–326, 336–337, 344, 359–365, 386, 394, 417, 514, 613, 672, 674, 676, 678, 746, 760, 840, 850, 855, 898, 977, 994, 1007, 1072, 1165, 1170, 1174–1176, 1180–1181, 1247, 1276, 1282, 1292, 1298, 1303, 1309, 1314–1315, 1376, 1398–1399, 1414–1415, 1433–1434, 1475–1478, 1640, 1695, 1697, 1777–1785, 1907, 1962, 1977, 1997–2007, 2020, 2031–2035, 2049, 2063–2074, 2083, 2115–2118, 2126–2127, 2129–2131, 2145–2147, 2157–2158, 2164, 2179 |
| TOTAL | 24039 | 2639 | 89% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| ---: | ---: | ---: | ---: | --- |
| 5265 | 20 💤 | 0 ❌ | 0 🔥 | 1m 24s ⏱️ |

Contributor

Copilot AI left a comment


Pull request overview

Refactors the Python chat-client layer stack so chat middleware executes around each inner model call in the tool loop (without inflating telemetry spans), standardizes per-call middleware routing via client_kwargs["middleware"], and aligns Anthropic with the Raw...Client/public client split used by other providers.

Changes:

  • Reorders the standard public client MRO to FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client across providers, tests, and docs.
  • Moves chat/function/agent middleware pipeline construction + reuse into the relevant layers (pipeline caching).
  • Splits Anthropic into RawAnthropicClient + AnthropicClient, and updates docs/samples (including a per-call usage tracking middleware sample).

Reviewed changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.

File Description
python/samples/02-agents/providers/custom/README.md Updates recommended layer ordering and composition example.
python/samples/02-agents/observability/advanced_zero_code.py Clarifies telemetry vs middleware placement and span behavior.
python/samples/02-agents/observability/advanced_manual_setup_console_output.py Updates observability guidance to the new default layer order.
python/samples/02-agents/middleware/usage_tracking_middleware.py Adds a sample showing per-call usage tracking for streaming and non-streaming tool loops.
python/samples/02-agents/middleware/agent_and_run_level_middleware.py Updates middleware ordering explanation to reflect per-call chat middleware behavior.
python/samples/02-agents/middleware/README.md Adds index + instructions for running the new usage tracking sample.
python/samples/02-agents/chat_client/custom_chat_client.py Updates sample to recommend and demonstrate the new layer order.
python/samples/02-agents/auto_retry.py Updates retry middleware docs to reflect “per model call” semantics in tool loops.
python/packages/orchestrations/tests/test_handoff.py Updates mock client MRO ordering to match the new layering.
python/packages/ollama/agent_framework_ollama/_chat_client.py Reorders Ollama client inheritance to match the standard layer stack.
python/packages/lab/lightning/agent_framework_lab_lightning/__init__.py Adjusts typing ignores around optional agentlightning dependency.
python/packages/lab/gaia/agent_framework_lab_gaia/gaia.py Adds typing ignore for optional pyarrow import.
python/packages/foundry_local/agent_framework_foundry_local/_foundry_local_client.py Reorders FoundryLocal client inheritance to match the standard layer stack.
python/packages/core/tests/core/test_observability.py Updates span-ordering test expectations + mock client MRO.
python/packages/core/tests/core/test_middleware_with_chat.py Updates per-call middleware passing to client_kwargs["middleware"] and adds pipeline cache tests.
python/packages/core/tests/core/test_middleware_with_agent.py Adds agent middleware pipeline cache tests and tool-loop ordering coverage.
python/packages/core/tests/core/test_kwargs_propagation_to_ai_function.py Reorders mock client inheritance to match the new standard stack.
python/packages/core/tests/core/test_function_invocation_logic.py Switches per-call middleware usage to client_kwargs["middleware"] and updates mock init args.
python/packages/core/tests/core/test_clients.py Updates docstring/signature expectations (removal of function_middleware references).
python/packages/core/tests/core/conftest.py Updates mock client initialization to use middleware=[] and new MRO order.
python/packages/core/agent_framework/openai/_responses_client.py Updates OpenAI Responses client MRO + raw-client layering docstring.
python/packages/core/agent_framework/openai/_chat_client.py Updates OpenAI Chat client MRO + removes function_middleware param and routes middleware via client_kwargs.
python/packages/core/agent_framework/openai/_assistants_client.py Updates assistants client MRO ordering to the new standard stack.
python/packages/core/agent_framework/observability.py Adds typing ignores around optional OpenTelemetry exporter imports.
python/packages/core/agent_framework/azure/_responses_client.py Updates Azure responses client MRO ordering.
python/packages/core/agent_framework/azure/_chat_client.py Updates Azure chat client MRO ordering.
python/packages/core/agent_framework/_workflows/_viz.py Adds typing ignores around optional graphviz usage.
python/packages/core/agent_framework/_tools.py Refactors FunctionInvocationLayer to route middleware via client_kwargs["middleware"] and cache function middleware pipelines.
python/packages/core/agent_framework/_middleware.py Adds pipeline .matches() and caching for chat/function/agent middleware pipelines; adjusts ChatMiddlewareLayer init/signature.
python/packages/core/agent_framework/_clients.py Updates layered docstring composition to reflect new layer responsibilities and kwarg docs.
python/packages/bedrock/agent_framework_bedrock/_chat_client.py Reorders Bedrock client inheritance to match the standard layer stack.
python/packages/azure-ai/tests/test_azure_ai_agent_client.py Updates test fixtures for new middleware fields/caches.
python/packages/azure-ai/agent_framework_azure_ai/_client.py Updates raw-client layering docstring + public client MRO.
python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py Updates Azure AI agent client MRO ordering.
python/packages/anthropic/tests/test_anthropic_client.py Updates tests for RawAnthropicClient split and standard layer stack.
python/packages/anthropic/agent_framework_anthropic/_chat_client.py Introduces RawAnthropicClient and composes AnthropicClient with standard layering.
python/packages/anthropic/agent_framework_anthropic/__init__.py Exports RawAnthropicClient.
python/packages/ag-ui/tests/ag_ui/conftest.py Updates streaming stub MRO and init arg (middleware=[]).
python/packages/ag-ui/agent_framework_ag_ui/_client.py Updates AG-UI chat client MRO ordering.


Member Author

@eavanvalkenburg left a comment


Automated Code Review

Reviewers: 3 | Confidence: 88%

✓ Correctness

This is a well-structured refactoring that reorders the MRO so FunctionInvocationLayer sits above ChatMiddlewareLayer, moves middleware categorization responsibility into FunctionInvocationLayer, adds single-slot pipeline caching, splits the Anthropic client into Raw and full-featured classes, and removes the public function_middleware parameter in favor of unified middleware. The code paths are internally consistent: FunctionInvocationLayer.__init__ and get_response correctly categorize mixed middleware, keep function middleware for itself, and forward chat middleware via kwargs; ChatMiddlewareLayer correctly receives only chat middleware; the pipeline caching correctly uses identity-based tuple comparison via matches(). Tests thoroughly cover the new layer ordering, caching, and runtime middleware splitting. No correctness bugs found.

✓ Security Reliability

This is a large structural refactoring that reorders the MRO for all chat client classes, splits middleware routing so FunctionInvocationLayer owns the combined middleware parameter, adds single-slot pipeline caching, and introduces RawAnthropicClient. From a security/reliability perspective, the changes are sound: no injection risks, no secrets exposure, no unsafe deserialization. The caching logic is safe for single-threaded async (no await between check and set). The one robustness concern is that categorize_middleware only checks isinstance(source, list) when unpacking sequences, but the new FunctionInvocationLayer.__init__ and get_response now pass whole sequences (not unpacked) — if a caller passes a tuple (valid Sequence) instead of a list, the middleware items would not be categorized correctly and would silently fall through to the wrong bucket or fail classification.

✓ Design Approach

The diff correctly inverts the layer order from ChatMiddlewareLayer → FunctionInvocationLayer → ChatTelemetryLayer to FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer so that chat middleware runs once per model call (inside the tool loop) rather than once per entire tool-call chain. The change is well-motivated and consistently applied across all clients. The RawAnthropicClient split follows the existing RawOpenAI*Client pattern. The per-pipeline caching with matches() is a reasonable optimization. One structural concern in the new test is that MRO index assertions are fragile: they encode exact positional information about the class hierarchy rather than the behavioral ordering property being tested. A second, minor concern is that the single-slot matches() cache uses == (value equality) — if any middleware object ever defines a custom __eq__, two structurally different pipelines could incorrectly share a cached instance. Neither issue is blocking, but the MRO test pattern is worth addressing proactively given that similar clients (e.g., AzureOpenAIChatClient) already have extra mixins that would shift those indices.

Suggestions

  • New call sites in FunctionInvocationLayer.__init__ and get_response pass the middleware sequence directly to categorize_middleware(middleware) instead of unpacking it (*(middleware or [])). Since categorize_middleware only checks isinstance(source, list), a tuple (a valid Sequence) would be appended as a single item and silently misclassified. Either unpack at call sites (categorize_middleware(*(middleware or []))) or broaden the check in categorize_middleware to handle all Sequence types (e.g., isinstance(source, (list, tuple)) or collections.abc.Sequence with a str exclusion).
  • The single-slot pipeline caches (_cached_chat_middleware_pipeline, etc.) are adequate for the common case but will thrash if concurrent async tasks use different per-call middleware, and would race under multi-threaded usage. Since the worst case is pipeline recreation (not incorrectness), consider adding a brief code comment documenting the single-threaded async assumption for future readers.
  • In test_anthropic_client_wraps_raw_client_with_standard_layer_order, prefer relative ordering checks via mro.index() over hardcoded MRO indices. Clients like AzureOpenAIChatClient already have extra mixins that would shift indices, so asserting mro.index(FunctionInvocationLayer) < mro.index(ChatMiddlewareLayer) < mro.index(ChatTelemetryLayer) < mro.index(RawAnthropicClient) expresses the intent without breaking when a new mixin is inserted.
  • The matches() method uses self._source_middleware == tuple(middleware), which relies on the default identity-based __eq__ of middleware objects. This is correct for all current middleware, but a brief comment explaining the assumption would prevent incorrect cache reuse if a future middleware class defines value-based __eq__.
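The Sequence-broadening suggested in the first bullet can be sketched as follows. The helper name flatten_middleware is hypothetical (the framework's function is categorize_middleware, whose full signature is not shown here); the sketch only illustrates the str/bytes guard:

```python
from collections.abc import Sequence


def flatten_middleware(source):
    """Hypothetical sketch: unpack any Sequence of middleware items.

    str and bytes are Sequences of characters/ints, so without the guard a
    string would be silently decomposed item-by-item into the middleware list.
    """
    if isinstance(source, Sequence) and not isinstance(source, (str, bytes)):
        return list(source)  # tuples and other Sequence types unpack correctly
    return [source]          # single items (including str/bytes) stay wrapped


assert flatten_middleware(("a", "b")) == ["a", "b"]  # tuple is unpacked
assert flatten_middleware("mw") == ["mw"]            # string is NOT decomposed
```

With only `isinstance(source, list)`, as in the current call sites, the tuple case above would instead be appended whole as a single misclassified item.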

Automated review by eavanvalkenburg's agents

…RO assertions

- Broaden isinstance check in categorize_middleware from list to Sequence
  so tuples and other Sequence types are properly unpacked instead of
  being appended as a single item.
- Replace fragile hardcoded MRO index assertions in anthropic test with
  relative ordering via mro.index().
- Add regression tests for categorize_middleware with tuple, list, and
  None inputs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

@moonbox3 left a comment


Automated Code Review

Reviewers: 3 | Confidence: 87%

✗ Security Reliability

This PR refactors the MRO ordering of middleware layers (swapping FunctionInvocationLayer above ChatMiddlewareLayer), extracts a RawAnthropicClient, adds pipeline caching for middleware, and moves middleware routing responsibilities between layers. The changes are largely structural and well-tested. One reliability concern: the categorize_middleware change from isinstance(source, list) to isinstance(source, Sequence) will also match str and bytes, which are sequences of characters/ints — if a string were ever passed, it would be silently decomposed character-by-character into the middleware list. While type annotations make this unlikely at runtime, a defensive guard is warranted. The pipeline caching is single-slot (one cached value per client instance) which is correct for the common case but worth noting. No security issues (no secrets, no injection risks, no unsafe deserialization) were found.

✓ Test Coverage

This is a large refactoring that reorders the MRO so FunctionInvocationLayer comes before ChatMiddlewareLayer, extracts RawAnthropicClient, adds middleware pipeline caching, and moves function-middleware routing into FunctionInvocationLayer. The core behavioral changes are well-tested: new integration tests verify the tool-loop middleware ordering, pipeline caching is covered for all three pipeline types, and categorize_middleware's Sequence support is tested. However, the MRO reordering was applied to ~12 client classes but only the Anthropic client has an explicit MRO ordering test; the other reordered clients (OpenAI, Azure, Bedrock, Ollama, FoundryLocal, etc.) lack similar guards. RawAnthropicClient is newly exported but has no test exercising it independently.

✗ Design Approach

The layer reordering (FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer) is correct and a genuine semantic improvement: chat middleware now executes per model-call within the tool loop rather than wrapping the entire loop. However, the PR has one blocking design issue: removing middleware= as a first-class named parameter from FunctionInvocationLayer.get_response creates a provider-inconsistent API. OpenAIChatClient re-adds middleware= via its own get_response override and converts it to client_kwargs["middleware"], but every other provider (Anthropic, Bedrock, Ollama, Azure AI Agent, Foundry Local, AG-UI) has no such override. Callers who pass middleware=[...] directly to those clients will have their middleware silently ignored — it flows through **kwargs past FunctionInvocationLayer and ChatMiddlewareLayer (both of which only inspect client_kwargs["middleware"]), and is eventually stripped by the raw client. This also introduces a leaky abstraction: client_kwargs is documented as provider-specific HTTP-client options, yet it is now the inter-layer messaging channel for middleware threading, with both FunctionInvocationLayer and ChatMiddlewareLayer silently popping a magic "middleware" key from it.
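The silently-ignored-kwarg failure mode described above can be reduced to a generic sketch. This is not the framework's code, just a minimal reproduction of how a top-level middleware= kwarg disappears into **kwargs when only client_kwargs["middleware"] is inspected:

```python
def get_response(messages, *, client_kwargs=None, **kwargs):
    """Hypothetical reduction of the layered get_response path.

    Only client_kwargs["middleware"] is ever read; a top-level middleware=
    kwarg lands in **kwargs and is never consulted before being stripped.
    """
    client_kwargs = dict(client_kwargs or {})
    middleware = client_kwargs.pop("middleware", None)
    # kwargs may still contain "middleware" here, but nothing reads it.
    return middleware


assert get_response([], middleware=["m"]) is None                      # silently dropped
assert get_response([], client_kwargs={"middleware": ["m"]}) == ["m"]  # routed path
```

An explicit middleware parameter on FunctionInvocationLayer.get_response, merged into client_kwargs before categorization, would make both call styles equivalent for every provider.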

Flagged Issues

  • categorize_middleware: changing isinstance(source, list) to isinstance(source, Sequence) also matches str and bytes, which would silently decompose a string into individual characters. Add and not isinstance(source, (str, bytes)) to guard against accidental misuse.
  • middleware= is removed as a first-class named parameter from FunctionInvocationLayer.get_response. Only OpenAIChatClient re-adds it via its own override; AnthropicClient, BedrockChatClient, OllamaChatClient, AzureAIAgentClient, FoundryLocalClient, and AGUIChatClient do not. Calling get_response(messages, middleware=[...]) on those clients silently does nothing—the kwarg flows through **kwargs past FunctionInvocationLayer and ChatMiddlewareLayer (both only read from client_kwargs["middleware"]) and is stripped by the raw provider call. Fix by adding middleware as an explicit named parameter on FunctionInvocationLayer.get_response so the conversion to client_kwargs["middleware"] happens uniformly for all providers.

Suggestions

  • The single-slot pipeline cache stores only the most recent pipeline per client instance. Callers that alternate between different per-call middleware sets will thrash the cache with no benefit. Consider a small bounded LRU (e.g. up to 4 entries keyed by tuple identity) or at minimum document the single-slot limitation.
  • The matches() methods on pipeline classes compare tuple(middleware) using ==, relying on middleware object identity/equality. This is correct but worth documenting so users aren't surprised that semantically-equivalent but distinct middleware objects don't hit the cache.
  • Add MRO ordering tests for other reordered clients (OpenAIChatClient, OpenAIResponsesClient, AzureOpenAIChatClient, AzureOpenAIResponsesClient, AzureAIClient, AzureAIAgentClient, BedrockChatClient, OllamaChatClient, FoundryLocalClient, OpenAIAssistantsClient). A single parameterized test asserting FunctionInvocationLayer < ChatMiddlewareLayer < ChatTelemetryLayer ordering across all public clients would be a lightweight guard against MRO regressions.
  • Add a test verifying RawAnthropicClient does NOT inherit FunctionInvocationLayer, ChatMiddlewareLayer, or ChatTelemetryLayer, matching the pattern already established by RawOpenAIChatClient.
  • The chat middleware pipeline cache test only exercises the path where chat_client_base.chat_middleware is empty. Add a case with non-empty base chat middleware (analogous to the function middleware cache test) to verify the cache key correctly includes base middleware.
  • client_kwargs is documented as provider-specific HTTP-client options but is now also used as an inter-layer messaging channel for middleware routing. This conflates two concerns and means a provider option legitimately named "middleware" would silently interfere. Consider a dedicated internal dict (e.g. _layer_kwargs) for cross-layer communication.
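The bounded-LRU idea in the first suggestion could be sketched like this. The class is illustrative, not the framework's single-slot implementation; keying on object identity (id) rather than == also sidesteps the custom-__eq__ concern, assuming the middleware objects outlive their cache entries (otherwise id reuse could cause false hits):

```python
from collections import OrderedDict


class PipelineLRUCache:
    """Hypothetical bounded pipeline cache keyed by middleware identity."""

    def __init__(self, maxsize: int = 4):
        self._entries: OrderedDict = OrderedDict()
        self._maxsize = maxsize

    def get_or_build(self, middleware, build):
        key = tuple(id(m) for m in middleware)  # identity, not value equality
        if key in self._entries:
            self._entries.move_to_end(key)      # refresh LRU position
            return self._entries[key]
        pipeline = build(list(middleware))
        self._entries[key] = pipeline
        if len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)   # evict least recently used
        return pipeline
```

Callers that alternate between a handful of per-call middleware sets then hit the cache instead of thrashing a single slot, while the worst case remains harmless pipeline recreation.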

Automated review by moonbox3's agents

Copilot and others added 3 commits March 18, 2026 08:37
…InvocationLayer, and add tests (microsoft#4710)

- Guard categorize_middleware Sequence check against str/bytes to prevent
  character-by-character decomposition of accidentally passed strings
- Add explicit middleware parameter to FunctionInvocationLayer.get_response
  and merge it into client_kwargs before categorization, fixing the
  inconsistency where only OpenAIChatClient supported this parameter
- Add assertions that RawAnthropicClient does not inherit convenience layers
- Add chat middleware cache test with non-empty base middleware
- Add tests for single unwrapped middleware item and string input

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

documentation (Improvements or additions to documentation), lab (Agent Framework Lab), python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Feature]: Expose prepped_messages to FunctionMiddleware for mid-tool-loop message injection

4 participants