Python: [BREAKING] Refactor middleware layering and split Anthropic raw client#4746
eavanvalkenburg wants to merge 6 commits into microsoft:main
Conversation
Reorder chat client layers so function invocation wraps chat middleware, and chat middleware stays outside telemetry while still running for each inner model call. Add middleware pipeline caching, refresh docs and samples, and split Anthropic into raw and public clients to match the standard layering model. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add targeted typing ignores in workflow visualization and lab modules so pyright stays clean alongside the middleware refactor work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Refactors the Python chat-client layer stack so chat middleware executes around each inner model call in the tool loop (without inflating telemetry spans), standardizes per-call middleware routing via `client_kwargs["middleware"]`, and aligns Anthropic with the `Raw...Client`/public client split used by other providers.
Changes:
- Reorders the standard public client MRO to `FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client` across providers, tests, and docs.
- Moves chat/function/agent middleware pipeline construction and reuse into the relevant layers (pipeline caching).
- Splits Anthropic into `RawAnthropicClient` + `AnthropicClient`, and updates docs/samples (including a per-call usage tracking middleware sample).
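The reordered stack can be sketched as cooperative mixins whose `super()` chain follows the class MRO. The layer names below match the PR, but every class body is invented purely to illustrate the wrapping order:

```python
class RawClient:
    """Innermost layer: the bare provider call."""

    def get_response(self, messages, **kwargs):
        return f"model({messages})"


class ChatTelemetryLayer:
    """Telemetry wraps only the raw model call, so spans stay tight."""

    def get_response(self, messages, **kwargs):
        return f"telemetry[{super().get_response(messages, **kwargs)}]"


class ChatMiddlewareLayer:
    """Chat middleware sits outside telemetry but inside the tool loop."""

    def get_response(self, messages, **kwargs):
        return f"middleware[{super().get_response(messages, **kwargs)}]"


class FunctionInvocationLayer:
    """Outermost layer: each iteration of the tool loop re-enters the stack below."""

    def get_response(self, messages, **kwargs):
        return f"tools[{super().get_response(messages, **kwargs)}]"


class PublicClient(FunctionInvocationLayer, ChatMiddlewareLayer, ChatTelemetryLayer, RawClient):
    pass


print(PublicClient().get_response("hi"))
# tools[middleware[telemetry[model(hi)]]]
```

Because the layers are independent mixins, the base-class order of `PublicClient` alone determines the wrapping order at call time.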
Reviewed changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| python/samples/02-agents/providers/custom/README.md | Updates recommended layer ordering and composition example. |
| python/samples/02-agents/observability/advanced_zero_code.py | Clarifies telemetry vs middleware placement and span behavior. |
| python/samples/02-agents/observability/advanced_manual_setup_console_output.py | Updates observability guidance to the new default layer order. |
| python/samples/02-agents/middleware/usage_tracking_middleware.py | Adds a sample showing per-call usage tracking for streaming and non-streaming tool loops. |
| python/samples/02-agents/middleware/agent_and_run_level_middleware.py | Updates middleware ordering explanation to reflect per-call chat middleware behavior. |
| python/samples/02-agents/middleware/README.md | Adds index + instructions for running the new usage tracking sample. |
| python/samples/02-agents/chat_client/custom_chat_client.py | Updates sample to recommend and demonstrate the new layer order. |
| python/samples/02-agents/auto_retry.py | Updates retry middleware docs to reflect “per model call” semantics in tool loops. |
| python/packages/orchestrations/tests/test_handoff.py | Updates mock client MRO ordering to match the new layering. |
| python/packages/ollama/agent_framework_ollama/_chat_client.py | Reorders Ollama client inheritance to match the standard layer stack. |
| python/packages/lab/lightning/agent_framework_lab_lightning/__init__.py | Adjusts typing ignores around optional agentlightning dependency. |
| python/packages/lab/gaia/agent_framework_lab_gaia/gaia.py | Adds typing ignore for optional pyarrow import. |
| python/packages/foundry_local/agent_framework_foundry_local/_foundry_local_client.py | Reorders FoundryLocal client inheritance to match the standard layer stack. |
| python/packages/core/tests/core/test_observability.py | Updates span-ordering test expectations + mock client MRO. |
| python/packages/core/tests/core/test_middleware_with_chat.py | Updates per-call middleware passing to client_kwargs["middleware"] and adds pipeline cache tests. |
| python/packages/core/tests/core/test_middleware_with_agent.py | Adds agent middleware pipeline cache tests and tool-loop ordering coverage. |
| python/packages/core/tests/core/test_kwargs_propagation_to_ai_function.py | Reorders mock client inheritance to match the new standard stack. |
| python/packages/core/tests/core/test_function_invocation_logic.py | Switches per-call middleware usage to client_kwargs["middleware"] and updates mock init args. |
| python/packages/core/tests/core/test_clients.py | Updates docstring/signature expectations (removal of function_middleware references). |
| python/packages/core/tests/core/conftest.py | Updates mock client initialization to use middleware=[] and new MRO order. |
| python/packages/core/agent_framework/openai/_responses_client.py | Updates OpenAI Responses client MRO + raw-client layering docstring. |
| python/packages/core/agent_framework/openai/_chat_client.py | Updates OpenAI Chat client MRO + removes function_middleware param and routes middleware via client_kwargs. |
| python/packages/core/agent_framework/openai/_assistants_client.py | Updates assistants client MRO ordering to the new standard stack. |
| python/packages/core/agent_framework/observability.py | Adds typing ignores around optional OpenTelemetry exporter imports. |
| python/packages/core/agent_framework/azure/_responses_client.py | Updates Azure responses client MRO ordering. |
| python/packages/core/agent_framework/azure/_chat_client.py | Updates Azure chat client MRO ordering. |
| python/packages/core/agent_framework/_workflows/_viz.py | Adds typing ignores around optional graphviz usage. |
| python/packages/core/agent_framework/_tools.py | Refactors FunctionInvocationLayer to route middleware via client_kwargs["middleware"] and cache function middleware pipelines. |
| python/packages/core/agent_framework/_middleware.py | Adds pipeline .matches() and caching for chat/function/agent middleware pipelines; adjusts ChatMiddlewareLayer init/signature. |
| python/packages/core/agent_framework/_clients.py | Updates layered docstring composition to reflect new layer responsibilities and kwarg docs. |
| python/packages/bedrock/agent_framework_bedrock/_chat_client.py | Reorders Bedrock client inheritance to match the standard layer stack. |
| python/packages/azure-ai/tests/test_azure_ai_agent_client.py | Updates test fixtures for new middleware fields/caches. |
| python/packages/azure-ai/agent_framework_azure_ai/_client.py | Updates raw-client layering docstring + public client MRO. |
| python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py | Updates Azure AI agent client MRO ordering. |
| python/packages/anthropic/tests/test_anthropic_client.py | Updates tests for RawAnthropicClient split and standard layer stack. |
| python/packages/anthropic/agent_framework_anthropic/_chat_client.py | Introduces RawAnthropicClient and composes AnthropicClient with standard layering. |
| python/packages/anthropic/agent_framework_anthropic/__init__.py | Exports RawAnthropicClient. |
| python/packages/ag-ui/tests/ag_ui/conftest.py | Updates streaming stub MRO and init arg (middleware=[]). |
| python/packages/ag-ui/agent_framework_ag_ui/_client.py | Updates AG-UI chat client MRO ordering. |
eavanvalkenburg
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 88%
✓ Correctness
This is a well-structured refactoring that reorders the MRO so `FunctionInvocationLayer` sits above `ChatMiddlewareLayer`, moves middleware categorization responsibility into `FunctionInvocationLayer`, adds single-slot pipeline caching, splits the Anthropic client into Raw and full-featured classes, and removes the public `function_middleware` parameter in favor of a unified `middleware`. The code paths are internally consistent: `FunctionInvocationLayer.__init__` and `get_response` correctly categorize mixed middleware, keep function middleware for itself, and forward chat middleware via kwargs; `ChatMiddlewareLayer` correctly receives only chat middleware; and the pipeline caching correctly uses identity-based tuple comparison via `matches()`. Tests thoroughly cover the new layer ordering, caching, and runtime middleware splitting. No correctness bugs found.
✓ Security Reliability
This is a large structural refactoring that reorders the MRO for all chat client classes, splits middleware routing so `FunctionInvocationLayer` owns the combined middleware parameter, adds single-slot pipeline caching, and introduces `RawAnthropicClient`. From a security/reliability perspective, the changes are sound: no injection risks, no secrets exposure, no unsafe deserialization. The caching logic is safe for single-threaded async (no await between check and set). The one robustness concern is that `categorize_middleware` only checks `isinstance(source, list)` when unpacking sequences, but the new `FunctionInvocationLayer.__init__` and `get_response` now pass whole sequences (not unpacked) — if a caller passes a `tuple` (a valid `Sequence`) instead of a `list`, the middleware items would not be categorized correctly and would silently fall through to the wrong bucket or fail classification.
✓ Design Approach
The diff correctly inverts the layer order from `ChatMiddlewareLayer → FunctionInvocationLayer → ChatTelemetryLayer` to `FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer` so that chat middleware runs once per model call (inside the tool loop) rather than once per entire tool-call chain. The change is well-motivated and consistently applied across all clients. The `RawAnthropicClient` split follows the existing `RawOpenAI*Client` pattern. The per-pipeline caching with `matches()` is a reasonable optimization. One structural concern in the new test is that MRO index assertions are fragile: they encode exact positional information about the class hierarchy rather than the behavioral ordering property being tested. A second, minor concern is that the single-slot `matches()` cache uses `==` (value equality) — if any middleware object ever defines a custom `__eq__`, two structurally different pipelines could incorrectly share a cached instance. Neither issue is blocking, but the MRO test pattern is worth addressing proactively given that similar clients (e.g., `AzureOpenAIChatClient`) already have extra mixins that would shift those indices.
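A minimal sketch of the single-slot cache and the identity-based `matches()` comparison the review describes. Class and attribute names follow the PR description; all internals here are invented for illustration:

```python
class MiddlewarePipeline:
    """Toy stand-in for the real pipeline classes."""

    def __init__(self, middleware):
        self._source_middleware = tuple(middleware)

    def matches(self, middleware):
        # Tuple == compares elementwise; for objects without a custom __eq__
        # this is identity comparison, which is exactly what the cache relies on.
        return self._source_middleware == tuple(middleware)


class CachingClient:
    def __init__(self):
        self._cached_pipeline = None  # single slot: only the most recent pipeline survives

    def get_pipeline(self, middleware):
        if self._cached_pipeline is None or not self._cached_pipeline.matches(middleware):
            self._cached_pipeline = MiddlewarePipeline(middleware)
        return self._cached_pipeline


mw_a, mw_b = [object()], [object()]
client = CachingClient()
p1 = client.get_pipeline(mw_a)
assert client.get_pipeline(mw_a) is p1       # same middleware objects: cache hit
assert client.get_pipeline(mw_b) is not p1   # different set: the single slot is replaced
```

If a middleware class ever defined a value-based `__eq__`, two distinct-but-equal objects would make `matches()` return a false positive and reuse the wrong pipeline, which is the concern flagged above.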
Suggestions
- New call sites in `FunctionInvocationLayer.__init__` and `get_response` pass the middleware sequence directly to `categorize_middleware(middleware)` instead of unpacking it (`*(middleware or [])`). Since `categorize_middleware` only checks `isinstance(source, list)`, a `tuple` (a valid `Sequence`) would be appended as a single item and silently misclassified. Either unpack at call sites (`categorize_middleware(*(middleware or []))`) or broaden the check in `categorize_middleware` to handle all `Sequence` types (e.g., `isinstance(source, (list, tuple))` or `collections.abc.Sequence` with a `str` exclusion).
- The single-slot pipeline caches (`_cached_chat_middleware_pipeline`, etc.) are adequate for the common case but will thrash if concurrent async tasks use different per-call middleware, and would race under multi-threaded usage. Since the worst case is pipeline recreation (not incorrectness), consider adding a brief code comment documenting the single-threaded async assumption for future readers.
- In `test_anthropic_client_wraps_raw_client_with_standard_layer_order`, prefer relative ordering checks via `mro.index()` over hardcoded MRO indices. Clients like `AzureOpenAIChatClient` already have extra mixins that would shift indices, so asserting `mro.index(FunctionInvocationLayer) < mro.index(ChatMiddlewareLayer) < mro.index(ChatTelemetryLayer) < mro.index(RawAnthropicClient)` expresses the intent without breaking when a new mixin is inserted.
- The `matches()` method uses `self._source_middleware == tuple(middleware)`, which relies on the default identity-based `__eq__` of middleware objects. This is correct for all current middleware, but a brief comment explaining the assumption would prevent incorrect cache reuse if a future middleware class defines value-based `__eq__`.
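The first suggestion's broadened check could look like the following. `categorize_middleware` is the PR's name, but this body and the two middleware classes are invented to show the unpacking behavior:

```python
from collections.abc import Sequence


class ChatMiddleware: ...
class FunctionMiddleware: ...


def categorize_middleware(*sources):
    """Bucket middleware by type, unpacking any Sequence except str/bytes."""
    chat, function = [], []
    for source in sources:
        if source is None:
            continue
        # Treat tuples, lists, and other Sequences alike, but never decompose
        # str/bytes (which are technically Sequences of characters/ints).
        items = (
            source
            if isinstance(source, Sequence) and not isinstance(source, (str, bytes))
            else [source]
        )
        for item in items:
            (chat if isinstance(item, ChatMiddleware) else function).append(item)
    return chat, function


# A tuple is now unpacked instead of being appended as a single misclassified item.
chat, function = categorize_middleware((ChatMiddleware(), FunctionMiddleware()))
assert len(chat) == 1 and len(function) == 1
```

Note the `str`/`bytes` exclusion baked in here, which also addresses the second review's concern about broadening to `Sequence`.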
Automated review by eavanvalkenburg's agents
…RO assertions
- Broaden `isinstance` check in `categorize_middleware` from `list` to `Sequence` so tuples and other `Sequence` types are properly unpacked instead of being appended as a single item.
- Replace fragile hardcoded MRO index assertions in the Anthropic test with relative ordering via `mro.index()`.
- Add regression tests for `categorize_middleware` with tuple, list, and None inputs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 87%
✗ Security Reliability
This PR refactors the MRO ordering of middleware layers (swapping `FunctionInvocationLayer` above `ChatMiddlewareLayer`), extracts a `RawAnthropicClient`, adds pipeline caching for middleware, and moves middleware routing responsibilities between layers. The changes are largely structural and well-tested. One reliability concern: the `categorize_middleware` change from `isinstance(source, list)` to `isinstance(source, Sequence)` will also match `str` and `bytes`, which are sequences of characters/ints — if a string were ever passed, it would be silently decomposed character-by-character into the middleware list. While type annotations make this unlikely at runtime, a defensive guard is warranted. The pipeline caching is single-slot (one cached value per client instance), which is correct for the common case but worth noting. No security issues (no secrets, no injection risks, no unsafe deserialization) were found.
✓ Test Coverage
This is a large refactoring that reorders the MRO so FunctionInvocationLayer comes before ChatMiddlewareLayer, extracts RawAnthropicClient, adds middleware pipeline caching, and moves function-middleware routing into FunctionInvocationLayer. The core behavioral changes are well-tested: new integration tests verify the tool-loop middleware ordering, pipeline caching is covered for all three pipeline types, and categorize_middleware's Sequence support is tested. However, the MRO reordering was applied to ~12 client classes but only the Anthropic client has an explicit MRO ordering test; the other reordered clients (OpenAI, Azure, Bedrock, Ollama, FoundryLocal, etc.) lack similar guards. RawAnthropicClient is newly exported but has no test exercising it independently.
✗ Design Approach
The layer reordering (`FunctionInvocationLayer → ChatMiddlewareLayer → ChatTelemetryLayer`) is correct and a genuine semantic improvement: chat middleware now executes per model call within the tool loop rather than wrapping the entire loop. However, the PR has one blocking design issue: removing `middleware=` as a first-class named parameter from `FunctionInvocationLayer.get_response` creates a provider-inconsistent API. `OpenAIChatClient` re-adds `middleware=` via its own `get_response` override and converts it to `client_kwargs["middleware"]`, but every other provider (Anthropic, Bedrock, Ollama, Azure AI Agent, Foundry Local, AG-UI) has no such override. Callers who pass `middleware=[...]` directly to those clients will have their middleware silently ignored — it flows through `**kwargs` past `FunctionInvocationLayer` and `ChatMiddlewareLayer` (both of which only inspect `client_kwargs["middleware"]`), and is eventually stripped by the raw client. This also introduces a leaky abstraction: `client_kwargs` is documented as provider-specific HTTP-client options, yet it is now the inter-layer messaging channel for middleware threading, with both `FunctionInvocationLayer` and `ChatMiddlewareLayer` silently popping a magic `"middleware"` key from it.
Flagged Issues
- `categorize_middleware`: changing `isinstance(source, list)` to `isinstance(source, Sequence)` also matches `str` and `bytes`, which would silently decompose a string into individual characters. Add `and not isinstance(source, (str, bytes))` to guard against accidental misuse.
- `middleware=` is removed as a first-class named parameter from `FunctionInvocationLayer.get_response`. Only `OpenAIChatClient` re-adds it via its own override; `AnthropicClient`, `BedrockChatClient`, `OllamaChatClient`, `AzureAIAgentClient`, `FoundryLocalClient`, and `AGUIChatClient` do not. Calling `get_response(messages, middleware=[...])` on those clients silently does nothing — the kwarg flows through `**kwargs` past `FunctionInvocationLayer` and `ChatMiddlewareLayer` (both only read from `client_kwargs["middleware"]`) and is stripped by the raw provider call. Fix by adding `middleware` as an explicit named parameter on `FunctionInvocationLayer.get_response` so the conversion to `client_kwargs["middleware"]` happens uniformly for all providers.
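The second flagged fix could be sketched as follows. Only the parameter flow reflects the suggestion; the function is a toy stand-in for `FunctionInvocationLayer.get_response`, and everything downstream is elided:

```python
def get_response(messages, *, middleware=None, client_kwargs=None, **kwargs):
    """Toy stand-in: accept middleware= explicitly and merge it uniformly."""
    client_kwargs = dict(client_kwargs or {})
    if middleware is not None:
        # Merge the explicit kwarg into the inter-layer channel before
        # categorization, so every provider behaves the same without
        # needing its own get_response override.
        client_kwargs["middleware"] = [*client_kwargs.get("middleware", []), *middleware]
    # The real implementation would now categorize and dispatch; here we just
    # return the merged kwargs so the routing is visible.
    return client_kwargs


merged = get_response("hi", middleware=["per_call"], client_kwargs={"middleware": ["base"]})
assert merged["middleware"] == ["base", "per_call"]
```

With this shape, `middleware=[...]` can no longer fall through `**kwargs` and vanish, regardless of which provider subclass the caller holds.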
Suggestions
- The single-slot pipeline cache stores only the most recent pipeline per client instance. Callers that alternate between different per-call middleware sets will thrash the cache with no benefit. Consider a small bounded LRU (e.g. up to 4 entries keyed by tuple identity) or at minimum document the single-slot limitation.
- The `matches()` methods on pipeline classes compare `tuple(middleware)` using `==`, relying on middleware object identity/equality. This is correct but worth documenting so users aren't surprised that semantically equivalent but distinct middleware objects don't hit the cache.
- Add MRO ordering tests for the other reordered clients (`OpenAIChatClient`, `OpenAIResponsesClient`, `AzureOpenAIChatClient`, `AzureOpenAIResponsesClient`, `AzureAIClient`, `AzureAIAgentClient`, `BedrockChatClient`, `OllamaChatClient`, `FoundryLocalClient`, `OpenAIAssistantsClient`). A single parameterized test asserting `FunctionInvocationLayer < ChatMiddlewareLayer < ChatTelemetryLayer` ordering across all public clients would be a lightweight guard against MRO regressions.
- Add a test verifying RawAnthropicClient does NOT inherit FunctionInvocationLayer, ChatMiddlewareLayer, or ChatTelemetryLayer, matching the pattern already established by RawOpenAIChatClient.
- The chat middleware pipeline cache test only exercises the path where `chat_client_base.chat_middleware` is empty. Add a case with non-empty base chat middleware (analogous to the function middleware cache test) to verify the cache key correctly includes base middleware.
- `client_kwargs` is documented as provider-specific HTTP-client options but is now also used as an inter-layer messaging channel for middleware routing. This conflates two concerns and means a provider option legitimately named `"middleware"` would silently interfere. Consider a dedicated internal dict (e.g. `_layer_kwargs`) for cross-layer communication.
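The suggested parameterized ordering guard could be sketched with relative `mro.index()` assertions. The layer names match the PR, but all classes here are toys and the extra mixin is invented to show why absolute indices break:

```python
class FunctionInvocationLayer: ...
class ChatMiddlewareLayer: ...
class ChatTelemetryLayer: ...
class ExtraMixin: ...          # stands in for provider-specific mixins that shift absolute indices
class RawToyClient: ...


class ToyClient(FunctionInvocationLayer, ExtraMixin, ChatMiddlewareLayer, ChatTelemetryLayer, RawToyClient):
    ...


def assert_standard_layer_order(client_cls, raw_cls):
    mro = client_cls.__mro__
    # Relative ordering keeps working when a new mixin is inserted anywhere in
    # the chain, unlike hardcoded assertions such as mro[1] is FunctionInvocationLayer.
    assert (
        mro.index(FunctionInvocationLayer)
        < mro.index(ChatMiddlewareLayer)
        < mro.index(ChatTelemetryLayer)
        < mro.index(raw_cls)
    )


assert_standard_layer_order(ToyClient, RawToyClient)
```

In a real test suite this helper would be driven by `pytest.mark.parametrize` over the public client classes, passing each client's raw base class as `raw_cls`.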
Automated review by moonbox3's agents
…InvocationLayer, and add tests (microsoft#4710)
- Guard `categorize_middleware` `Sequence` check against `str`/`bytes` to prevent character-by-character decomposition of accidentally passed strings
- Add explicit `middleware` parameter to `FunctionInvocationLayer.get_response` and merge it into `client_kwargs` before categorization, fixing the inconsistency where only `OpenAIChatClient` supported this parameter
- Add assertions that `RawAnthropicClient` does not inherit convenience layers
- Add chat middleware cache test with non-empty base middleware
- Add tests for single unwrapped middleware item and string input

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
Fixes #4710.
This change clarifies the separation of concerns between agent middleware, chat middleware, function invocation, and telemetry. Chat middleware now runs for each inner model call in the function loop while remaining outside telemetry so middleware latency does not skew per-call timings. It also covers the related per-call usage-tracking scenario discussed in #4671 and aligns Anthropic with the raw/public client pattern used elsewhere.
Description
- Reorder the standard public client stack to `FunctionInvocationLayer -> ChatMiddlewareLayer -> ChatTelemetryLayer -> Raw/Base client`
- Cache the built chat middleware pipeline in `ChatMiddlewareLayer` and add matching cache reuse for agent and function middleware pipelines
- Route per-call middleware through `client_kwargs["middleware"]` and remove stale `function_middleware` docstring/runtime references
- Split Anthropic into `RawAnthropicClient` plus public `AnthropicClient`, and include the small ancillary typing cleanups already present on the branch

Contribution Checklist