Python: Foundry Evals integration for Python by alliscode · Pull Request #4750 · microsoft/agent-framework

alliscode · 2026-03-17T21:15:01Z

Add evaluation framework with local and Foundry-hosted evaluator support:

- EvalItem/EvalResult core types with conversation splitting strategies
- @evaluator decorator for defining custom evaluation functions
- LocalEvaluator for running evaluations locally
- FoundryEvals provider for Azure AI Foundry hosted evaluations
- evaluate_agent() orchestration with expected values support
- evaluate_workflow() for multi-agent workflow evaluation
- Comprehensive test suite and evaluation samples

Contribution Checklist

- The code builds clean without any errors or warnings
- The PR follows the Contribution Guidelines
- All unit tests pass, and I have added new tests where possible
- Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

python/packages/core/agent_framework/_evaluation.py

python/packages/core/agent_framework/__init__.py

python/packages/core/agent_framework/_eval.py

python/packages/azure-ai/agent_framework_azure_ai/_foundry_evals.py

markwallace-microsoft · 2026-03-19T20:44:50Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/azure-ai/agent_framework_azure_ai
_foundry_evals.py	233	50	78%	244, 266, 271–272, 289–293, 300, 303–307, 316–325, 586, 593, 605, 612, 724–725, 727–728, 735, 741–742, 744, 748–751, 753, 760, 767, 810–811, 813, 823, 832, 839
packages/core/agent_framework
_agents.py	362	47	87%	465, 469, 524, 942, 978, 994, 1091–1095, 1150, 1178, 1311, 1327, 1329, 1342, 1348, 1384, 1386, 1395–1400, 1405, 1407, 1413–1414, 1421, 1423–1424, 1432–1433, 1436–1438, 1448–1453, 1457, 1462, 1464
_evaluation.py	625	99	84%	217, 247, 262, 476, 478, 582–583, 662–664, 669, 706–709, 766–767, 770, 776–778, 782, 815–817, 869, 894–902, 907–908, 913–916, 921, 926, 932, 1028, 1140, 1456, 1458, 1466, 1476, 1480, 1506, 1508–1511, 1520, 1524–1526, 1531–1534, 1538–1539, 1559–1562, 1564, 1634, 1640, 1655, 1659–1661, 1691, 1697–1701, 1735, 1756–1759, 1761, 1763–1765, 1775, 1783–1784, 1786, 1811–1812, 1817
packages/core/agent_framework/_workflows
_agent_executor.py	205	17	91%	109, 133, 174, 200–201, 256–257, 259–260, 296–298, 300, 413–414, 479, 498
_workflow.py	271	19	92%	88, 269–271, 273–274, 292, 296, 435, 623, 644, 700, 712, 718, 723, 743–745, 758
TOTAL	28160	3366	88%
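The Cover column can be reproduced from the Stmts and Miss columns. The short sketch below recomputes it for the rows listed above; the rounding-down rule is an inference from these numbers (for example, 188/205 is about 91.7% and is reported as 91%), not something the report states.

```python
# Rows copied from the coverage report: (file, statements, missed, reported %).
ROWS = [
    ("_foundry_evals.py", 233, 50, 78),
    ("_agents.py", 362, 47, 87),
    ("_evaluation.py", 625, 99, 84),
    ("_agent_executor.py", 205, 17, 91),
    ("_workflow.py", 271, 19, 92),
    ("TOTAL", 28160, 3366, 88),
]


def cover_percent(stmts: int, miss: int) -> int:
    """Covered statements as a whole percentage, rounded down (assumed rule)."""
    return (stmts - miss) * 100 // stmts


for name, stmts, miss, reported in ROWS:
    print(f"{name:<20} {cover_percent(stmts, miss):>3}%  (reported {reported}%)")
```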

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5514	20 💤	0 ❌	0 🔥	1m 26s ⏱️

Merged and refactored eval module per Eduard's PR review:
- Merge _eval.py + _local_eval.py into single _evaluation.py
- Convert EvalItem from dataclass to regular class
- Rename to_dict() to to_eval_data()
- Convert _AgentEvalData to TypedDict
- Simplify check system: unified async pattern with isawaitable
- Parallelize checks and evaluators with asyncio.gather
- Add all/any mode to tool_called_check
- Fix bool(passed) truthy bug in _coerce_result
- Remove deprecated function_evaluator/async_function_evaluator aliases
- Remove _MinimalAgent, tighten evaluate_agent signature
- Set self.name in __init__ (LocalEvaluator, FoundryEvals)
- Limit FoundryEvals to AsyncOpenAI only
- Type project_client as AIProjectClient
- Remove NotImplementedError continuous eval code
- Add evaluation samples in 02-agents/ and 03-workflows/
- Update all imports and tests (167 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
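The "unified async pattern with isawaitable" and the asyncio.gather parallelization mentioned in this commit message might look roughly like the sketch below. It is self-contained and does not use the agent_framework API: `Check`, `coerce_passed`, `run_check`, `run_checks`, and the two sample checks are hypothetical names. The commit message does not describe the truthy bug in detail, so the strict boolean handling here is only one assumed reading of it.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

# Hypothetical check type: takes a response string and returns a bool
# directly, or an awaitable that resolves to one.
Check = Callable[[str], Union[bool, Awaitable[bool]]]


def coerce_passed(value: Any) -> bool:
    """Accept only real booleans; bool(value) would treat "False" or [0] as a pass."""
    if isinstance(value, bool):
        return value
    raise TypeError(f"check returned {type(value).__name__}, expected bool")


async def run_check(check: Check, response: str) -> bool:
    """Call a check and await the result only when it is awaitable."""
    result = check(response)
    if inspect.isawaitable(result):
        result = await result
    return coerce_passed(result)


async def run_checks(checks: list[Check], response: str) -> list[bool]:
    """Run every check concurrently; results keep the order of `checks`."""
    return list(await asyncio.gather(*(run_check(c, response) for c in checks)))


def is_short(response: str) -> bool:  # synchronous check
    return len(response) < 40


async def mentions_paris(response: str) -> bool:  # asynchronous check
    await asyncio.sleep(0)  # stands in for real async work
    return "Paris" in response


results = asyncio.run(run_checks([is_short, mentions_paris], "The capital is Paris."))
print(results)
```

With this shape, callers do not need separate sync and async code paths: both kinds of check go through `run_check`, and `asyncio.gather` returns results in the same order as its inputs.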

Use cast(list[Any], x) with type: ignore[redundant-cast] comments to satisfy both mypy (which considers casting Any redundant) and pyright strict mode (which needs explicit casts to narrow Unknown types). Also fix evaluator decorator check_name type annotation to be explicitly str, resolving mypy str|Any|None mismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
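A minimal sketch of the cast-plus-ignore pattern this commit message describes. The function and payload shape are hypothetical and not taken from the PR; whether mypy or pyright actually reports anything on this exact line depends on their versions and configuration (for example, mypy's warn_redundant_casts and pyright's strict mode).

```python
from typing import Any, cast


def first_names(payload: dict[str, Any]) -> list[str]:
    """Collect the "name" field from a loosely typed "items" list."""
    raw = payload.get("items", [])
    # Explicit cast to give the checker a concrete list type; the ignore
    # comment silences mypy if it considers the cast redundant.
    items = cast(list[Any], raw)  # type: ignore[redundant-cast]
    return [str(item["name"]) for item in items]


print(first_names({"items": [{"name": "a"}, {"name": "b"}]}))
```

At runtime `cast` returns its argument unchanged, so the pattern affects only static analysis.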

@overload

…attr
- Apply pyupgrade: Sequence from collections.abc, remove forward-ref quotes
- Add @overload signatures to evaluator() for proper @evaluator usage
- Fix evaluate_workflow sample to use WorkflowBuilder(start_executor=) API
- Fix _workflow.py executor.reset() to use getattr pattern for pyright
- Remove unused EvalResults forward-ref string in default_factory lambda

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
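To illustrate why @overload signatures help a decorator that is used both bare and with arguments, here is a self-contained sketch. It is not the PR's evaluator() implementation: the `CheckFn` alias, the `name` parameter, and storing the result on a `check_name` attribute are assumptions made for the example.

```python
from typing import Callable, Optional, Union, overload

CheckFn = Callable[[str], bool]


@overload
def evaluator(fn: CheckFn) -> CheckFn: ...
@overload
def evaluator(*, name: Optional[str] = None) -> Callable[[CheckFn], CheckFn]: ...


def evaluator(
    fn: Optional[CheckFn] = None, *, name: Optional[str] = None
) -> Union[CheckFn, Callable[[CheckFn], CheckFn]]:
    """Hypothetical decorator usable as @evaluator or @evaluator(name=...)."""

    def wrap(target: CheckFn) -> CheckFn:
        # Record a display name on the function; fall back to its own name.
        setattr(target, "check_name", name or target.__name__)
        return target

    # Bare usage passes the function directly; the called form passes fn=None.
    return wrap(fn) if fn is not None else wrap


@evaluator
def non_empty(response: str) -> bool:
    return bool(response.strip())


@evaluator(name="mentions_weather")
def has_weather(response: str) -> bool:
    return "weather" in response.lower()


print(getattr(non_empty, "check_name"), getattr(has_weather, "check_name"))
```

The two overloads let a type checker see that the bare form returns the decorated function while the called form returns a decorator, instead of a single union return type for both.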

markwallace-microsoft added the documentation and python labels Mar 17, 2026

github-actions bot changed the title ~~Foundry Evals integration for Python~~ Python: Foundry Evals integration for Python Mar 17, 2026

alliscode force-pushed the af-foundry-evals-python branch from a0edd5f to fe9e621 Compare March 17, 2026 21:21

eavanvalkenburg reviewed Mar 18, 2026


alliscode force-pushed the af-foundry-evals-python branch 6 times, most recently from 15d8640 to aad92ac Compare March 19, 2026 20:41

alliscode force-pushed the af-foundry-evals-python branch from aad92ac to af0ccf6 Compare March 20, 2026 20:44

alliscode force-pushed the af-foundry-evals-python branch from af0ccf6 to 45527ee Compare March 20, 2026 21:24

alliscode and others added 2 commits March 20, 2026 15:25

alliscode wants to merge 3 commits into microsoft:main from alliscode:af-foundry-evals-python


3 participants
