Python: Foundry Evals integration for Python #4750

Draft
alliscode wants to merge 3 commits into microsoft:main from alliscode:af-foundry-evals-python

@alliscode
Member

Add evaluation framework with local and Foundry-hosted evaluator support:

  • EvalItem/EvalResult core types with conversation splitting strategies
  • @evaluator decorator for defining custom evaluation functions
  • LocalEvaluator for running evaluations locally
  • FoundryEvals provider for Azure AI Foundry hosted evaluations
  • evaluate_agent() orchestration with expected values support
  • evaluate_workflow() for multi-agent workflow evaluation
  • Comprehensive test suite and evaluation samples
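
For readers unfamiliar with decorator-based evaluators, the general shape of the pieces listed above might look like the sketch below. This is an illustration under stated assumptions, not the API added by this PR: `SketchItem`, `SketchResult`, `sketch_evaluator`, and `run_locally` are hypothetical names, and the real `EvalItem`, `EvalResult`, `@evaluator`, and `LocalEvaluator` may differ in fields and signatures.

```python
import functools
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SketchItem:
    """Hypothetical stand-in for an evaluation item: one query/response pair."""
    query: str
    response: str
    expected: Optional[str] = None


@dataclass
class SketchResult:
    """Hypothetical stand-in for an evaluation result."""
    name: str
    passed: bool


def sketch_evaluator(
    fn: Callable[[SketchItem], bool],
) -> Callable[[SketchItem], SketchResult]:
    """Wrap a plain predicate so that it returns a named result."""

    @functools.wraps(fn)
    def run(item: SketchItem) -> SketchResult:
        # Only a literal True counts as a pass.
        return SketchResult(name=fn.__name__, passed=fn(item) is True)

    return run


@sketch_evaluator
def mentions_expected(item: SketchItem) -> bool:
    """Pass when the expected value appears in the response (case-insensitive)."""
    return item.expected is not None and item.expected.lower() in item.response.lower()


def run_locally(items: List[SketchItem], evaluators: list) -> List[SketchResult]:
    """Apply every evaluator to every item, in order, without any remote service."""
    return [ev(item) for item in items for ev in evaluators]


items = [
    SketchItem("Capital of France?", "It is Paris.", expected="paris"),
    SketchItem("2 + 2?", "5", expected="4"),
]
results = run_locally(items, [mentions_expected])
```

The point of the decorator is that authors write ordinary predicates, while naming and result packaging are handled in one place.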

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

markwallace-microsoft added the documentation (Improvements or additions to documentation) and python labels Mar 17, 2026
github-actions bot changed the title from "Foundry Evals integration for Python" to "Python: Foundry Evals integration for Python" Mar 17, 2026
alliscode force-pushed the af-foundry-evals-python branch from a0edd5f to fe9e621 March 17, 2026 21:21
alliscode force-pushed the af-foundry-evals-python branch 6 times, most recently from 15d8640 to aad92ac March 19, 2026 20:41
@markwallace-microsoft
Member

markwallace-microsoft commented Mar 19, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| **packages/azure-ai/agent_framework_azure_ai** | | | | |
| `_foundry_evals.py` | 233 | 50 | 78% | 244, 266, 271–272, 289–293, 300, 303–307, 316–325, 586, 593, 605, 612, 724–725, 727–728, 735, 741–742, 744, 748–751, 753, 760, 767, 810–811, 813, 823, 832, 839 |
| **packages/core/agent_framework** | | | | |
| `_agents.py` | 362 | 47 | 87% | 465, 469, 524, 942, 978, 994, 1091–1095, 1150, 1178, 1311, 1327, 1329, 1342, 1348, 1384, 1386, 1395–1400, 1405, 1407, 1413–1414, 1421, 1423–1424, 1432–1433, 1436–1438, 1448–1453, 1457, 1462, 1464 |
| `_evaluation.py` | 625 | 99 | 84% | 217, 247, 262, 476, 478, 582–583, 662–664, 669, 706–709, 766–767, 770, 776–778, 782, 815–817, 869, 894–902, 907–908, 913–916, 921, 926, 932, 1028, 1140, 1456, 1458, 1466, 1476, 1480, 1506, 1508–1511, 1520, 1524–1526, 1531–1534, 1538–1539, 1559–1562, 1564, 1634, 1640, 1655, 1659–1661, 1691, 1697–1701, 1735, 1756–1759, 1761, 1763–1765, 1775, 1783–1784, 1786, 1811–1812, 1817 |
| **packages/core/agent_framework/_workflows** | | | | |
| `_agent_executor.py` | 205 | 17 | 91% | 109, 133, 174, 200–201, 256–257, 259–260, 296–298, 300, 413–414, 479, 498 |
| `_workflow.py` | 271 | 19 | 92% | 88, 269–271, 273–274, 292, 296, 435, 623, 644, 700, 712, 718, 723, 743–745, 758 |
| **TOTAL** | 28160 | 3366 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 5514 | 20 💤 | 0 ❌ | 0 🔥 | 1m 26s ⏱️ |

alliscode force-pushed the af-foundry-evals-python branch from aad92ac to af0ccf6 March 20, 2026 20:44
Merged and refactored eval module per Eduard's PR review:

- Merge _eval.py + _local_eval.py into single _evaluation.py
- Convert EvalItem from dataclass to regular class
- Rename to_dict() to to_eval_data()
- Convert _AgentEvalData to TypedDict
- Simplify check system: unified async pattern with isawaitable
- Parallelize checks and evaluators with asyncio.gather
- Add all/any mode to tool_called_check
- Fix bool(passed) truthy bug in _coerce_result
- Remove deprecated function_evaluator/async_function_evaluator aliases
- Remove _MinimalAgent, tighten evaluate_agent signature
- Set self.name in __init__ (LocalEvaluator, FoundryEvals)
- Limit FoundryEvals to AsyncOpenAI only
- Type project_client as AIProjectClient
- Remove NotImplementedError continuous eval code
- Add evaluation samples in 02-agents/ and 03-workflows/
- Update all imports and tests (167 passing)
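
The "unified async pattern with isawaitable" and the `asyncio.gather` parallelization mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from `_evaluation.py`: the check functions and the helpers `run_check` and `run_checks` are hypothetical names, and checks are assumed here to take a response string and return a bool.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, List, Union

# A check may be a plain function or a coroutine function (assumption for this sketch).
Check = Callable[[str], Union[bool, Awaitable[bool]]]


def is_non_empty(response: str) -> bool:
    """Synchronous check."""
    return bool(response.strip())


async def is_short(response: str) -> bool:
    """Asynchronous check; the sleep stands in for real I/O."""
    await asyncio.sleep(0)
    return len(response) < 80


async def run_check(check: Check, response: str) -> bool:
    """Call the check, then await the result only if it is awaitable."""
    result = check(response)
    if inspect.isawaitable(result):
        result = await result
    return result


async def run_checks(checks: List[Check], response: str) -> List[bool]:
    """Run all checks concurrently; gather preserves the input order."""
    return list(await asyncio.gather(*(run_check(c, response) for c in checks)))


outcome = asyncio.run(run_checks([is_non_empty, is_short], "Paris"))
```

Because `run_check` treats both kinds of check the same way, callers do not need separate sync and async code paths.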

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
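
The commit message does not describe the `bool(passed)` bug in detail. One common form of this kind of truthiness bug, shown here as an assumption rather than as the actual defect, is that `bool()` maps any non-empty string, including `"False"`, to `True`. The function names below are hypothetical.

```python
def coerce_passed_naive(value: object) -> bool:
    """Truthiness-based coercion: any non-empty string counts as a pass."""
    return bool(value)


def coerce_passed_strict(value: object) -> bool:
    """Accept only real booleans and reject everything else."""
    if isinstance(value, bool):
        return value
    raise TypeError(f"expected bool for 'passed', got {type(value).__name__}")


naive = coerce_passed_naive("False")  # the string is non-empty, so this is True
strict = coerce_passed_strict(False)
```

A strict variant surfaces a malformed check result as an error instead of silently recording a pass.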
alliscode force-pushed the af-foundry-evals-python branch from af0ccf6 to 45527ee March 20, 2026 21:24
alliscode and others added 2 commits March 20, 2026 15:25
Use cast(list[Any], x) with type: ignore[redundant-cast] comments to
satisfy both mypy (which considers casting Any redundant) and pyright
strict mode (which needs explicit casts to narrow Unknown types).
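
A minimal sketch of the cast pattern described above, assuming a value that arrives as `Any` and is narrowed with `isinstance`. The function `first_items` is a hypothetical example, and whether either checker reports anything on this exact line depends on its configuration.

```python
from typing import Any, Dict, List, cast


def first_items(payload: Dict[str, Any], key: str, limit: int) -> List[Any]:
    """Return up to `limit` entries of payload[key] if it is a list, else []."""
    raw = payload.get(key, [])
    if not isinstance(raw, list):
        return []
    # The explicit cast gives a strict checker a concrete element type.
    # The ignore comment silences mypy when it reports the cast as redundant.
    items = cast(List[Any], raw)  # type: ignore[redundant-cast]
    return items[:limit]


sample = first_items({"messages": [1, 2, 3]}, "messages", 2)
```

At runtime `cast` returns its argument unchanged, so the pattern only affects static analysis.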

Also fix evaluator decorator check_name type annotation to be
explicitly str, resolving mypy str|Any|None mismatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…attr

- Apply pyupgrade: Sequence from collections.abc, remove forward-ref quotes
- Add @overload signatures to evaluator() for proper @evaluator usage
- Fix evaluate_workflow sample to use WorkflowBuilder(start_executor=) API
- Fix _workflow.py executor.reset() to use getattr pattern for pyright
- Remove unused EvalResults forward-ref string in default_factory lambda
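
The `@overload` item above refers to a common typing pattern for decorators that work both bare and with arguments. The sketch below shows that pattern in isolation; `named_check` and its `name` parameter are hypothetical and are not the signature of the PR's `evaluator()`.

```python
import functools
from typing import Any, Callable, Optional, overload

CheckFn = Callable[..., Any]


@overload
def named_check(fn: CheckFn) -> CheckFn: ...
@overload
def named_check(*, name: Optional[str] = None) -> Callable[[CheckFn], CheckFn]: ...


def named_check(fn=None, *, name=None):
    """Usable as @named_check or as @named_check(name="...")."""

    def decorate(target):
        @functools.wraps(target)
        def wrapper(*args, **kwargs):
            return target(*args, **kwargs)

        # Attach a display name (hypothetical attribute for this sketch).
        wrapper.check_name = name or target.__name__
        return wrapper

    # Bare usage passes the function directly; called usage passes nothing.
    return decorate(fn) if fn is not None else decorate


@named_check
def is_positive(x: int) -> bool:
    return x > 0


@named_check(name="negative")
def is_negative(x: int) -> bool:
    return x < 0
```

The two overloads let a type checker infer the right return type for each call style, while a single runtime implementation handles both.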

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
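
The "getattr pattern" for `executor.reset()` is not shown in the commit message. A typical form of this pattern, given here as an assumption with hypothetical names, looks up the optional method dynamically so that a static checker does not require every executor type to declare it.

```python
def reset_if_supported(executor: object) -> bool:
    """Call executor.reset() when it exists and is callable; report whether it ran."""
    reset = getattr(executor, "reset", None)
    if callable(reset):
        reset()
        return True
    return False


class WithReset:
    """Hypothetical executor that tracks how often it was reset."""

    def __init__(self) -> None:
        self.resets = 0

    def reset(self) -> None:
        self.resets += 1


class WithoutReset:
    """Hypothetical executor with no reset method."""


resettable = WithReset()
did_reset = reset_if_supported(resettable)
skipped = reset_if_supported(WithoutReset())
```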

Labels

documentation Improvements or additions to documentation python


3 participants