Feature/knowledge agent by amrit110 · Pull Request #18 · VectorInstitute/eval-agents

amrit110 · 2026-01-27T17:53:37Z

This pull request introduces a significant refactor and expansion of the aieng.agent_evals package, focusing on establishing a robust, modular, and well-documented foundation for agent evaluation and knowledge-grounded QA workflows. It adds new configuration, display, and grounding utilities, provides a comprehensive API for the knowledge agent, and removes outdated example implementations. It also updates environment variable management and pre-commit hooks for improved developer experience.

Key changes:

Agent Evaluation and Knowledge Agent Core

Introduced a new aieng.agent_evals.display module providing rich, reusable display utilities for evaluation outputs, including functions for displaying responses, comparisons, metrics, and messages in Jupyter notebooks using the rich library.
Added a new aieng.agent_evals.knowledge_agent package, exposing the KnowledgeGroundedAgent, configuration, evaluation classes, session management, and tracing utilities via a clear API in __init__.py.
Implemented a centralized configuration system for the knowledge agent using a Pydantic KnowledgeAgentConfig class, supporting environment variables and .env file loading.
Added a grounding_tool.py module defining the GroundedResponse and GroundingChunk models, a factory for a Google Search tool for agent grounding, and a utility to format responses with inline citations.
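As a rough illustration of the grounding models and citation formatting described above, here is a minimal sketch. It uses standard-library dataclasses as a stand-in for the Pydantic models so it runs without extra dependencies. Every field except `tool_calls` (`text`, `chunks`, `title`, `uri`) and the function signature are assumptions, not the PR's actual definitions, and the markers are simply appended after the text rather than placed per sentence.

```python
from dataclasses import dataclass, field


@dataclass
class GroundingChunk:
    """One source backing a response (field names are assumed)."""

    title: str
    uri: str


@dataclass
class GroundedResponse:
    """Response text plus its sources (only tool_calls appears in the PR diff)."""

    text: str
    chunks: list[GroundingChunk] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)


def format_response_with_citations(response: GroundedResponse) -> str:
    """Return the text with [n] markers and a numbered source list.

    The input object is left unmodified; a new string is returned.
    """
    if not response.chunks:
        return response.text
    # One marker per source, numbered from 1 to match the list below.
    markers = "".join(f"[{i}]" for i in range(1, len(response.chunks) + 1))
    sources = "\n".join(
        f"[{i}] {chunk.title}: {chunk.uri}"
        for i, chunk in enumerate(response.chunks, start=1)
    )
    return f"{response.text} {markers}\n\nSources:\n{sources}"
```

A response with no chunks comes back unchanged, so callers can apply the formatter unconditionally.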

Package Structure and Cleanup

Removed outdated example implementation modules from aieng.agent_evals.impl, including example_impl.py and its __init__.py.
Created a new aieng/agent_evals/__init__.py that exposes the display utilities and provides a clear package-level docstring.

Developer Experience

Updated .env.example to provide clear documentation and defaults for environment variables required for Gemini/OpenAI-compatible LLMs and LangFuse tracing.
Refined .pre-commit-config.yaml to use the correct ruff-check hook and expanded ignored error codes for nbqa-ruff to reduce unnecessary linting noise.
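To make the environment-variable handling concrete, the sketch below shows one way a config object could read settings from the process environment with a .env file as fallback. It is a standard-library stand-in, not the Pydantic-based KnowledgeAgentConfig from this PR. The class name `AgentSettings`, the variable names `EXAMPLE_MODEL_NAME` and `EXAMPLE_TRACING_HOST`, and the defaults are all hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


def load_dotenv_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class AgentSettings:
    """Hypothetical settings object; names and defaults are placeholders."""

    model_name: str = "example-model"
    tracing_host: str = "http://localhost"

    @classmethod
    def from_env(
        cls,
        dotenv_path: str = ".env",
        environ: Optional[Mapping[str, str]] = None,
    ) -> "AgentSettings":
        env = os.environ if environ is None else environ
        # Real environment variables take precedence over .env entries.
        merged = {**load_dotenv_file(dotenv_path), **env}
        return cls(
            model_name=merged.get("EXAMPLE_MODEL_NAME", cls.model_name),
            tracing_host=merged.get("EXAMPLE_TRACING_HOST", cls.tracing_host),
        )
```

Passing `environ` explicitly keeps the loader easy to test without touching the real process environment.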

aieng-eval-agents/aieng/agent_evals/knowledge_agent/session.py

fcogidi · 2026-01-29T18:24:29Z

aieng-eval-agents/aieng/agent_evals/knowledge_agent/grounding_tool.py

+    tool_calls: list[dict] = Field(default_factory=list)
+
+
+def create_google_search_tool() -> GoogleSearchTool:


Why use a wrapper instead of providing the tool directly?

Because I'm setting bypass_multi_tools_limit=True; I just wanted to be intentional about it. See the docstring.
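The reasoning in this reply can be sketched as follows. The `GoogleSearchTool` class here is a local stand-in that mirrors only the name and the flag mentioned in the thread, so the example runs without the SDK; the real class and its constructor are not shown in this PR excerpt and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoogleSearchTool:
    """Stand-in for the real tool class (assumption, for illustration only)."""

    bypass_multi_tools_limit: bool = False


def create_google_search_tool() -> GoogleSearchTool:
    """Create the search tool with the multi-tool limit bypass enabled.

    Wrapping construction in a named factory records the non-default flag
    in one documented place, so call sites cannot silently omit it.
    """
    return GoogleSearchTool(bypass_multi_tools_limit=True)
```

The trade-off is one extra level of indirection in exchange for a single place where the non-default setting is explained.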

aieng-eval-agents/aieng/agent_evals/knowledge_agent/config.py

fcogidi · 2026-01-29T18:28:55Z

aieng-eval-agents/aieng/agent_evals/knowledge_agent/grounding_tool.py

Would it be better to have a tools package under aieng.agent_evals to hold other tools from other agents or to have it under aieng.agent_evals.<agent>.tools?

Hmm, not really sure at this point. We don't want code duplication, but I don't know to what extent we can reuse the same tools across agents. For tracing and evals, though, it would be cleaner and more consistent to have a single tools package. @lotif what do you think?

I took a quick look at your draft PR, @fcogidi, and I think the shared tools package isn't a bad idea. I can align this PR with that as well.

I could see what you guys are using and change my implementation. At some point I'll try to switch to Google ADK as well. Even though I think the langfuse evals I'm running are really handy and easy, I still need to look in more detail at what you guys are doing.

aieng-eval-agents/aieng/agent_evals/knowledge_agent/evaluation.py

aieng-eval-agents/aieng/agent_evals/knowledge_agent/agent.py

aieng-eval-agents/aieng/agent_evals/knowledge_agent/evaluation.py

lotif · 2026-01-29T18:44:39Z

aieng-eval-agents/aieng/agent_evals/knowledge_agent/tracing.py

+    >>> if init_tracing():
+    ...     print("Tracing enabled!")
+    """
+    global _instrumented, _langfuse_client  # noqa: PLW0603


I have added a langfuse client to the AsyncClientManager class in my PR, but it's debatable whether it really belongs there. We should stick to one solution, and I vote not to use global variables. AsyncClientManager is a singleton, which is a slightly cleaner solution.

Makes sense. I've extended your AsyncClientManager to work for me as well, so it's now compatible with your PR. I also removed the global variables and aligned the code with the singleton pattern.
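A minimal sketch of the singleton approach discussed in this thread, replacing module-level globals such as `_langfuse_client` with state held on one shared manager. Only the class name `AsyncClientManager` comes from the thread; `get_instance`, `tracing_client`, `close`, and the `FakeTracingClient` placeholder are hypothetical and not the repository's actual API.

```python
from typing import Optional


class FakeTracingClient:
    """Hypothetical placeholder standing in for a real tracing client."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class AsyncClientManager:
    """Holds shared clients on a single instance instead of module globals."""

    _instance: Optional["AsyncClientManager"] = None

    def __init__(self) -> None:
        self._tracing_client: Optional[FakeTracingClient] = None

    @classmethod
    def get_instance(cls) -> "AsyncClientManager":
        # Create the shared manager on first use, then reuse it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @property
    def tracing_client(self) -> FakeTracingClient:
        # Lazily build the client so importing the module has no side effects.
        if self._tracing_client is None:
            self._tracing_client = FakeTracingClient()
        return self._tracing_client

    def close(self) -> None:
        """Release the client; a later access creates a fresh one."""
        if self._tracing_client is not None:
            self._tracing_client.close()
            self._tracing_client = None
```

Compared with `global` statements, this keeps the lifecycle (creation, reuse, cleanup) in one class, and no `# noqa: PLW0603` suppression is needed.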

aieng-eval-agents/aieng/agent_evals/knowledge_agent/tracing.py

.pre-commit-config.yaml

pyproject.toml

lotif

Thanks for addressing the comments :)

commit 4507d52 Merge: b4e124d 412298a Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 30 16:58:09 2026 -0500 Merge branch 'main' into marcelo/langfuse-integration

commit 412298a Author: Amrit Krishnan <amrit110@gmail.com> Date: Fri Jan 30 13:29:20 2026 -0500 Feature/knowledge agent (#18)
* Add initial working implementation using search grounding
* [pre-commit.ci] Add auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Remove example implementation
* Fix GHSA-wp53-j4wj-2cfg, pin python-multipart version
* Update agent to ReAct, fix grounding tool
* Update README.md
* Add tracing to langfuse
* Clear notebook cells
* Remove python-multipart as direct dependency and only update it
* Remove D103 and E402 from being ignored in pre-commit check and fix notebooks
* Move imports to top of the file
* Simplify tracing module to just read directly from env variables
* Rename async client manager for agent, reuse existing async client manager for tracing
* Clarify optional dataset variable in docstring
* Fix format_response_with_citations
* Return results instead of modifying input params
* Use pydantic native desc docstring instead of numpy style
* Unify config to use same across agents
* Use ADK's session management, remove custom implementation
* Remove weaviate from client manager
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

commit b4e124d Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 29 17:00:58 2026 -0500 Small fixes, additional logging and updated groud truth

commit 7d59004 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 16:30:23 2026 -0500 Upgrading python-multipart + small improvements

commit 2906b36 Merge: 285591b bba7326 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 16:12:03 2026 -0500 Merge branch 'main' into marcelo/langfuse-integration

commit 285591b Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 16:09:16 2026 -0500 Adding readme instructions

commit 37348c0 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 15:53:47 2026 -0500 Minor improvements

commit 9fdc71d Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 15:46:25 2026 -0500 Addingh evaluator and retry mechanism

commit 5af7152 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 14:41:36 2026 -0500 Using langfuse to upload a dataset and run the evaluation

commit c1980fe Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 12:39:33 2026 -0500 Adding the eval dataset and making changes to the eval script. Adding tenacity for retrying mechanism

commit 02c3ac5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 16:57:37 2026 -0500 Added code comments

commit da9b0c9 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 16:50:06 2026 -0500 Finished using LLMs to evaluate result

commit f0af403 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 13:51:06 2026 -0500 Moving forward with the evaluation script + some more refactorings

commit 93ee157 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 11:36:52 2026 -0500 Reporting to langfuse and removed clutter

commit d029285 Merge: a39ac1d 9549395 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 11:00:28 2026 -0500 Merge branch 'main' into marcelo/langfuse-integration

commit a39ac1d Merge: cdf0647 efd80cb Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 17:09:09 2026 -0500 Merge branch 'marcelo/report-agent' into marcelo/langfuse-integration

commit efd80cb Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 17:03:37 2026 -0500 CR by Franklin

commit 7a2a57f Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:31:49 2026 -0500 CR by Franklin

commit cdf0647 Merge: 53d0589 534f8e5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:25:19 2026 -0500 Merge branch 'marcelo/report-agent' into marcelo/langfuse-integration

commit 534f8e5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:19:30 2026 -0500 CR by Franklin

commit 53d0589 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:07:33 2026 -0500 Some more langfuse things

commit 40dfc6f Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 13:42:41 2026 -0500 Parsing client responses into langfuse traces

commit 20e4ec5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 11:42:38 2026 -0500 Small refactor

commit ee8b854 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 11:36:14 2026 -0500 Moving env and logging config to the top of the file

commit 66a4494 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 11:13:05 2026 -0500 CR by Amrit

commit f9d7862 Merge: dc02ff2 9042ace Author: Marcelo Lotif <lotif@users.noreply.github.com> Date: Mon Jan 26 11:12:42 2026 -0500 Merge branch 'main' into marcelo/report-agent

commit dc02ff2 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:56:56 2026 -0500 Grammar fixes

commit 530360e Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:42:50 2026 -0500 Adding a couple more vulnerabilities to the skip list

commit 7bb081f Merge: 6e3c4c2 bd34ef0 Author: Marcelo Lotif <lotif@users.noreply.github.com> Date: Fri Jan 23 12:37:19 2026 -0500 Merge branch 'main' into marcelo/report-agent

commit 6e3c4c2 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:35:08 2026 -0500 One more readme paragraph

commit 37b4000 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:27:23 2026 -0500 Movign files around, adding the ddl file and the import script

commit 3458565 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 14:39:47 2026 -0500 Generating xlsx reports

commit 22fc569 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 12:28:40 2026 -0500 Adding more report examples

commit 6592a1c Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 11:55:41 2026 -0500 Deleting weaviate stuff, using Online Retail dataset instead

commit 0098f7d Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 11:37:51 2026 -0500 Weaviate local and remote scripts

commit 9e6ce2e Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 21 11:47:00 2026 -0500 Adding data import for the online retail dataset and some more instructions

commit a77a60f Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 16 17:56:15 2026 -0500 WIp trying to make it work

amrit110 and others added 5 commits January 26, 2026 15:58

Add initial working implementation using search grounding

93493d0

Merge remote-tracking branch 'origin/main' into feature/knowledge-agent

ab04275

# Conflicts: # uv.lock

[pre-commit.ci] Add auto fixes from pre-commit.com hooks

434eff6

for more information, see https://pre-commit.ci

Remove example implementation

98b1f17

Merge branch 'feature/knowledge-agent' of github.com:VectorInstitute/eval-agents into feature/knowledge-agent

3333793

amrit110 self-assigned this Jan 27, 2026

amrit110 added the enhancement New feature or request label Jan 27, 2026

amrit110 added 6 commits January 27, 2026 12:53

Merge branch 'main' into feature/knowledge-agent

de98036

Fix GHSA-wp53-j4wj-2cfg, pin python-multipart version

e70df38

Update agent to ReAct, fix grounding tool

5335cef

Update README.md

ea57d2a

Add tracing to langfuse

b59e247

Clear notebook cells

b03f8b8

amrit110 requested review from fcogidi and lotif January 29, 2026 03:52

amrit110 marked this pull request as ready for review January 29, 2026 03:53

fcogidi reviewed Jan 29, 2026

aieng-eval-agents/aieng/agent_evals/knowledge_agent/agent.py (outdated, resolved)

lotif reviewed Jan 29, 2026

amrit110 added 11 commits January 29, 2026 14:49

Remove python-multipart as direct dependency and only update it

2dbc2c4

Remove D103 and E402 from being ignored in pre-commit check and fix notebooks

52dc6fb

Move imports to top of the file

3cf0baf

Simplify tracing module to just read directly from env variables

bd5b901

Rename async client manager for agent, reuse existing async client manager for tracing

f583989

Clarify optional dataset variable in docstring

01ca359

Fix format_response_with_citations

fff1ab0

Return results instead of modifying input params

795e31b

Use pydantic native desc docstring instead of numpy style

b8713dd

Unify config to use same across agents

8323688

Use ADK's session management, remove custom implementation

972ab58

Remove weaviate from client manager

1410670

amrit110 requested review from fcogidi and lotif January 29, 2026 22:04

lotif approved these changes Jan 30, 2026

amrit110 merged commit 412298a into main Jan 30, 2026
3 checks passed

amrit110 deleted the feature/knowledge-agent branch January 30, 2026 18:29
