# Conflicts: # uv.lock
for more information, see https://pre-commit.ci
…eval-agents into feature/knowledge-agent
tool_calls: list[dict] = Field(default_factory=list)

def create_google_search_tool() -> GoogleSearchTool:
Why use a wrapper instead of providing the tool directly?
Because I'm setting bypass_multi_tools_limit=True and wanted to be intentional about it; see the docstring.
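A minimal sketch of the wrapper idea from this thread, assuming the tool's constructor accepts bypass_multi_tools_limit as a keyword argument. The GoogleSearchTool class below is a hypothetical stand-in so the snippet runs without any agent framework installed; it is not the real class, and the docstring wording is illustrative rather than the PR's actual docstring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoogleSearchTool:
    """Hypothetical stand-in for the real search tool class (assumption)."""

    bypass_multi_tools_limit: bool = False


def create_google_search_tool() -> GoogleSearchTool:
    """Create the search tool with the multi-tool limit bypassed.

    Keeping the flag inside a named factory documents the choice in one
    place instead of repeating the keyword argument at every call site.
    """
    return GoogleSearchTool(bypass_multi_tools_limit=True)


tool = create_google_search_tool()
print(tool.bypass_multi_tools_limit)
```

The trade-off raised in the question still applies: a factory adds one level of indirection, so it is mainly worthwhile when the non-default flag needs an explanation.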
Would it be better to have a tools package under aieng.agent_evals to hold other tools from other agents or to have it under aieng.agent_evals.<agent>.tools?
Hmm, not really sure at this point. We don't want code duplication, but I don't know to what extent we can use the same tools across agents. For tracing and evals, though, it's cleaner and more consistent to have a single tools package. @lotif what do you think?
I looked at your draft PR a bit, @fcogidi, and I think the shared tools package isn't a bad idea. I can align this PR with that as well.
I could see what you guys are using and change my implementation. At some point I'll try to switch to Google ADK as well. Even though I think the langfuse evals I'm running are really handy and easy, I still need to look in more detail at what you guys are doing.
>>> if init_tracing():
...     print("Tracing enabled!")
"""
global _instrumented, _langfuse_client  # noqa: PLW0603
I have added a langfuse client to the AsyncClientManager class in my PR, but it's debatable whether it really belongs there. We should stick to one solution, and I vote not to use global variables. AsyncClientManager is a singleton, which is a slightly cleaner solution.
Makes sense. I've extended your AsyncClientManager to work for me as well, so it's now compatible with your PR. I also removed the use of global variables and aligned with the singleton pattern.
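For illustration, here is a rough sketch of how tracing state could live on a singleton manager instead of module-level globals. Only the names AsyncClientManager and init_tracing come from this thread; get_instance, langfuse_client, and FakeLangfuseClient are assumptions made up for the example, and the real class in the repository may look quite different.

```python
from __future__ import annotations


class FakeLangfuseClient:
    """Hypothetical stand-in for a real Langfuse client (assumption)."""


class AsyncClientManager:
    """Process-wide holder for lazily created clients."""

    _instance: AsyncClientManager | None = None

    def __init__(self) -> None:
        # State lives on the instance, so no `global` statement is needed.
        self._langfuse_client: FakeLangfuseClient | None = None

    @classmethod
    def get_instance(cls) -> AsyncClientManager:
        """Return the shared manager, creating it on first use."""
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @property
    def langfuse_client(self) -> FakeLangfuseClient:
        """Create the tracing client on first access and reuse it afterwards."""
        if self._langfuse_client is None:
            self._langfuse_client = FakeLangfuseClient()
        return self._langfuse_client


def init_tracing() -> bool:
    """Initialise tracing through the shared manager and report success."""
    manager = AsyncClientManager.get_instance()
    return manager.langfuse_client is not None


if init_tracing():
    print("Tracing enabled!")
```

Compared with the global-variable version, every caller goes through the same accessor, so there is a single place to reset or swap the client in tests.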
…nager for tracing
lotif left a comment
Thanks for addressing the comments :)
commit 4507d52 Merge: b4e124d 412298a Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 30 16:58:09 2026 -0500 Merge branch 'main' into marcelo/langfuse-integration
commit 412298a Author: Amrit Krishnan <amrit110@gmail.com> Date: Fri Jan 30 13:29:20 2026 -0500 Feature/knowledge agent (#18)
* Add initial working implementation using search grounding
* [pre-commit.ci] Add auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Remove example implementation
* Fix GHSA-wp53-j4wj-2cfg, pin python-multipart version
* Update agent to ReAct, fix grounding tool
* Update README.md
* Add tracing to langfuse
* Clear notebook cells
* Remove python-multipart as direct dependency and only update it
* Remove D103 and E402 from being ignored in pre-commit check and fix notebooks
* Move imports to top of the file
* Simplify tracing module to just read directly from env variables
* Rename async client manager for agent, reuse existing async client manager for tracing
* Clarify optional dataset variable in docstring
* Fix format_response_with_citations
* Return results instead of modifying input params
* Use pydantic native desc docstring instead of numpy style
* Unify config to use same across agents
* Use ADK's session management, remove custom implementation
* Remove weaviate from client manager
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit b4e124d Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 29 17:00:58 2026 -0500 Small fixes, additional logging and updated groud truth
commit 7d59004 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 16:30:23 2026 -0500 Upgrading python-multipart + small improvements
commit 2906b36 Merge: 285591b bba7326 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 16:12:03 2026 -0500 Merge branch 'main' into marcelo/langfuse-integration
commit 285591b Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 16:09:16 2026 -0500 Adding readme instructions
commit 37348c0 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 15:53:47 2026 -0500 Minor improvements
commit 9fdc71d Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 15:46:25 2026 -0500 Addingh evaluator and retry mechanism
commit 5af7152 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 14:41:36 2026 -0500 Using langfuse to upload a dataset and run the evaluation
commit c1980fe Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 28 12:39:33 2026 -0500 Adding the eval dataset and making changes to the eval script. Adding tenacity for retrying mechanism
commit 02c3ac5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 16:57:37 2026 -0500 Added code comments
commit da9b0c9 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 16:50:06 2026 -0500 Finished using LLMs to evaluate result
commit f0af403 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 13:51:06 2026 -0500 Moving forward with the evaluation script + some more refactorings
commit 93ee157 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 11:36:52 2026 -0500 Reporting to langfuse and removed clutter
commit d029285 Merge: a39ac1d 9549395 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Tue Jan 27 11:00:28 2026 -0500 Merge branch 'main' into marcelo/langfuse-integration
commit a39ac1d Merge: cdf0647 efd80cb Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 17:09:09 2026 -0500 Merge branch 'marcelo/report-agent' into marcelo/langfuse-integration
commit efd80cb Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 17:03:37 2026 -0500 CR by Franklin
commit 7a2a57f Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:31:49 2026 -0500 CR by Franklin
commit cdf0647 Merge: 53d0589 534f8e5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:25:19 2026 -0500 Merge branch 'marcelo/report-agent' into marcelo/langfuse-integration
commit 534f8e5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:19:30 2026 -0500 CR by Franklin
commit 53d0589 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 16:07:33 2026 -0500 Some more langfuse things
commit 40dfc6f Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 13:42:41 2026 -0500 Parsing client responses into langfuse traces
commit 20e4ec5 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 11:42:38 2026 -0500 Small refactor
commit ee8b854 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 11:36:14 2026 -0500 Moving env and logging config to the top of the file
commit 66a4494 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Mon Jan 26 11:13:05 2026 -0500 CR by Amrit
commit f9d7862 Merge: dc02ff2 9042ace Author: Marcelo Lotif <lotif@users.noreply.github.com> Date: Mon Jan 26 11:12:42 2026 -0500 Merge branch 'main' into marcelo/report-agent
commit dc02ff2 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:56:56 2026 -0500 Grammar fixes
commit 530360e Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:42:50 2026 -0500 Adding a couple more vulnerabilities to the skip list
commit 7bb081f Merge: 6e3c4c2 bd34ef0 Author: Marcelo Lotif <lotif@users.noreply.github.com> Date: Fri Jan 23 12:37:19 2026 -0500 Merge branch 'main' into marcelo/report-agent
commit 6e3c4c2 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:35:08 2026 -0500 One more readme paragraph
commit 37b4000 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 23 12:27:23 2026 -0500 Movign files around, adding the ddl file and the import script
commit 3458565 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 14:39:47 2026 -0500 Generating xlsx reports
commit 22fc569 Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 12:28:40 2026 -0500 Adding more report examples
commit 6592a1c Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 11:55:41 2026 -0500 Deleting weaviate stuff, using Online Retail dataset instead
commit 0098f7d Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Thu Jan 22 11:37:51 2026 -0500 Weaviate local and remote scripts
commit 9e6ce2e Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Wed Jan 21 11:47:00 2026 -0500 Adding data import for the online retail dataset and some more instructions
commit a77a60f Author: Marcelo Lotif <marcelo.lotif@vectorinstitute.ai> Date: Fri Jan 16 17:56:15 2026 -0500 WIp trying to make it work
This pull request introduces a significant refactor and expansion of the aieng.agent_evals package, focusing on establishing a robust, modular, and well-documented foundation for agent evaluation and knowledge-grounded QA workflows. It adds new configuration, display, and grounding utilities, provides a comprehensive API for the knowledge agent, and removes outdated example implementations. It also updates environment variable management and pre-commit hooks for improved developer experience.

Key changes:

Agent Evaluation and Knowledge Agent Core
- Adds the aieng.agent_evals.display module, providing rich, reusable display utilities for evaluation outputs, including functions for displaying responses, comparisons, metrics, and messages in Jupyter notebooks using the rich library.
- Adds the aieng.agent_evals.knowledge_agent package, exposing the KnowledgeGroundedAgent, configuration, evaluation classes, session management, and tracing utilities via a clear API in __init__.py.
- Adds the KnowledgeAgentConfig class, supporting environment variables and .env file loading.
- Adds the grounding_tool.py module, defining the GroundedResponse and GroundingChunk models, a factory for a Google Search tool for agent grounding, and a utility to format responses with inline citations.

Package Structure and Cleanup
- Removes aieng.agent_evals.impl, including example_impl.py and its __init__.py.
- aieng.agent_evals.__init__.py now exposes display utilities and provides a clear package-level docstring.

Developer Experience
- Updates .env.example to provide clear documentation and defaults for environment variables required for Gemini/OpenAI-compatible LLMs and LangFuse tracing.
- Updates .pre-commit-config.yaml to use the correct ruff-check hook and expands the ignored error codes for nbqa-ruff to reduce unnecessary linting noise.
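To make the grounding utilities in the description more concrete, here is a simplified sketch of how a response could be formatted with citations. The names GroundedResponse, GroundingChunk, and format_response_with_citations appear in this PR, but every field, the marker format, and the use of dataclasses instead of the project's models are assumptions for illustration. The sketch appends numbered markers after the text rather than placing them at exact positions, and it returns a new string instead of modifying its input.

```python
from dataclasses import dataclass, field


@dataclass
class GroundingChunk:
    """One source backing the answer (field names are assumptions)."""

    title: str
    uri: str


@dataclass
class GroundedResponse:
    """Answer text plus the sources it was grounded on (fields are assumptions)."""

    text: str
    chunks: list[GroundingChunk] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)


def format_response_with_citations(response: GroundedResponse) -> str:
    """Return the answer followed by numbered markers and a source list."""
    if not response.chunks:
        return response.text
    numbered = list(enumerate(response.chunks, start=1))
    markers = "".join(f"[{i}]" for i, _ in numbered)
    sources = "\n".join(f"[{i}] {chunk.title}: {chunk.uri}" for i, chunk in numbered)
    return f"{response.text} {markers}\n\nSources:\n{sources}"


example = GroundedResponse(
    text="Paris is the capital of France.",
    chunks=[GroundingChunk(title="Example source", uri="https://example.com")],
)
print(format_response_with_citations(example))
```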