feat: naive token estimation via tiktoken #2031
lizradway wants to merge 3 commits into strands-agents:main
Conversation
Force-pushed from 6c2c1c5 to 55bd572
Review Summary (Round 2)

Assessment: Approve ✅

All previously identified blocking issues have been addressed in this update.

Changes Verified

Additional Improvements

Note: Codecov shows 92.85% patch coverage with 4 partial lines; these are likely error-handling branches. Good implementation with clean code structure, appropriate error handling, and thorough test coverage. 👍
Force-pushed from 68e8484 to a536a86
/strands review
Review Summary (Round 3)

Assessment: Approve ✅

Significant improvements since Round 2: tiktoken is now an optional dependency with lazy import.

Details

All previous blocking issues remain resolved. Clean implementation with thorough test coverage.
```python
if "guardContent" in block:
    guard = block["guardContent"]
    if "text" in guard:
        total += len(encoding.encode(guard["text"]["text"]))
```
If the "text" key is missing from guard["text"], this would throw. Do we want to add an extra check here to be more defensive, like the other handlers?
```python
global _cached_encoding
if _cached_encoding is None:
    try:
        import tiktoken
```
I get caching, but why do we keep importing inside the method? Is this intentionally lazy loading?
Maybe token estimation should be its own file?
This is intentional, since tiktoken is an optional dependency
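For context, the lazy-import-plus-module-cache pattern for an optional dependency usually looks like this sketch (function name and error message assumed, not the exact PR code):

```python
_cached_encoding = None


def _get_encoding():
    """Return a cached cl100k_base encoding, importing tiktoken lazily.

    The import lives inside the function so that importing this module
    never requires tiktoken; the encoding is cached so the import and
    construction cost is paid only once per process.
    """
    global _cached_encoding
    if _cached_encoding is None:
        try:
            import tiktoken  # optional dependency, only needed here
        except ImportError as exc:
            raise ImportError(
                "tiktoken is required for token estimation; "
                "install it as an optional extra"
            ) from exc
        _cached_encoding = tiktoken.get_encoding("cl100k_base")
    return _cached_encoding
```

Callers that never invoke `_get_encoding()` can use the module without tiktoken installed.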
src/strands/models/model.py (Outdated)
```
Used for proactive context management (e.g., triggering compression at a
threshold). This is a naive approximation using tiktoken's cl100k_base encoding.
Accuracy varies by model provider but is typically within 5-10% for most providers.
```
Is this AI garbage or actual claim?
This research shows a comparison where they found a Mean Absolute Percentage Error range of 6.5-11.7% when using tiktoken's cl100k_base.
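For reference, mean absolute percentage error is computed as follows; the numbers in the usage note are illustrative, not figures from the cited research:

```python
def mape(estimates, actuals):
    """Mean absolute percentage error between estimated and actual token counts."""
    errors = [abs(est - act) / act for est, act in zip(estimates, actuals)]
    return 100 * sum(errors) / len(errors)
```

For example, a provider reporting 100 and 200 actual tokens where tiktoken estimates 110 and 190 gives a MAPE of 7.5%.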
```python
for message in messages:
    for block in message["content"]:
        total += _count_content_block_tokens(block, encoding)
```
Nit: one trick we can do to improve accuracy is, instead of computing the token count for the entire messages array, keep track of the tokens already consumed and estimate only the latest message.
Then the error margin for the history is 0% (because we literally know its token count), and the only error is in the latest added message.
I like this suggestion. There's separate work going on to expose the latest token count which makes this possible. Once that's set up we can implement this as a follow-up optimization
Description

- Adds an `_estimate_tokens()` method to the `Model` base class for estimating input token count before sending to the model, enabling proactive context management (e.g., triggering compression at a threshold)
- Uses a naive approximation (tiktoken's `cl100k_base` encoding) as a universal fallback for all 11 providers; individual providers can override with native counting APIs later
- Adds `tiktoken` as an optional dependency
- `_estimate_tokens` is internal-facing only for now, and tiktoken will only be installed for users utilizing proactive context management features ([FEATURE] Proactive Context Compression #555, [FEATURE] Large Tool Result Externalization via AfterToolCallEvent Hook #1296, [FEATURE] In-event-loop cycle context management #298)

Related Issues
#1294
Documentation PR
This should be internally facing, documentation not required
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
- `hatch run prepare`

Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.