
feat: naive token estimation via tiktoken#2031

Open
lizradway wants to merge 3 commits into strands-agents:main from lizradway:token-estimation

Conversation

@lizradway
Member

@lizradway lizradway commented Apr 1, 2026

Description

  • Adds _estimate_tokens() method to the Model base class for estimating input token count before sending to the model, enabling proactive context management (e.g., triggering compression at a threshold)
  • Uses tiktoken (cl100k_base encoding) as a universal fallback for all 11 providers — individual providers can override with native counting APIs later
  • Handles all content block types: text, toolUse, toolResult, reasoningContent, guardContent, citationsContent, system_prompt_content; gracefully skips non-serializable content (e.g., image bytes) while still counting serializable parts. Non-serializable content can be covered in follow up model-native count tokens API.
  • Adds tiktoken as an optional dependency
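The per-block counting described above can be sketched as follows. This is an illustrative sketch based on the description, not the actual diff: the helper name and block shapes are assumptions, and `encoding` is any object with an `encode(str)` method (such as tiktoken's cl100k_base encoding).

```python
import json


def count_content_block_tokens(block: dict, encoding) -> int:
    """Naively count tokens in one content block, skipping non-serializable parts.

    `encoding` only needs an encode(str) -> sequence method; in the PR this
    would be tiktoken's cl100k_base encoding. Block shapes are illustrative.
    """
    total = 0
    if "text" in block:
        total += len(encoding.encode(block["text"]))
    elif "toolUse" in block:
        tool_use = block["toolUse"]
        total += len(encoding.encode(tool_use.get("name", "")))
        try:
            # Tool input is counted via its JSON serialization.
            total += len(encoding.encode(json.dumps(tool_use.get("input", {}))))
        except (TypeError, ValueError):
            pass  # non-serializable input (e.g. binary data) is skipped
    elif "toolResult" in block:
        # Iterate text items only; image/binary items are skipped.
        for item in block["toolResult"].get("content", []):
            if "text" in item:
                total += len(encoding.encode(item["text"]))
    return total
```

A caller would sum this over every block in every message to get the naive input estimate.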

Related Issues

#1294

Documentation PR

This should be internally facing, documentation not required

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


@github-actions

github-actions bot commented Apr 1, 2026

Review Summary (Round 2)

Assessment: Approve ✅

All previously identified blocking issues have been addressed in this update.

Changes Verified

  • API review label required: ✅ Method renamed to private _estimate_tokens() - no API review needed
  • Documentation PR required: ✅ Private method is correctly internal-facing
  • Encoding caching suggestion: ✅ Implemented _get_encoding() with _cached_encoding
  • Docstring expansion: ✅ Now includes limitations and override guidance
  • Missing guardContent test: ✅ Added test_estimate_tokens_guard_content_block

Additional Improvements
  • Added edge case tests for non-serializable tool specs
  • Added test for toolUse with binary data
  • Comprehensive test coverage for all content block types

Note: Codecov shows 92.85% patch coverage with 4 partial lines - these are likely error handling branches (except clauses) which are acceptable.

Good implementation with clean code structure, appropriate error handling, and thorough test coverage. 👍

@lizradway lizradway marked this pull request as ready for review April 1, 2026 15:34
@lizradway lizradway added the area-context Session or context related label Apr 1, 2026
@lizradway lizradway marked this pull request as draft April 3, 2026 17:02
@github-actions github-actions bot added size/m and removed size/m labels Apr 3, 2026
@lizradway lizradway temporarily deployed to manual-approval April 3, 2026 19:01 — with GitHub Actions Inactive
@lizradway lizradway marked this pull request as ready for review April 3, 2026 19:08
@lizradway lizradway temporarily deployed to manual-approval April 3, 2026 19:08 — with GitHub Actions Inactive
@opieter-aws
Contributor

/strands review

@github-actions

github-actions bot commented Apr 8, 2026

Review Summary (Round 3)

Assessment: Approve ✅

Significant improvements since Round 2 — tiktoken is now an optional dependency with lazy import, system_prompt_content support was added, and token counting is more granular and resilient.

Details
  • Optional dependency: Good decision to make tiktoken optional with a clear ImportError message guiding users to pip install strands-agents[token-estimation].
  • system_prompt_content: The priority logic (structured content over plain string) correctly avoids double-counting.
  • Improved counting: toolUse now counts name + input separately, toolResult iterates text items only — both more accurate and resilient to non-serializable content.
  • Test quality: Exact token count assertions make tests deterministic and the test_get_encoding_raises_without_tiktoken monkeypatch test validates the optional import path.
  • Test dependency fragility (non-blocking): token-estimation is not in the all extra — tests rely on tiktoken being a transitive dep of litellm. See inline comment.

All previous blocking issues remain resolved. Clean implementation with thorough test coverage.

if "guardContent" in block:
    guard = block["guardContent"]
    if "text" in guard:
        total += len(encoding.encode(guard["text"]["text"]))
Contributor

If the inner "text" key is missing from guard["text"], this would throw. Do we want to add an extra check here to be more defensive, like the other handlers?
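A defensive variant along the lines suggested here might look like the sketch below (illustrative, not the PR's code; the `.get()` chain plus an isinstance check avoids the KeyError):

```python
def count_guard_content(block: dict, encoding) -> int:
    """Count guardContent text tokens, returning 0 on any missing/malformed key."""
    guard = block.get("guardContent", {})
    text = guard.get("text", {})
    # Only encode when the nested "text" key is actually present.
    if isinstance(text, dict) and "text" in text:
        return len(encoding.encode(text["text"]))
    return 0
```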

global _cached_encoding
if _cached_encoding is None:
    try:
        import tiktoken
Contributor

I get caching, but why do we keep importing inside the method? Is this intentionally lazy loading?

Contributor

Maybe token estimation should be its own file?

Contributor

This is intentional, since tiktoken is an optional dependency
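The lazy-import-plus-caching pattern being discussed can be sketched as follows. The function name, error message, and extras name come from the review comments above; details may differ from the actual diff.

```python
_cached_encoding = None  # module-level cache, populated on first use


def get_encoding():
    """Return a cached cl100k_base encoding, importing tiktoken lazily.

    The import lives inside the function so the base package works without
    the optional dependency; only code paths that actually estimate tokens
    pay the import cost or require tiktoken to be installed.
    """
    global _cached_encoding
    if _cached_encoding is None:
        try:
            import tiktoken  # optional dependency, loaded on first call
        except ImportError as exc:
            raise ImportError(
                "Token estimation requires tiktoken. Install the optional "
                "extra, e.g. pip install 'strands-agents[token-estimation]'."
            ) from exc
        _cached_encoding = tiktoken.get_encoding("cl100k_base")
    return _cached_encoding
```

Subsequent calls return the cached encoding object, so the import and encoding construction happen at most once per process.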


Used for proactive context management (e.g., triggering compression at a
threshold). This is a naive approximation using tiktoken's cl100k_base encoding.
Accuracy varies by model provider but is typically within 5-10% for most providers.
Contributor

Is this AI garbage or an actual claim?

Contributor

This research shows a comparison where they found a Mean Absolute Percentage Error range of 6.5-11.7% when using tiktoken's cl100k_base


for message in messages:
    for block in message["content"]:
        total += _count_content_block_tokens(block, encoding)
Contributor

nit, one trick we can do to improve accuracy: instead of estimating the token count for the whole messages array, keep track of the latest consumed token count and estimate only the newest message.

Then the error margin for the history is 0% (because we literally know its token count), and the only error is in the latest added message.

Contributor

I like this suggestion. There's separate work going on to expose the latest token count, which makes this possible. Once that's set up, we can implement this as a follow-up optimization.
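The incremental idea above can be sketched with a hypothetical helper class, assuming the provider's exact input token count is available after each response (as the follow-up work would expose):

```python
class IncrementalTokenEstimator:
    """Exact count for consumed history plus a naive estimate of the newest message."""

    def __init__(self, estimate_fn):
        self._estimate = estimate_fn  # naive per-message estimator (e.g. tiktoken-based)
        self._consumed = 0            # exact input tokens reported by the provider

    def record_usage(self, input_tokens: int) -> None:
        # Called after each model response with the provider's reported count,
        # so the history portion carries 0% error.
        self._consumed = input_tokens

    def estimate_total(self, new_message_text: str) -> int:
        # Only the newest, not-yet-sent message is approximated.
        return self._consumed + self._estimate(new_message_text)
```

With this shape, the approximation error is confined to a single message instead of accumulating over the whole conversation.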


Labels

area-context Session or context related size/m

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants