Excessive token/quota consumption — weekly limit drains in 1-2 days with Opus #118
Platform
macOS
Operating system version
macOS Tahoe 26.4
System architecture
ARM64 (M1, M2, etc)
PolyScope Version
0.15.0
Bug description
Using Polyscope with Opus on a MAX plan, the weekly rate limit is consumed extremely fast — typically within 1-2 days of normal use. A single complex task (plan + implementation) can burn 5-10% of the weekly quota.
Even simple sessions where the agent reads a few files and makes edits cost $3-5 per interaction. Longer sessions with many tool calls can exceed $20 for a single prompt.
What I think is happening
When I send a new message — especially after being away for a while — the agent seems to resume the full previous session context. Every tool call in that session then carries the entire accumulated context. So if the agent makes 10 tool calls, the full conversation history is sent 10 times. With a 1M context window on Opus, this can mean millions of tokens per turn even for simple tasks.
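To put rough numbers on this, here is a minimal sketch of the arithmetic, assuming each tool call triggers one request that resends the whole history accumulated so far. The function name and the figures (200k tokens of resumed context, 10 tool calls, about 2k tokens added per call) are hypothetical illustrations, not measurements from Polyscope.

```python
def total_input_tokens(context_tokens: int, tool_calls: int, tokens_added_per_call: int) -> int:
    """Sum the input tokens sent in one turn, assuming every request
    (one per tool call) resends the full history accumulated so far."""
    total = 0
    history = context_tokens
    for _ in range(tool_calls):
        total += history                   # the whole history goes out with this request
        history += tokens_added_per_call   # the tool result grows the history
    return total


# Hypothetical numbers: a 200k-token resumed session, 10 tool calls,
# each adding about 2k tokens of tool output.
print(total_input_tokens(200_000, 10, 2_000))  # 2090000
```

Under these assumptions a single prompt already processes about 2.1M input tokens, and the total grows roughly linearly with both the resumed context size and the number of tool calls.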
I suspect the cache_read tokens (from prompt caching) are the biggest cost driver. Even though they're cheaper per token than fresh input, they still seem to count toward the rate limit quota. So a resumed session with a large context burns through the weekly limit at the same rate as if all those tokens were fresh.
After idle periods (overnight, lunch break), the first prompt often takes extremely long or hangs completely — which seems consistent with trying to resume a very large stale session.
Using Claude Code CLI directly for comparable tasks seems significantly cheaper — possibly because CLI sessions are shorter-lived and don't accumulate as much context.
Questions
- Is Polyscope doing anything to manage context size between sessions? Or does it always resume the full previous conversation?
- Is there a way to see per-prompt token consumption breakdown (input, output, cache read, cache creation) so we can understand where the cost goes?
- Are there any recommended settings or workflows to reduce consumption while keeping Opus for complex tasks?
- Is this something the team is aware of and working on?
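For the breakdown asked about above, something like the following would already be enough. This is only a sketch: it assumes Polyscope could expose one usage record per request, using the field names of the Anthropic Messages API `usage` object. The `summarize_usage` helper and the sample records are hypothetical.

```python
from collections import Counter

# Field names as they appear in the Anthropic Messages API `usage` object.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def summarize_usage(records):
    """Add up each usage field across all requests made for one prompt."""
    totals = Counter()
    for record in records:
        for field in USAGE_FIELDS:
            # Missing or null fields count as zero.
            totals[field] += record.get(field) or 0
    return dict(totals)


# Hypothetical records for a prompt that needed two requests.
records = [
    {"input_tokens": 12, "output_tokens": 350,
     "cache_read_input_tokens": 180_000, "cache_creation_input_tokens": 2_500},
    {"input_tokens": 8, "output_tokens": 120,
     "cache_read_input_tokens": 182_500, "cache_creation_input_tokens": 1_900},
]
print(summarize_usage(records))
```

A per-prompt total like this would show directly whether cache reads dominate the consumption, as suspected above.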
Steps to reproduce
- Open Polyscope with Opus as the default model (MAX plan)
- Work on a task that involves multiple tool calls (file reads, edits, bash commands)
- Close the laptop or leave Polyscope idle for a few hours
- Come back and send a new prompt
- Check claude.ai/settings/usage — observe that the weekly limit jumped significantly from a single interaction
- Repeat for a day or two — weekly limit is nearly exhausted