
Excessive token/quota consumption — weekly limit drains in 1-2 days with Opus #118

@Ufooo

Platform

macOS

Operating system version

macOS Tahoe 26.4

System architecture

ARM64 (M1, M2, etc)

PolyScope Version

0.15.0

Bug description

When I use Polyscope with Opus on a MAX plan, the weekly rate limit is consumed extremely fast, typically within 1-2 days of normal use. A single complex task (plan + implementation) can use 5-10% of the weekly quota.

Even simple sessions where the agent reads a few files and makes edits cost $3-5 per interaction. Longer sessions with many tool calls can exceed $20 for a single prompt.

What I think is happening

When I send a new message, especially after being away for a while, the agent seems to resume the full previous session context. Every tool call in that session then carries the entire accumulated context, so if the agent makes 10 tool calls, the full conversation history is sent 10 times. With Opus's 1M context window, this can mean millions of tokens per turn, even for simple tasks.
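To make that arithmetic concrete, here is a rough Python sketch. It assumes (hypothetically) that each tool call triggers one request carrying the full accumulated context, and that each call's request and result add a fixed number of tokens to it. The 200k starting context and 5k tokens per call are made-up illustration values, not measurements from Polyscope.

```python
def tokens_sent_per_turn(base_context: int, tool_calls: int, tokens_per_call: int) -> list[int]:
    """Return the prompt size (in tokens) of each request made during one turn.

    Hypothetical model: every tool call sends the whole context so far,
    then the call and its result grow the context by tokens_per_call.
    """
    sizes = []
    context = base_context
    for _ in range(tool_calls):
        sizes.append(context)       # full history is re-sent on this request
        context += tokens_per_call  # tool call + result are appended afterwards
    return sizes


if __name__ == "__main__":
    # Illustrative numbers only: a 200k-token resumed session, 10 tool calls.
    sizes = tokens_sent_per_turn(200_000, 10, 5_000)
    print(f"requests: {len(sizes)}, total input tokens: {sum(sizes):,}")
```

Under those assumed numbers, the context itself only grows by 50k tokens, but the turn processes about 2.2M input tokens in total. With prompt caching, most of each request would likely be served as cache reads rather than fresh input, yet the raw token count stays the same.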

I suspect the cache_read tokens (from prompt caching) are the biggest cost driver. Even though they're cheaper per token than fresh input, they still seem to count toward the rate limit quota. So a resumed session with a large context burns through the weekly limit at the same rate as if all those tokens were fresh.

After idle periods (overnight, lunch break), the first prompt often takes extremely long or hangs completely — which seems consistent with trying to resume a very large stale session.

Using Claude Code CLI directly for comparable tasks seems significantly cheaper — possibly because CLI sessions are shorter-lived and don't accumulate as much context.

Questions

  • Is Polyscope doing anything to manage context size between sessions? Or does it always resume the full previous conversation?
  • Is there a way to see per-prompt token consumption breakdown (input, output, cache read, cache creation) so we can understand where the cost goes?
  • Are there any recommended settings or workflows to reduce consumption while keeping Opus for complex tasks?
  • Is this something the team is aware of and working on?
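For the per-prompt breakdown question, here is a minimal sketch of the kind of summary I mean. It assumes a JSONL session log where each assistant record carries a `usage` object with the Anthropic Messages API fields `input_tokens`, `output_tokens`, `cache_creation_input_tokens` and `cache_read_input_tokens`. I don't know whether or where Polyscope stores such a log, so the file layout is an assumption and `summarize_usage` is a hypothetical helper, not an existing Polyscope feature.

```python
import json
from collections import Counter
from typing import Iterable

# Usage fields as named in the Anthropic Messages API response.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def summarize_usage(lines: Iterable[str]) -> Counter:
    """Sum token usage per category over JSONL records (assumed log format)."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(record, dict):
            continue
        # Assumption: usage is nested under "message", or sits at the top level.
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if usage is None:
            usage = record.get("usage")
        if not isinstance(usage, dict):
            continue
        for field in USAGE_FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    return totals


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        '{"message": {"usage": {"input_tokens": 10, "output_tokens": 200, '
        '"cache_creation_input_tokens": 5000, "cache_read_input_tokens": 150000}}}',
        '{"usage": {"input_tokens": 4, "output_tokens": 50, "cache_read_input_tokens": 155000}}',
        "not json",
    ]
    for field, count in summarize_usage(sample).items():
        print(f"{field}: {count:,}")
```

Pointed at a real log (`with open(path) as f: print(summarize_usage(f))`), a breakdown like this would show whether cache reads dominate, which is what I suspect.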

Steps to reproduce

  • Open Polyscope with Opus as the default model (MAX plan)
  • Work on a task that involves multiple tool calls (file reads, edits, bash commands)
  • Close the laptop or leave Polyscope idle for a few hours
  • Come back and send a new prompt
  • Check claude.ai/settings/usage — observe that the weekly limit jumped significantly from a single interaction
  • Repeat for a day or two — weekly limit is nearly exhausted

Relevant log output
