Skip to content

Conversation

@SungJin1212
Copy link
Member

@SungJin1212 SungJin1212 commented Jan 14, 2026

This PR introduces local cache warmup logic on HA tracker startup that fetches all keys from the KV store and warms the local cache.

Previously, whenever the Distributor started, it suffered from initial cold cache misses. Since the local map was empty, the HATracker treated every incoming request as a new entry. This caused unnecessary CAS operations to the KV store even for existing valid keys.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@SungJin1212 SungJin1212 force-pushed the sync-ha-tracker-on-start branch 4 times, most recently from 3567f6f to 7786687 Compare January 15, 2026 01:51
@friedrichg
Copy link
Member

Do you see distributors in clusters with large number of pairs using more memory after this change? Do we need a flag for this?

@SungJin1212
Copy link
Member Author

@friedrichg
I added a benchmark for syncKVStoreToLocalMap under various key size.
I ran a benchmark and the results indicate that the overhead is negligible even at a very large scale (10000 keys, about 18MB).

goos: darwin
goarch: arm64
pkg: github.com/cortexproject/cortex/pkg/ha
cpu: Apple M4 Max
BenchmarkHATracker_syncKVStoreToLocalMap
BenchmarkHATracker_syncKVStoreToLocalMap/keys=100
BenchmarkHATracker_syncKVStoreToLocalMap/keys=100-14         	   10740	    113366 ns/op	  203197 B/op	    2532 allocs/op
BenchmarkHATracker_syncKVStoreToLocalMap/keys=1000
BenchmarkHATracker_syncKVStoreToLocalMap/keys=1000-14        	    1174	    973137 ns/op	 1834239 B/op	   24289 allocs/op
BenchmarkHATracker_syncKVStoreToLocalMap/keys=10000
BenchmarkHATracker_syncKVStoreToLocalMap/keys=10000-14       	     130	   9065987 ns/op	18199241 B/op	  240955 allocs/op
PASS

Copy link
Member

@friedrichg friedrichg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SungJin1212 Thanks, this is great work.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 15, 2026
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit doubt of how much this cache warmup would help.
I think at least we never see this issue in production. If we think it is good to have I prefer to put this behind a feature flag and disable by default.

I am a bit worried that during a big fleet scale up distributors ended up sending too many requests to KV stores just to warmup cache and causing much bigger impact.

Signed-off-by: SungJin1212 <[email protected]>
@SungJin1212 SungJin1212 force-pushed the sync-ha-tracker-on-start branch from 2644e14 to 0850259 Compare January 16, 2026 08:23
@SungJin1212
Copy link
Member Author

@yeya24
I've put the warmup logic behind the flag so that users can opt-in only if they need.

Copy link
Member

@friedrichg friedrichg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this flag might also prove useful for #7220.

At least bring us closer

@friedrichg
Copy link
Member

@SungJin1212 maybe also mark it as experimental and include the flag in https://github.com/cortexproject/cortex/blob/master/docs/configuration/v1-guarantees.md

Signed-off-by: SungJin1212 <[email protected]>
@SungJin1212
Copy link
Member Author

@friedrichg
I added it, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/ha-tracker lgtm This PR has been approved by a maintainer size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants