feat: optimize nacos discovery with privileged agent and LRU cache by 9268 · Pull Request #12867 · apache/apisix

9268 · 2026-01-07T04:14:37Z

Description

This PR fixes a performance issue introduced by #12353, in which every worker process fetched Nacos service discovery information, causing unnecessary network requests and JSON parsing overhead on every request.

Changes made:

Restricted nacos discovery to privileged agent only - Only the privileged agent process now fetches service information from nacos servers
Added LRU cache optimization - Worker processes use version-controlled LRU cache to avoid repeated JSON parsing
Improved performance - Reduced shared memory access and eliminated redundant JSON decoding

Performance improvements:

Eliminates duplicate nacos API calls from multiple worker processes
Reduces JSON parsing overhead through LRU caching
Maintains data consistency through version control mechanism

Which issue(s) this PR fixes:

Fixes #12353

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change
I have updated the documentation to reflect this change
I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

Copilot

Pull request overview

This PR optimizes Nacos service discovery by moving registry fetch work to the privileged agent process and introducing a version-controlled per-worker LRU cache to reduce repeated JSON decoding.

Changes:

Restrict Nacos registry fetching timers to the privileged agent process only.
Add a per-worker LRU cache for decoded node lists keyed by a shared-dict “version”.
Store and manage per-service #version entries in the nacos shared dict to drive cache invalidation.

Baoyuantop · 2026-01-28T06:06:37Z

Hi @9268, please fix failed CI

9268 · 2026-02-08T13:58:07Z

Hi @Baoyuantop, CI has been fixed: https://github.com/9268/apisix/actions/runs/21792352299

9268 · 2026-02-09T07:07:35Z

A code lint issue has been fixed.

9268 · 2026-02-24T06:13:43Z

@Baoyuantop
It seems that the error in the GitHub Actions is a compilation error, and this PR does not involve that part. Could you please rerun the failing part? In my fork's branch, the GitHub Actions succeeded except for the Changelog check. https://github.com/9268/apisix

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Baoyuantop · 2026-03-13T07:02:42Z

Hi @9268, there is a failed test that needs to be fixed.

moonming

Hi @9268, thank you for optimizing the Nacos discovery with privileged agent and LRU cache!

Running Nacos discovery only in the privileged agent (rather than all workers) is a smart optimization that reduces redundant API calls. With 9 reviews already, this seems well-discussed.

To move forward:

Please confirm all 9 review comments have been addressed
Does the LRU cache properly handle Nacos service updates? What's the cache invalidation strategy?
Any performance benchmarks showing the reduction in Nacos API calls?

This looks like a solid optimization. Let's get it to merge-ready! Thank you.

9268 · 2026-03-30T02:52:40Z

Overall, a two-level cache architecture is adopted:
Nacos API → (periodic pull) → nacos_dict (shared memory) → nodes_lrucache (worker-local)
The invalidation strategy is version-driven:

On write (fetch_from_host): after each pull, the CRC32 of the node list is computed with ngx.crc32_long(content) and written to key#version as the version number. If the node content is unchanged, the version number stays the same.
On read (_M.nodes): key#version is first read from nacos_dict and then passed as the version argument to nodes_lrucache(key, nodes_version, ...). The LRU cache compares version numbers internally; when the version changes, the worker-local entry is invalidated and reloaded from nacos_dict.
On service decommissioning: at the end of fetch_from_host, the stored keys are compared against curr_service_in_use, and the key and key#version of services that are no longer in use are deleted. The next _M.nodes call then finds no version number and falls back to returning nil.

The LRU cache's own limits (ttl = 60s, count = 1024) act as a fallback eviction mechanism. Normally, a version change triggers invalidation earlier.
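For readers who want to see the version-driven flow in isolation, here is a minimal sketch in Python. It is a model of the logic described above, not the Lua code in this PR. A plain dict stands in for nacos_dict, zlib.crc32 stands in for ngx.crc32_long, and the names publish_nodes, retire_service, and VersionedNodeCache are hypothetical. The TTL and the 1024-entry LRU bound are left out to keep the example short.

```python
import json
import zlib

VERSION_SUFFIX = "#version"  # mirrors the key#version convention described above


def publish_nodes(shared, key, nodes):
    """Write side: store the encoded node list plus a CRC32 version."""
    content = json.dumps(nodes, sort_keys=True)
    shared[key] = content
    # Identical content yields an identical CRC32, so the version is unchanged.
    shared[key + VERSION_SUFFIX] = zlib.crc32(content.encode("utf-8"))


def retire_service(shared, key):
    """Decommissioning: drop both the data key and its version key."""
    shared.pop(key, None)
    shared.pop(key + VERSION_SUFFIX, None)


class VersionedNodeCache:
    """Worker-local cache that decodes JSON only when the version changes."""

    def __init__(self):
        self._entries = {}     # key -> (version, decoded nodes)
        self.decode_count = 0  # counts JSON decodes, for illustration only

    def nodes(self, shared, key):
        version = shared.get(key + VERSION_SUFFIX)
        if version is None:
            # No version key: the service is unknown or was retired.
            self._entries.pop(key, None)
            return None
        cached = self._entries.get(key)
        if cached is not None and cached[0] == version:
            return cached[1]
        decoded = json.loads(shared[key])
        self.decode_count += 1
        self._entries[key] = (version, decoded)
        return decoded


if __name__ == "__main__":
    shared = {}
    cache = VersionedNodeCache()
    publish_nodes(shared, "svc-a", [{"host": "10.0.0.1", "port": 8080}])
    cache.nodes(shared, "svc-a")
    cache.nodes(shared, "svc-a")  # served from the worker-local cache
    print("decodes after two reads:", cache.decode_count)
```

In this model, republishing an identical node list leaves the version untouched, so repeated reads do not decode again. Changing a node produces a new CRC32 and forces one reload, and retiring the service makes the next read return None.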

9268 · 2026-03-30T03:01:22Z

No benchmark has been run. My older APISIX version did not have this issue; I noticed it after a Nacos alert and fixed it.

Baoyuantop · 2026-04-14T06:34:58Z

Hi @9268, we recently refactored the service discovery code, and there are currently code conflicts in the PR. I was planning to help resolve them, but I’ve found the conflicts to be quite complex—it might require reimplementing the code based on the new service discovery implementation. Can you continue working on this?

feat: optimize nacos discovery with privileged agent and LRU cache

70f7bfd

9268 marked this pull request as ready for review January 7, 2026 06:39

dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and performance (generate flamegraph for the current PR) labels Jan 7, 2026

9268 mentioned this pull request Jan 12, 2026

feat: replace events library with shdict #12353

Merged

5 tasks

Baoyuantop reviewed Jan 26, 2026

Comment thread apisix/discovery/nacos/init.lua Outdated

Baoyuantop requested a review from Copilot January 26, 2026 08:27

Copilot started reviewing on behalf of Baoyuantop January 26, 2026 08:28

Baoyuantop added the wait for update (wait for the author's response in this issue/PR) label Jan 26, 2026

Copilot AI reviewed Jan 26, 2026

Comment thread apisix/discovery/nacos/init.lua

Comment thread apisix/discovery/nacos/init.lua

Comment thread apisix/discovery/nacos/init.lua Outdated

Comment thread apisix/discovery/nacos/init.lua Outdated

Comment thread apisix/discovery/nacos/init.lua Outdated

9268 added 3 commits January 27, 2026 14:06

fix:remove chinese comment

58874dd

feat(nacos): enhance error handling

7fa9120

feat(nacos): add test for new feature

9dd9725

dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) label and removed the size:M (This PR changes 30-99 lines, ignoring generated files.) label Jan 27, 2026

9268 requested a review from Baoyuantop January 27, 2026 11:23

9268 added 4 commits February 4, 2026 14:57

fix: test port err & license

dbf945c

fix: use real nacos server

5c2ef91

fix: enable admin api

ed9705d

fix:using provider etcd

65625f1

github-actions bot added the user responded label Feb 8, 2026

Baoyuantop added the awaiting review label and removed the wait for update (wait for the author's response in this issue/PR) and user responded labels Feb 9, 2026

fix:code lint

b3768a4

Merge branch 'apache:master' into master

4c03f77

Baoyuantop requested a review from Copilot February 27, 2026 07:58

Copilot started reviewing on behalf of Baoyuantop February 27, 2026 07:59

Copilot AI reviewed Feb 27, 2026

Comment thread apisix/discovery/nacos/init.lua

Baoyuantop previously approved these changes Mar 6, 2026

Baoyuantop requested review from AlinsRan, membphis and shreemaan-abhishek March 6, 2026 00:33

moonming requested changes Mar 16, 2026

Baoyuantop reviewed Mar 25, 2026

Comment thread apisix/discovery/nacos/init.lua

Merge branch 'apache:master' into master

9b38c63

fix: exclude #version keys from nacos dump_data to avoid pseudo entries, reduce lru ttl to 60s

7a63cc3

9268 dismissed Baoyuantop’s stale review via 7a63cc3 March 30, 2026 03:06

Baoyuantop previously approved these changes Apr 2, 2026

Baoyuantop force-pushed the master branch from 7a63cc3 to 84fdbfc April 14, 2026 06:24

Baoyuantop dismissed their stale review via 7a63cc3 April 14, 2026 06:33

Baoyuantop force-pushed the master branch from 84fdbfc to 7a63cc3 April 14, 2026 06:33

Baoyuantop mentioned this pull request Apr 17, 2026

fix(nacos): restrict registry fetch to privileged agent only #13236

Open

5 tasks
