fix(pd): resolve hostname entries in IpAuthHandler allowlist#2962
fix(pd): resolve hostname entries in IpAuthHandler allowlist#2962bitflicker64 wants to merge 7 commits intoapache:masterfrom
Conversation
…at startup to avoid Netty DNS blocking
|
How I tested:
Results with pd2 as leader and hostnames only: Before this fix: After this fix: Hostnames are resolved to IPs at startup via |
There was a problem hiding this comment.
Pull request overview
Fixes PD Raft inbound connection allowlisting when the configured allowlist contains hostnames (e.g., Docker bridge service names) by resolving those hostnames to IPs up front and validating connections via a simple set lookup.
Changes:
- Resolve allowlist entries via
InetAddress.getAllByName()at handler construction time - Cache resolved IPs in an immutable
Setused byisIpAllowed() - Log a warning when an allowlist hostname can’t be resolved
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/auth/IpAuthHandler.java
Show resolved
Hide resolved
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/auth/IpAuthHandler.java
Show resolved
Hide resolved
…r changes - Resolve allowlist hostnames to IPs using InetAddress.getAllByName - Add refresh() to update resolved IPs when Raft peer list changes - Wire refresh into RaftEngine.changePeerList() - Add IpAuthHandlerTest covering hostname resolution, refresh behavior, and failure cases
|
While working on this change, I noticed a couple of related behaviors that appear to predate this PR. The /v1/members/change endpoint relies on RestAuthentication, where Authentication.authenticate() currently validates only the username portion of Basic Auth. I also noticed that GrpcAuthentication does not appear to be wired into the gRPC server. These are outside the scope of this fix, but I thought it might be helpful to mention in case they are worth tracking separately. |
- Cache leader PeerId after waitingForLeader() and null-check to avoid NPE when leader election times out - Remove incorrect fallback that derived leader gRPC address from local node's port, causing silent misroutes in multi-node clusters - Wire config.getRpcTimeout() into RaftRpcClient's RpcOptions so Bolt transport timeout is consistent with future.get() caller timeout - Replace hardcoded 10000ms in waitingForLeader() with config.getRpcTimeout() - Remove unused RaftOptions variable and dead imports (ReplicatorGroup, ThreadId) Fixes apache#2959 Related to apache#2952, apache#2962
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/IpAuthHandlerTest.java
Show resolved
Hide resolved
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/IpAuthHandlerTest.java
Show resolved
Hide resolved
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/IpAuthHandlerTest.java
Show resolved
Hide resolved
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/IpAuthHandlerTest.java
Show resolved
Hide resolved
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java
Outdated
Show resolved
Hide resolved
Some context about the auth system in PD/Store: To put it simply, PD/Store didn't have a strict auth mechanism due to legacy design decisions. Since these and other gRPC components were meant to be internal-only, only the graph server was originally built with a formal public-facing auth interface. We later added a basic, quick-and-dirty auth layer for security, but the current implementation remains incomplete. We plan to systematically refactor this later, but for now, let's just add TODOs in the relevant places |
Thanks for the context, that makes sense. I’ll add the TODOs in the relevant places and work through the reviewed suggestions. Apologies for the delay as my Mac broke so I had to shift my setup to Ubuntu |
- Wire IpAuthHandler.refresh() into PDService.updatePdRaft() which calls node.changePeers() directly, bypassing RaftEngine.changePeerList() - Include learner IPs in refresh since updatePdRaft() supports learner peers - Add @before setUp() to IpAuthHandlerTest to prevent stale singleton from earlier suite classes poisoning test assertions - Strengthen testUnresolvableHostnameDoesNotCrash assertions - Replace testGetInstanceReturnsNullBeforeInit with testGetInstanceReturnsSingletonIgnoresNewAllowlist
- Document missing auth check on MemberAPI changePeerList endpoint - Note that password validation is currently skipped in Authentication - Flag that GrpcAuthentication interceptor is not wired into the gRPC server
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/IpAuthHandlerTest.java
Outdated
Show resolved
Hide resolved
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/IpAuthHandlerTest.java
Show resolved
Hide resolved
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/auth/IpAuthHandler.java
Show resolved
Hide resolved
hugegraph-pd/hg-pd-service/src/main/java/org/apache/hugegraph/pd/service/PDService.java
Show resolved
Hide resolved
…fresh() wiring - Add RaftEngineIpAuthIntegrationTest to prove the full chain: changePeerList() -> changePeers callback -> handler.refresh() - Add failure-path test to ensure allowlist is not refreshed on failed peer update - Uses mock Node that fires callback synchronously to avoid latch.await() hang - Register in PDCoreSuiteTest suite
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java
Outdated
Show resolved
Hide resolved
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/IpAuthHandlerTest.java
Outdated
Show resolved
Hide resolved
- Add timed latch.await() using 3x rpcTimeout to prevent indefinite blocking if changePeers callback is never invoked - Use compareAndSet in both callback and timeout path to avoid race - Restore Thread.currentThread().interrupt() on InterruptedException - Fix testHostnameResolvesToIp() to resolve localhost dynamically instead of hardcoding 127.0.0.1 — avoids flakiness on IPv6 systems
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Purpose of the PR
IpAuthHandleronly compared the client IP with the allowlist entries directly.When the allowlist contains hostnames, connections from their resolved IPs could be rejected.
Main Changes
InetAddress.getAllByNameSet.contains()lookupVerifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need