HDDS-13590. Refactor HealthyPipelineSafeModeRule to not use PipelineReportFromDatanode#9651
Conversation
sumitagrawl
left a comment
There was a problem hiding this comment.
@priyeshkaratha Have few comments, logs will increase too much with change and rule once satisfied may not be satisfied next time, which is change in behavior. Check the impact.
...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/HealthyPipelineSafeModeRule.java
Show resolved
Hide resolved
...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/HealthyPipelineSafeModeRule.java
Outdated
Show resolved
Hide resolved
| "finalization. Bypassing healthy pipeline safemode rule."); | ||
| return true; | ||
| } | ||
| // Query PipelineManager directly for healthy pipeline count |
There was a problem hiding this comment.
In current implementation, if Rule is satisfied once, its marked exit. Never enter again. But with having dynamic rule check, one time may be success and other time may be failure.
This may not satisfy the purpose.
There was a problem hiding this comment.
- move logic from validate to process
- log only one reported as event in process()
aryangupta1998
left a comment
There was a problem hiding this comment.
Thanks for the patch @priyeshkaratha. Figured out a bug that can make the SCM exit HealthyPipelineSafeModeRule early, please take a look!
...erver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/HealthyPipelineSafeModeRule.java
Show resolved
Hide resolved
|
@sumitagrawl @aryangupta1998 Thanks for the review and I have addressed your comments. |
81bf2f6 to
af67ff3
Compare
aryangupta1998
left a comment
There was a problem hiding this comment.
Thanks for addressing the comments @priyeshkaratha, LGTM!
|
@sumitagrawl would you like to take another look? |
|
Thanks @priyeshkaratha for the patch, @aryangupta1998, @sumitagrawl for the review. |
What changes were proposed in this pull request?
Refactored HealthyPipelineSafeModeRule to make it simpler and more reliable.
Earlier, it depended on events and internal state to track pipelines. Now, it directly checks the current state from the PipelineManager whenever validation is needed.
HealthyPipelineSafeModeRule.java
Pipeline has exactly 3 nodes
All nodes are in healthy state (NodeStatus.inServiceHealthy())
All nodes are registered with the node manager
SafeModeMetrics.java
Replaced incCurrentHealthyPipelinesCount() with setNumCurrentHealthyPipelines(long)
Test Updates
What is the link to the Apache JIRA
HDDS-13590
How was this patch tested?
Tested using existing testcases written for safemode validation rules and Safemode Manager.
Validated using forked CI.