Skip to content

HDDS-13590. Refactor HealthyPipelineSafeModeRule to not use PipelineReportFromDatanode#9651

Merged
adoroszlai merged 5 commits intoapache:masterfrom
priyeshkaratha:HDDS-13590
Mar 3, 2026
Merged

HDDS-13590. Refactor HealthyPipelineSafeModeRule to not use PipelineReportFromDatanode#9651
adoroszlai merged 5 commits intoapache:masterfrom
priyeshkaratha:HDDS-13590

Conversation

@priyeshkaratha
Copy link
Contributor

@priyeshkaratha priyeshkaratha commented Jan 20, 2026

What changes were proposed in this pull request?

Refactored HealthyPipelineSafeModeRule to make it simpler and more reliable.
Earlier, it depended on events and internal state to track pipelines. Now, it directly checks the current state from the PipelineManager whenever validation is needed.

HealthyPipelineSafeModeRule.java

  • Removed event-based tracking: Eliminated processedPipelineIDs and unProcessedPipelineSet HashSets that tracked pipeline events incrementally
  • Direct validation approach: Modified validate() method to directly query PipelineManager for current open pipelines instead of relying on accumulated event state
  • New health check logic: Added isPipelineHealthy() helper method that validates:
    Pipeline has exactly 3 nodes
    All nodes are in healthy state (NodeStatus.inServiceHealthy())
    All nodes are registered with the node manager
  • Simplified event processing: The process() method now only logs events instead of maintaining state, as validation happens directly during validate() calls
  • Cleaner lifecycle management: Updated cleanup() and initializeRule() methods to remove tracking state management

SafeModeMetrics.java

  • Changed currentHealthyPipelinesCount from counter (MutableCounterLong) to gauge (MutableGaugeLong)
    Replaced incCurrentHealthyPipelinesCount() with setNumCurrentHealthyPipelines(long)
  • This aligns with the new approach where the value is set directly based on current state rather than incremented on events

Test Updates

  • TestHealthyPipelineSafeModeRule.java: Removed assertions that relied on event processing sequence; updated expected log messages
  • TestSCMSafeModeManager.java: Significant updates to handle immediate availability of healthy pipeline counts and added logic to handle edge cases where pipeline thresholds exceed available pipelines

What is the link to the Apache JIRA

HDDS-13590

How was this patch tested?

Tested using existing testcases written for safemode validation rules and Safemode Manager.
Validated using forked CI.

@priyeshkaratha priyeshkaratha marked this pull request as ready for review January 20, 2026 10:17
@ChenSammi ChenSammi self-requested a review January 21, 2026 03:35
Copy link
Contributor

@sumitagrawl sumitagrawl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyeshkaratha Have few comments, logs will increase too much with change and rule once satisfied may not be satisfied next time, which is change in behavior. Check the impact.

"finalization. Bypassing healthy pipeline safemode rule.");
return true;
}
// Query PipelineManager directly for healthy pipeline count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current implementation, if Rule is satisfied once, its marked exit. Never enter again. But with having dynamic rule check, one time may be success and other time may be failure.
This may not satisfy the purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. move logic from validate to process
  2. log only one reported as event in process()

Copy link
Contributor

@aryangupta1998 aryangupta1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @priyeshkaratha. Figured out a bug that can make the SCM exit HealthyPipelineSafeModeRule early, please take a look!

@priyeshkaratha
Copy link
Contributor Author

@sumitagrawl @aryangupta1998 Thanks for the review and I have addressed your comments.

Copy link
Contributor

@aryangupta1998 aryangupta1998 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments @priyeshkaratha, LGTM!

@adoroszlai
Copy link
Contributor

@sumitagrawl would you like to take another look?

@adoroszlai adoroszlai merged commit f5d2a0f into apache:master Mar 3, 2026
44 checks passed
@adoroszlai
Copy link
Contributor

Thanks @priyeshkaratha for the patch, @aryangupta1998, @sumitagrawl for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants