-
Notifications
You must be signed in to change notification settings - Fork 595
HDDS-14245. Optimize search for deleted range using prefix count index structure #9690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…nges and simple logic Change-Id: I7a06f5a427ba7d1c4f760ff70fcb4093abaa3cbd
…earch using prefix count index structure Change-Id: I704dab8380d1dd35faa2b9453ef90b439937e95b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes the batch operation performance in RocksDB by introducing a RangeQueryIndex data structure to efficiently track delete range operations. Previously, checking if an operation was affected by a delete range required O(n) iteration through all delete ranges for each operation, resulting in O(n²) complexity when many delete ranges were queued. The new index structure uses a sweep-line algorithm with prefix sums to achieve O(log B) query time, where B is the number of distinct range boundaries.
Changes:
- Introduced
RangeQueryIndexclass using a TreeMap-based sweep-line algorithm for efficient range intersection queries - Modified
RDBBatchOperationto use the new index when preparing batch writes to filter out operations that fall within delete ranges - Added
deleteRangeWithBatchAPI methods across the Table hierarchy (Table, TypedTable, RDBTable, DatanodeTable, InMemoryTestTable) - Added comprehensive unit tests for RangeQueryIndex and integration tests for batch operations with delete ranges
- Refactored
OmSnapshotManager.deleteKeysFromTableWithBucketPrefixto use the newdeleteRangeWithBatchinstead of iterating and deleting keys individually - Updated gRPC version from 1.75.0 to 1.77.0
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pom.xml | Updated ratis-thirdparty.grpc.version from 1.75.0 to 1.77.0 |
| RangeQueryIndex.java | New class implementing sweep-line algorithm for efficient range intersection queries |
| TestRangeQueryIndex.java | Comprehensive unit tests for RangeQueryIndex covering various edge cases |
| RDBBatchOperation.java | Added DeleteRangeOperation class and integrated RangeQueryIndex for optimized batch preparation |
| Table.java | Added deleteRangeWithBatch API method |
| TypedTable.java | Implemented deleteRangeWithBatch by delegating to rawTable |
| RDBTable.java | Implemented deleteRangeWithBatch using RDBBatchOperation |
| RocksDatabase.java | Added batchDeleteRange method to ColumnFamily |
| DatanodeTable.java | Implemented deleteRangeWithBatch by delegating to wrapped table |
| InMemoryTestTable.java | Added stub implementation of deleteRangeWithBatch |
| TestRDBBatchOperation.java | Updated tests to verify delete range filtering logic and added randomized integration test |
| TestRDBTableStore.java | Added tests for batch delete range operations and operation ordering |
| OmSnapshotManager.java | Refactored to use deleteRangeWithBatch instead of manual iteration |
Comments suppressed due to low confidence (1)
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBBatchOperation.java:420
- The FamilyCache summary method doesn't include delRangeCount in its output string, even though delRangeCount is being tracked. This makes the logging incomplete and inconsistent with the overall getCommitString method which does include delRangeCount. Consider updating the summary to include "#delRange=%s, delRangeCount" for consistency and better visibility into batch operations.
private String summary() {
return String.format(" %s %s, #put=%s, #del=%s", this,
batchSizeDiscardedString(), putCount, delCount);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * | ||
| * The test method includes: | ||
| * 1. Setup of mocked ColumnFamilyHandle and RocksDatabase.ColumnFamily. | ||
| * 2. Mocking of methods to track operations performed on*/ |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JavaDoc comment for the TestRDBBatchOperation class is incomplete. The comment abruptly ends mid-sentence at "2. Mocking of methods to track operations performed on". This should be completed to provide a full description of what the test class does.
| * 2. Mocking of methods to track operations performed on*/ | |
| * 2. Mocking of methods to track operations performed on the | |
| * ManagedWriteBatch and verification of the tracked operations after | |
| * the batch is committed. | |
| */ |
| * <p><strong>Complexities:</strong> | ||
| * <ul> | ||
| * <li>Build: {@code O(R log B)} where {@code R} is #ranges and {@code B} is #distinct boundaries.</li> | ||
| * <li>{@link #containsIntersectingRange(Comparable)} (Object)}: {@code O(log B)}.</li> |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complexity documentation contains a syntax error. The line states "{@link #containsIntersectingRange(Comparable)} (Object)}" but the "(Object)" appears to be a typo and should be removed. The correct format should be just "{@link #containsIntersectingRange(Comparable)}".
| * <li>{@link #containsIntersectingRange(Comparable)} (Object)}: {@code O(log B)}.</li> | |
| * <li>{@link #containsIntersectingRange(Comparable)}: {@code O(log B)}.</li> |
| Range<T> getFirstIntersectingRange(T key) { | ||
| Map.Entry<T, Integer> countEntry = rangeCountIndexMap.floorEntry(key); | ||
| if (countEntry == null) { | ||
| return null; | ||
| } | ||
| for (Range<T> range : ranges) { | ||
| if (range.contains(key)) { | ||
| return range; | ||
| } | ||
| } | ||
| return null; |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for getFirstIntersectingRange states that "This method first checks containsIntersectingRange using the index; if the count indicates an intersection exists, it then scans the backing ranges set", but the implementation doesn't actually verify that countEntry.getValue() > 0 before scanning the ranges set. This could lead to unnecessary O(n) scanning of the ranges set even when the index indicates no intersecting ranges exist. Consider adding a check like "if (countEntry.getValue() <= 0) { return null; }" after line 133 to match the documented behavior and improve performance.
|
|
||
| void deleteRange(byte[] startKey, byte[] endKey) { | ||
| delRangeCount++; | ||
| ops.put(opIndex.getAndIncrement(), new DeleteRangeOperation(startKey, endKey)); |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deleteRange method doesn't update the batchSize field when adding a DeleteRangeOperation, unlike the put and delete methods which call overwriteIfExists and update batchSize accordingly. This inconsistency means that batch size tracking will be incomplete, potentially affecting batch operation monitoring and logging. Consider adding "batchSize += new DeleteRangeOperation(startKey, endKey).totalLength();" or restructuring to ensure batch size is properly tracked for delete range operations.
| ops.put(opIndex.getAndIncrement(), new DeleteRangeOperation(startKey, endKey)); | |
| DeleteRangeOperation op = new DeleteRangeOperation(startKey, endKey); | |
| batchSize += op.totalLength(); | |
| ops.put(opIndex.getAndIncrement(), op); |
|
|
||
| @Override | ||
| public int hashCode() { | ||
| return Objects.hash(startInclusive, endExclusive); |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a violation of the equals-hashCode contract in the Range class. The equals method uses identity comparison (this == o), while hashCode computes the hash based on the values of startInclusive and endExclusive. This means two Range instances with identical bounds will have the same hashCode but will not be equal according to equals, which violates the Java contract that states: "If two objects are equal according to the equals method, then calling hashCode on each of them must produce the same integer result." While this appears to be intentional based on the test cases (e.g., testDuplicateSameBoundsDifferentInstances), it could lead to unexpected behavior when using Range objects in hash-based collections and makes the class fragile. Consider either making equals value-based to match hashCode, or making hashCode identity-based (e.g., using System.identityHashCode) to match equals.
| return Objects.hash(startInclusive, endExclusive); | |
| return System.identityHashCode(this); |
| <ranger.version>2.7.0</ranger.version> | ||
| <!-- versions included in ratis-thirdparty, update in sync --> | ||
| <ratis-thirdparty.grpc.version>1.75.0</ratis-thirdparty.grpc.version> | ||
| <ratis-thirdparty.grpc.version>1.77.0</ratis-thirdparty.grpc.version> |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gRPC version is being updated from 1.75.0 to 1.77.0, but this change is not mentioned in the PR description which focuses on optimizing search for deleted ranges. If this version bump is intentional and related to the changes in this PR, please update the PR description to explain why this dependency update is needed. If it's unrelated, consider moving it to a separate PR for clarity and to keep changes focused on a single purpose.
| /** | ||
| * Delete range operation to be applied to a {@link ColumnFamily} batch. | ||
| */ | ||
| private final class DeleteRangeOperation extends Op { |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeleteRangeOperation should be made static, since the enclosing instance is not used.
| private final class DeleteRangeOperation extends Op { | |
| private static final class DeleteRangeOperation extends Op { |
What changes were proposed in this pull request?
Currently in prepare batch to verify if an operation is still valid an O(n)check is performed on each and every deleteRange op that has occured after the operation. This is done for each operation in the batch and if all deleteRanges are going to be stacked at the end then the prepare batch can become an O(n2) operation making it unoptimal. This patch aims to create Index data structure to track all the deleteRange operations and provide an efficient way to check if a delete range operation exists or not. The delete ranges would also be popped out of the index structure as and when the delete range op gets executed.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14245
How was this patch tested?
Added unit tests