Fix use after free in rescheduleTasksFromReplica#1561
Fix use after free in rescheduleTasksFromReplica#1561ianton-ru wants to merge 4 commits intoantalya-25.8from
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c71caec37e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| auto file_identifier = file->getAbsolutePath().value_or(file->getPath()); | ||
| auto file_replica_idx = getReplicaForFile(file_identifier); | ||
| unprocessed_files.emplace(file_identifier, std::make_pair(file, file_replica_idx)); |
There was a problem hiding this comment.
Preserve identifier semantics when re-queueing lost-replica task
When a replica disappears mid-request, this branch re-queues the task using getAbsolutePath().value_or(getPath()) as the map key, but getPreQueuedFile later looks up queued tasks by send_over_whole_archive ? getPathOrPathToArchiveIfArchive() : getIdentifier(). For bucketed objects (getIdentifier() includes bucket suffix) or archive mode, the key no longer matches, so pre-queued tasks are skipped, and emplace can collapse multiple buckets with the same path into one entry.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50453ff019
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| auto file_identifier = file->getAbsolutePath().value_or(file->getPath()); | ||
| auto file_replica_idx = getReplicaForFile(file_identifier); | ||
| unprocessed_files.emplace(file_identifier, std::make_pair(file, file_replica_idx)); |
There was a problem hiding this comment.
Requeue with canonical file identifier
When a replica is lost mid-getNextTask, this branch re-inserts the task using file->getAbsolutePath().value_or(file->getPath()), but the rest of the distributor now keys tasks via getFileIdentifier(...)/getIdentifier() (including bucket suffixes and archive normalization). In bucket-splitting mode, multiple buckets from the same path collapse to one unprocessed_files key here, and later getPreQueuedFile lookups by canonical identifier won't match this entry, so some tasks can be skipped or dropped after failover.
Useful? React with 👍 / 👎.
50453ff to
bbe007e
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bbe007e689
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (processed_file_list_ptr == replica_to_files_to_be_processed.end()) | ||
| { // It is possible that replica was lost after check in the begining of the method | ||
| auto file_identifier = getFileIdentifier(file); | ||
| auto file_replica_idx = getReplicaForFile(file_identifier); | ||
| unprocessed_files.emplace(file_identifier, std::make_pair(file, file_replica_idx)); |
There was a problem hiding this comment.
Stop returning task after requeueing lost replica work
When getNextTask detects that the current replica was removed, it re-enqueues the selected file into unprocessed_files/connection_to_files but still returns the same file at the end of the function. In the race where rescheduleTasksFromReplica() runs between task selection and this check, the same object becomes available to another replica while this caller can still process it, which can lead to duplicate processing/results if the caller proceeds with the returned task.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Technically it can't be.
getNextTask calls from processReadTaskRequest, what is called form processPacket as well as rescheduleTasksFromReplica. So rescheduleTasksFromReplica can't be called for the same replica simultaneously with getNextTask.
Comment was written incorrectly, code fixed
|
@codex review |
|
Codex Review: Didn't find any major issues. Breezy! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Solved #1554
Fixed incorrect using iterator to removed element.
Backport of #1493
Fix file identifier for distributed tasks
(logic was backported in #1237)
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: