Fix export task not being killed during s3 outage by arthurpassos · Pull Request #1564 · Altinity/ClickHouse

arthurpassos · 2026-03-20T15:26:08Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

The drop table operation must signal cancellation to all background tasks and wait until they acknowledge it. Tasks do this by checking the is_cancelled flag at each pipeline iteration. If S3 is unreachable and s3_retries_attempt is large (by default, it is 500), the pipeline gets stuck deep in the AWS SDK and never gets a chance to check the flag, making the task "unkillable".

This PR fixes it in a hackish way by overriding the query_is_cancelled_predicate, which the S3 client retry strategy checks in ShouldRetry.
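
To illustrate the mechanism described above, here is a minimal sketch of a retry loop that consults a cancellation predicate before each retry. RetryStrategySketch and runFailingRequest are hypothetical names made up for this illustration; they are not the AWS SDK retry strategy or the ClickHouse S3 client, and the real ShouldRetry takes different arguments.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a retry strategy (assumption, not the real class).
class RetryStrategySketch
{
public:
    RetryStrategySketch(int max_attempts_, std::function<bool()> is_cancelled_)
        : max_attempts(max_attempts_), is_cancelled(std::move(is_cancelled_))
    {
    }

    // Asked before every retry: stop if cancelled or if attempts are exhausted.
    bool shouldRetry(int attempts_so_far) const
    {
        if (is_cancelled && is_cancelled())
            return false;
        return attempts_so_far < max_attempts;
    }

private:
    int max_attempts;
    std::function<bool()> is_cancelled; // may be empty: then only the attempt limit applies
};

// Simulates a request against an unreachable endpoint: every attempt fails.
// The cancel flag is raised after `cancel_after_attempt` attempts, standing in
// for a DROP TABLE that arrives mid-export. Returns the number of attempts made.
inline int runFailingRequest(
    const RetryStrategySketch & strategy, std::atomic<bool> & cancel_flag, int cancel_after_attempt)
{
    int attempts = 0;
    do
    {
        ++attempts; // the attempt itself always fails in this simulation
        if (attempts == cancel_after_attempt)
            cancel_flag.store(true);
    } while (strategy.shouldRetry(attempts));
    return attempts;
}
```

With a predicate wired to the flag, the loop stops at the attempt where cancellation arrives. With an empty predicate, the flag is never seen and the loop runs through all configured attempts, which is the "unkillable" behavior in miniature.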

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

Regression jobs to run:

arthurpassos · 2026-03-20T15:26:25Z

@codex review

github-actions · 2026-03-20T15:27:24Z

Workflow [PR], commit [f15590e]

arthurpassos · 2026-03-20T15:29:24Z

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1564 (export-task cancellation during S3 outage):
Reviewed diff: PR page, raw diff

No confirmed defects in reviewed scope.

Coverage summary:

Scope reviewed: src/Common/ThreadStatus.h and src/Storages/MergeTree/ExportPartTask.cpp, including call path into CurrentThread::get().isQueryCanceled() used by S3 retry logic.
Categories failed: none.
Categories passed: call-graph consistency, transition invariants, branch outcomes (success/cancel/exception), cancellation propagation to S3 retry, multithreaded interleavings (task thread vs cancel caller), rollback/partial-update safety in exception path, C++ bug classes (lifetime/race/deadlock/exception-safety/UB) within changed code.
Assumptions/limits: static audit only (no runtime fault injection execution), constrained to the provided PR diff and directly affected call paths.

chatgpt-codex-connector · 2026-03-20T15:38:37Z

Codex Review: Didn't find any major issues. Chef's kiss.


arthurpassos · 2026-03-20T15:43:06Z

I'll see if I can add tests (I actually already have those, but for some reason they were not failing :))

arthurpassos · 2026-03-20T15:43:44Z

> I'll see if I can add tests (I actually already have those, but for some reason they were not failing :))

I think I know why. Blocking S3 communication with iptables was probably throwing a non-retryable exception, so the export failed fast and the problem never showed up.
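
A rough sketch of why that would hide the bug, assuming the hypothesis above is right: a non-retryable error ends the request after one attempt, while a retryable one keeps the thread inside the retry loop for the full attempt budget. ErrorKindSketch and attemptsUntilFailure are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical classification of a failure (assumption for illustration).
enum class ErrorKindSketch
{
    Retryable,
    NonRetryable
};

// Counts attempts for a request that always fails with the given error kind.
inline int attemptsUntilFailure(ErrorKindSketch kind, int max_attempts)
{
    int attempts = 0;
    while (attempts < max_attempts)
    {
        ++attempts;
        if (kind == ErrorKindSketch::NonRetryable)
            break; // fail fast: the caller regains control immediately
    }
    return attempts;
}
```

A test that wants to reproduce the stuck export would therefore need a failure mode that the client treats as retryable.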

Enmk · 2026-03-20T16:22:58Z

src/Storages/MergeTree/ExportPartTask.cpp

+    (*exports_list_entry)->thread_group->setCancelPredicate(
+        [this]() -> bool { return isCancelled(); });
+


Are you sure that the lifetime of the task exceeds the lifetime of the thread group at all times? Maybe capture a weak pointer instead of this?

arthurpassos · 2026-03-20

So, when writing this code, this came to mind. ThreadGroup is a member of ExportListEntry, which is tied to the lifetime of this task, so I assumed it would be valid. At the same time, ThreadGroupPtr is a shared_ptr, and it gets passed down to the pipeline and ThreadGroupSwitcher, so I am not actually sure about it... Too much wizardry.

Maybe a weak pointer would indeed be safer.
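
For reference, a minimal sketch of the weak pointer capture being discussed. TaskSketch and makeCancelPredicate are hypothetical names, the task is assumed to be owned by a shared_ptr, and treating a destroyed task as cancelled is an assumption of this sketch, not something confirmed by the PR.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical task type; only the capture pattern matters here.
class TaskSketch : public std::enable_shared_from_this<TaskSketch>
{
public:
    void cancel() { cancelled.store(true); }
    bool isCancelled() const { return cancelled.load(); }

    // Builds a predicate that stays safe to call after the task is destroyed,
    // unlike a lambda that captures the raw `this` pointer.
    std::function<bool()> makeCancelPredicate()
    {
        std::weak_ptr<TaskSketch> weak_self = weak_from_this(); // C++17
        return [weak_self]() -> bool
        {
            if (auto self = weak_self.lock())
                return self->isCancelled();
            return true; // assumption: a destroyed task counts as cancelled
        };
    }

private:
    std::atomic<bool> cancelled{false};
};
```

The predicate can outlive the task without dangling: once the last shared_ptr is released, lock() returns null and the lambda falls back to the chosen default.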

fix export task not being killed during s3 outage

513bfb8

arthurpassos added the antalya, port-antalya (PRs to be ported to all new Antalya releases), and antalya-26.1 labels on Mar 20, 2026

Enmk reviewed Mar 20, 2026

use weakptr to be on the safe side

f15590e

arthurpassos wants to merge 2 commits into antalya-26.1 from fix_s3_outage_preventing_export_from_being_cancelled
