Skip to content

add setting to define filename pattern for part exports#1512

Merged
zvonand merged 8 commits intoantalya-25.8from
port_258_export_filename_pattern
Mar 17, 2026
Merged

add setting to define filename pattern for part exports#1512
zvonand merged 8 commits intoantalya-25.8from
port_258_export_filename_pattern

Conversation

@arthurpassos
Copy link
Collaborator

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add setting to define filename pattern for part exports - helps with sharding

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions
Copy link

github-actions bot commented Mar 10, 2026

Workflow [PR], commit [ed45b16]

@arthurpassos arthurpassos changed the title port add setting to define filename pattern for part exports Mar 10, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8163fb0b93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


try
{
const auto filename = buildDestinationFilename(manifest, destination_storage_id, local_context);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use source table macros when building export filename

The new filename expansion passes destination_storage_id into buildDestinationFilename, so the {shard}/{replica} lookup checks the destination database instead of the source table’s database. When exporting from a DatabaseReplicated source to a regular S3 table (the common path), macro_info.shard and macro_info.replica stay unset and expansion falls back to server-config macros, which can raise NO_ELEMENTS_IN_CONFIG or inject wrong values. This means the newly added pattern feature does not work for the intended sharded/replicated source scenario.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, port + ai completion suggestion glitch, fixed

manifest.parquet_parallel_encoding = json->getValue<bool>("parquet_parallel_encoding");
manifest.max_bytes_per_file = json->getValue<size_t>("max_bytes_per_file");
manifest.max_rows_per_file = json->getValue<size_t>("max_rows_per_file");
manifest.filename_pattern = json->getValue<String>("filename_pattern");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve backward compatibility for manifest deserialization

fromJsonString now unconditionally reads filename_pattern, but older metadata.json export manifests do not contain this key. On mixed-version or restarted clusters with pre-existing export tasks, json->getValue<String>("filename_pattern") will throw during task loading and prevent those exports from being processed. This field needs an optional read with a default (matching {part_name}_{checksum}) like other backward-compatible manifest fields.

Useful? React with 👍 / 👎.

ianton-ru
ianton-ru previously approved these changes Mar 11, 2026
ianton-ru
ianton-ru previously approved these changes Mar 16, 2026
Copy link

@ianton-ru ianton-ru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 2

@zvonand zvonand merged commit 120dd78 into antalya-25.8 Mar 17, 2026
96 of 97 checks passed
@Selfeer
Copy link
Collaborator

Selfeer commented Mar 18, 2026

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1512 (export part filename pattern setting):

Confirmed defects

  • Medium: Backward-incompatible export manifest deserialization
    • Impact: Nodes running this PR can fail to parse pre-existing export metadata created before this field existed, causing export-task polling/introspection/cancel flows to throw and skip processing.
    • Anchor: src/Storages/ExportReplicatedMergeTreePartitionManifest.h / ExportReplicatedMergeTreePartitionManifest::fromJsonString
    • Trigger: A metadata.json without filename_pattern is loaded after upgrade (e.g., pending/stale export entries from older builds).
    • Why defect: Deserialization unconditionally reads a newly introduced field, while multiple runtime paths still parse historical metadata; this violates backward compatibility for persisted task state.
    • Affected subsystem and blast radius: Replicated MergeTree export-partition control plane; impacts task reload/cleanup/status and some user-facing export management operations.
    • Smallest logical reproduction steps:
      1. Create an export entry with metadata that does not contain filename_pattern.
      2. Run server code from PR add setting to define filename pattern for part exports #1512.
      3. Trigger manifest loading path (background poll, export info query, or cancel path).
      4. Observe parse exception from missing key.
    • Fix direction (short): Make filename_pattern optional in fromJsonString and default it to {part_name}_{checksum} when absent.
    • Regression test direction (short): Add a unit/integration compatibility test that loads legacy metadata.json without filename_pattern and verifies task loading succeeds with default pattern.
    • Code evidence:
// src/Storages/ExportReplicatedMergeTreePartitionManifest.h
manifest.filename_pattern = json->getValue<String>("filename_pattern");
// src/Storages/MergeTree/ExportPartitionManifestUpdatingTask.cpp
// Background task parses metadata for every entry.
const auto metadata = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json);
// src/Storages/StorageReplicatedMergeTree.cpp
// Additional user-facing/control paths parse the same metadata.
const auto metadata = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json); // getPartitionExportsInfo
const auto manifest = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json); // exportPartitionToTable existing-key path
const auto manifest = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json); // killExportPartition fallback scan

Coverage summary

  • Scope reviewed: PR range 366fb38a9ed..ed45b160e19 for filename-pattern feature paths in settings, manifest persistence, scheduler/context propagation, export execution, and related tests/docs.
  • Categories failed: Backward-compatibility of persisted metadata schema; error-contract consistency for old-vs-new manifest parsing.
  • Categories passed: Setting plumbing/query-to-manifest propagation, source-table macro expansion path ({shard}/{replica}) in filename building, concurrency/locking changes (none introduced in feature path), exception-safety for export execution path.
  • Assumptions/limits: Static audit only (no runtime execution); conclusions are for merged PR code paths and realistic upgrade scenarios involving pre-existing export metadata.

@arthurpassos
Copy link
Collaborator Author

Not an issue

@Selfeer
Copy link
Collaborator

Selfeer commented Mar 19, 2026

@arthurpassos I see export partition integration tests failing do they need to be fixed? As I don't see a real issue in the logs - here is a small analysis:

PR #1512 Verification — Integration Test Failures

Link to failing tests: https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1512&sha=ed45b160e19b2a5ccd564543105a00cb2c7ba767&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20flaky%20check%29

TL;DR

The 65 integration test failures are not a feature bug — the filename pattern feature works correctly. The failures are caused by a missing cleanup in the new test test_export_partition_from_replicated_database_uses_db_shard_replica_macros, which poisons the shared replica1 instance for all subsequent tests in the flaky check.

What happens

  1. The test creates a DatabaseReplicated database on replica1 with a ReplicatedMergeTree table whose ZooKeeper path contains {shard}.
  2. replica1 has no <shard> macro in its server config (intentionally — the test verifies DatabaseReplicated identity is used instead).
  3. The test never drops the database/table.
  4. On subsequent flaky-check iterations (tests run 5x on the same cluster), the orphaned table tries to ATTACH and fails with:
    Code: 139. DB::Exception: No macro 'shard' in config while processing
    substitutions in '/clickhouse/tables/{uuid}/{shard}'
    
  5. This error cascades to every query hitting replica1, failing all unrelated tests.

Evidence

  • All 65 failures reference the same orphaned database (repdb_389ea0c1_...) with the same NO_ELEMENTS_IN_CONFIG error.
  • Tests that don't use replica1 pass 100%: test_sharded_export_* (use shard1_replica1/shard2_replica1), test_export_partition_feature_is_disabled (uses replica_with_export_disabled).
  • The feature's own dedicated tests all pass: test_sharded_export_partition_with_filename_pattern (5/5), test_sharded_export_partition_default_pattern (5/5), stateless test 03608 passes in all runs.

Fix

Add cleanup at the end of test_export_partition_from_replicated_database_uses_db_shard_replica_macros:

node.query(f"DROP TABLE IF EXISTS {s3_table}")
watcher_node.query(f"DROP TABLE IF EXISTS {s3_table}")
node.query(f"DROP DATABASE IF EXISTS {db_name}")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants