Skip to content

26.1 Antalya port - Timezone for partitioning#1453

Merged
zvonand merged 11 commits intoantalya-26.1from
frontport/antalya-26.1/timezone_for_partitioning
Mar 2, 2026
Merged

26.1 Antalya port - Timezone for partitioning#1453
zvonand merged 11 commits intoantalya-26.1from
frontport/antalya-26.1/timezone_for_partitioning

Conversation

@ianton-ru
Copy link

@ianton-ru ianton-ru commented Feb 26, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Frontport for Antalya 26.1

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@ianton-ru ianton-ru added antalya port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Feb 26, 2026
@github-actions
Copy link

github-actions bot commented Feb 26, 2026

Workflow [PR], commit [918ab34]

@zvonand
Copy link
Collaborator

zvonand commented Mar 2, 2026

test_storage_delta/test.py::test_network_activity_with_system_tables is failing in multiple PRs

@zvonand
Copy link
Collaborator

zvonand commented Mar 2, 2026

test_storage_delta/test.py::test_network_activity_with_system_tables fails in many PRs

@zvonand zvonand merged commit 3e0d716 into antalya-26.1 Mar 2, 2026
307 of 313 checks passed
@CarlosFelipeOR
Copy link
Collaborator

QA Verification

PR Summary

This PR adds the iceberg_partition_timezone setting to support correct date/time partitioning in Iceberg tables when the server timezone differs from the data timezone.

Integration Tests (Altinity/ClickHouse CI)

PR test passed (6/6 runs across x86 and arm64):

  • test_database_iceberg/test_partition_timezone.py::test_partition_timezone — PASSED on amd_binary (2/5, 3/5) and arm_binary (1/4)

Backward compatibility: Existing Iceberg integration tests (test_database_iceberg/, test_storage_iceberg/) do not use the new iceberg_partition_timezone setting (default = empty) and all passed, confirming no regression on existing behavior.

Unrelated failures:

Test / Job Status Root Cause
test_storage_delta/test.py::test_network_activity_with_system_tables FAIL Regression from PR #1432 (confirmed by developer). Fix tracked in #1474
Integration tests (arm_binary, distributed plan, 3/4) CANCELLED / ERROR ARM CI infrastructure timeout issue, being fixed by PR #1466. Affects all antalya-26.1 PRs

Regression Tests (Altinity/clickhouse-regression)

No iceberg_partition_timezone tests exist in the regression suites yet. PR #1453 is covered by its own integration test.

All failures are unrelated and pre-existing:

Suite x86 arm64 Reason
iceberg_1, iceberg_2 FAIL FAIL Antalya features still being forward-ported to 26.1
parquet FAIL FAIL Antalya features still being forward-ported to 26.1
s3_minio_export_partition FAIL FAIL Antalya features still being forward-ported to 26.1
s3_minio_export_part FAIL FAIL Antalya features still being forward-ported to 26.1
swarms FAIL FAIL Antalya features still being forward-ported to 26.1
settings FAIL FAIL Not updated for 26.1 yet
version FAIL FAIL Fixed in PR #1377, build was not up to date when regression was run
ice FAIL FAIL New suite, not ready
aggregate_functions_3 FAIL PASS Flaky test (rankCorrState snapshot), being addressed by the QA team

Test Coverage Notes

The integration test covers the primary scenario: DayTransform on TimestampType with a positive timezone offset (Asia/Istanbul, UTC+3) and verifies both data reading and partition pruning with iceberg_partition_timezone=UTC.

Not covered by the current test: other temporal transforms (hour, month, year), TimestamptzType, and negative timezone offsets. These are acceptable gaps for a forward-port of an existing feature and can be addressed with additional tests in the future.

Conclusion

All CI failures are justified and unrelated to PR #1453. The PR's integration test passes on both x86 and arm64. Backward compatibility is confirmed by existing Iceberg tests passing without the new setting.

@CarlosFelipeOR CarlosFelipeOR added the verified Verified by QA label Mar 4, 2026
@CarlosFelipeOR
Copy link
Collaborator

PR #1453 Audit Review

AI review comment (model: gpt-5.3-codex)

1) Scope and partitions

PR introduces Iceberg partition-timezone support and test harness updates across 14 files.
Functional partitioning used:

  • Partition A (runtime Iceberg logic)
    src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.*
    src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFilesPruning.*
    src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFile.cpp
    src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.*
  • Partition B (settings/compatibility)
    src/Core/Settings.cpp
    src/Core/SettingsChangesHistory.cpp
  • Partition C (integration/test infra)
    compose ports + cluster env wiring + new integration test and configs in tests/integration/test_database_iceberg/*

Audit focus:

  • transform parsing and argument-shape consistency,
  • pruning/read/insert path semantic consistency,
  • error-contract consistency,
  • concurrency/locking/regression safety,
  • test coverage relevance.

2) Call graph

Main changed flow:

  • Setting read: Context::getSettingsRef()[Setting::iceberg_partition_timezone]
  • Transform decode: parseTransformAndArgument(transform_name, time_zone)
  • Read/pruning path:
    ManifestFileContent::ManifestFileContent(...) -> getASTFromTransform(...) -> partition key AST / pruning condition
  • Insert partitioning path:
    ChunkPartitioner::ChunkPartitioner(...) -> ChunkPartitioner::partitionChunk(...) (adds timezone const arg)
  • Sort-key path:
    getSortingKeyDescriptionFromMetadata(...) -> string expression build -> KeyDescription::parse(...)

Integration boundaries:

  • Iceberg metadata JSON,
  • function resolution via FunctionFactory,
  • parser/planner via KeyDescription::parse,
  • REST/minio docker wiring in integration setup.

3) Transition matrix

ID Transition Invariant
T1 transform string -> TransformAndArgument temporal transforms carry timezone only when configured
T2 TransformAndArgument -> pruning AST generated function arg order matches function signature
T3 TransformAndArgument -> insert partition function args partition key calculation is timezone-consistent
T4 TransformAndArgument -> sort-key expression string generated SQL expression is syntactically valid
T5 new setting registration/history compatibility machinery remains valid

4) Logical code-path testing summary

Reviewed branches:

  • temporal transforms (year/month/day/hour) now include optional timezone.
  • non-temporal transforms (identity/void/bucket/truncate) keep previous argument semantics.
  • unknown/malformed transforms still return nullopt or throw where expected.
  • pruning AST generation now appends timezone literal argument when present.
  • insert partitioning now appends timezone constant column when present.
  • sort-key generation appends timezone to textual expression.

Handled-failure behavior:

  • unknown transform in pruning path logs warning + no AST for that transform.
  • malformed bucket/truncate remains fail-closed (nullopt/exception path).

Concurrency/timing:

  • no new shared mutable state, locks, or lock-order changes in reviewed runtime code.

5) Fault categories and category-by-category injection results

Fault category Status Outcome Defects
Temporal transform argument-shape consistency Executed Fail 1
SQL expression serialization correctness Executed Fail 1 (same root cause)
Unknown/malformed transform handling Executed Pass 0
Error-contract consistency Executed Pass 0
Concurrency/race/deadlock Not Applicable N/A 0
C++ lifetime/iterator/ownership Executed Pass 0
Partial-update/rollback safety Executed Pass 0
Performance/resource failure checks Executed Pass 0

6) Confirmed defects (High/Medium/Low)

Medium — Unquoted timezone in sort-key expression generation

  • Impact: transformed Iceberg sort-order parsing/planning can fail when iceberg_partition_timezone is non-empty.
  • Location: src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp, function getSortingKeyDescriptionFromMetadata.
  • Trigger condition: non-empty iceberg_partition_timezone + temporal sort transform (e.g. day) in metadata.
  • Affected transition: T4.
  • Why this is a defect: timezone is appended as raw token (identifier-like), not as SQL string literal.
  • Proof sketch: expression built as toRelativeDayNum(column, UTC) instead of toRelativeDayNum(column, 'UTC') before KeyDescription::parse(...).
  • Smallest logical repro:
    1. set iceberg_partition_timezone='UTC';
    2. load Iceberg metadata with temporal transformed sort-order;
    3. sort-key parse/planning path consumes malformed expression tokenization.
  • Likely fix direction: quote/escape timezone in expression string (or generate AST instead of string concatenation).
  • Regression test direction: add test for transformed sort-order + timezone setting ensuring sort-key description parses successfully.
  • Blast radius: Iceberg read/planning path for tables with transformed sort order.

Code snippet demonstrating defect condition:

if (clickhouse_transform_name->time_zone)
    full_argument += ", " + *clickhouse_transform_name->time_zone;
...
return KeyDescription::parse(order_by_str, column_description, local_context, true);

7) Coverage accounting + stop-condition status

  • Call-graph nodes covered: all changed runtime nodes in partitions A/B/C.
  • Transitions covered: T1-T5.
  • Fault categories: all executed or explicitly N/A.
  • Coverage stop-condition: met (all in-scope nodes/transitions/categories reviewed or justified N/A).

8) Assumptions & Limits

  • Static audit only (no local runtime execution in this pass).
  • Behavioral confirmation for all data type combinations needs dynamic runs.
  • CI outcome and sanitizer runtime evidence were not re-executed here.

9) Confidence rating and confidence-raising evidence

  • Overall confidence: Medium-High.
  • To raise confidence to High:
    • execute integration/regression for Iceberg transformed sort-order under timezone override,
    • run sanitizer-backed jobs touching this path.

10) Residual risks and untested paths

  • Residual risk: temporal transform behavior across all type variants (Date/DateTime/DateTime64) without runtime matrix.
  • Untested path: sort-order metadata edge cases combined with timezone override in heterogeneous schemas.

@ianton-ru
Copy link
Author

@CarlosFelipeOR
"Unquoted timezone in sort-key expression generation" - yes, need to fix

@CarlosFelipeOR
Copy link
Collaborator

@CarlosFelipeOR
Copy link
Collaborator

The known issue reported during verification (#1487) has been fixed by PR #1526. Removing label verified-with-issue and marking as verified.

@CarlosFelipeOR CarlosFelipeOR added verified Verified by QA and removed verified-with-issue Verified by QA and issue(s) found. labels Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.1 port-antalya PRs to be ported to all new Antalya releases verified Verified by QA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants