[Fix](iceberg) Fix static partition INSERT OVERWRITE with VALUES clause column count validation #59951

Merged
morningman merged 2 commits into apache:master from suxiaogang223:fix_static_partition_write on Jan 19, 2026

@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Jan 16, 2026

What problem does this PR solve?

Related: #58396

Problem

When INSERT OVERWRITE was used with static partition syntax and a VALUES clause, the statement incorrectly failed with:

Column count doesn't match value count. Expected: N, but got: M

Example that was failing:

-- Table has 2 columns: id (int), par (string) - partitioned by par
INSERT OVERWRITE TABLE test_partition_branch
PARTITION (par='a')
VALUES (11), (12);

Error: Column count doesn't match value count. Expected: 2, but got: 1

Root Cause

The column count validation in InsertUtils.normalizePlan() did not account for static partition columns. When using PARTITION (col='value') syntax:

  • The partition column value is already fixed in the PARTITION clause
  • The VALUES should only provide non-partition column values
  • This matches the usual static-partition insert semantics of Hive-style SQL (Hive, Iceberg, etc.)

The validation compared the number of values in each row against all table columns, rather than against only the columns not already fixed by the PARTITION clause.
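The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Doris code: `ColumnCountCheckSketch` and `checkRowWidth` are hypothetical names, and columns are modeled as plain strings. It only illustrates why checking a row against every table column rejects the example statement.

```java
import java.util.List;

public class ColumnCountCheckSketch {
    // Hypothetical stand-in for the pre-fix check: each VALUES row is
    // expected to supply one value per column in expectedColumns.
    static void checkRowWidth(List<String> expectedColumns, List<Object> row) {
        if (expectedColumns.size() != row.size()) {
            throw new IllegalArgumentException(
                    "Column count doesn't match value count. Expected: "
                            + expectedColumns.size() + ", but got: " + row.size());
        }
    }

    public static void main(String[] args) {
        // Table from the example: id (int), par (string), partitioned by par.
        List<String> allColumns = List.of("id", "par");
        // One row of VALUES (11); par is supplied by PARTITION (par='a').
        List<Object> row = List.of(11);
        try {
            checkRowWidth(allColumns, row);
        } catch (IllegalArgumentException e) {
            // Prints: Column count doesn't match value count. Expected: 2, but got: 1
            System.out.println(e.getMessage());
        }
    }
}
```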

Solution

Modified InsertUtils.java:363-372 to:

  1. Detect when the sink is UnboundIcebergTableSink and the statement has no explicit column list
  2. Extract static partition columns from staticPartitionKeyValues
  3. Filter out static partition columns from the column list before validation
  4. Compare the VALUES count only against the remaining columns

if (unboundLogicalSink instanceof UnboundIcebergTableSink
        && CollectionUtils.isEmpty(unboundLogicalSink.getColNames())) {
    UnboundIcebergTableSink<?> icebergSink = (UnboundIcebergTableSink<?>) unboundLogicalSink;
    Map<String, Expression> staticPartitions = icebergSink.getStaticPartitionKeyValues();
    if (staticPartitions != null && !staticPartitions.isEmpty()) {
        Set<String> staticPartitionColNames = staticPartitions.keySet();
        columns = columns.stream()
                .filter(column -> !staticPartitionColNames.contains(column.getName()))
                .collect(ImmutableList.toImmutableList());
    }
}
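
The filtering step can also be tried outside Doris. The sketch below uses only JDK types and assumes columns and partition keys are plain strings matched exactly by name; `StaticPartitionFilterSketch`, `columnsExpectedInValues`, and the `dt`/`region` table are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StaticPartitionFilterSketch {
    // Returns the columns that each VALUES row still has to supply:
    // all table columns except those fixed by the PARTITION clause.
    static List<String> columnsExpectedInValues(List<String> tableColumns,
                                                Map<String, String> staticPartitions) {
        if (staticPartitions == null || staticPartitions.isEmpty()) {
            // No static partitions: every column must come from VALUES.
            return tableColumns;
        }
        return tableColumns.stream()
                .filter(name -> !staticPartitions.containsKey(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // PARTITION (par='a') on the table (id, par) leaves only id.
        System.out.println(columnsExpectedInValues(
                List.of("id", "par"), Map.of("par", "a")));

        // Hypothetical table with two static partition columns.
        System.out.println(columnsExpectedInValues(
                List.of("id", "dt", "region"),
                Map.of("dt", "2026-01-16", "region", "eu")));
    }
}
```

With this filtering in place, `VALUES (11), (12)` has one value per row and matches the single remaining column, while a row with two values would still be rejected by the count check.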

Test Plan

Added comprehensive test coverage in test_iceberg_static_partition_overwrite.groovy:

  • ✅ Single VALUE with static partition
  • ✅ Multiple VALUES with static partition
  • ✅ Different data types (DECIMAL, BOOLEAN, FLOAT, DATETIME, BIGINT)
  • ✅ Multiple static partition columns
  • ✅ Error handling for wrong column count
  • ✅ Column name specification in VALUES

All existing tests continue to pass, ensuring backward compatibility.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run external

@suxiaogang223 suxiaogang223 changed the title Fix static partition write [enhance](iceberg) Fix static partition write Jan 16, 2026
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 88.89% (8/9) 🎉
Increment coverage report
Complete coverage report

@suxiaogang223 suxiaogang223 changed the title [enhance](iceberg) Fix static partition write [Fix](iceberg) Fix static partition INSERT OVERWRITE with VALUES clause column count validation Jan 16, 2026
@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32243 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6f2fa906a4999030ec266448e4782a41c4913cb6, data reload: false

------ Round 1 ----------------------------------
q1	17615	4200	4066	4066
q2	2103	354	233	233
q3	10119	1348	714	714
q4	10217	848	318	318
q5	7528	2014	1936	1936
q6	194	170	135	135
q7	909	757	661	661
q8	9275	1377	1235	1235
q9	4848	4571	4570	4570
q10	6734	1797	1414	1414
q11	513	294	284	284
q12	684	742	594	594
q13	17765	3845	3130	3130
q14	287	294	294	294
q15	563	519	523	519
q16	684	686	643	643
q17	676	762	521	521
q18	6606	6488	6948	6488
q19	1141	1106	705	705
q20	462	426	265	265
q21	3339	2634	2506	2506
q22	1139	1110	1012	1012
Total cold run time: 103401 ms
Total hot run time: 32243 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4399	4278	4266	4266
q2	339	410	326	326
q3	2234	2847	2495	2495
q4	1507	1898	1502	1502
q5	4462	4205	4345	4205
q6	229	176	128	128
q7	1958	1844	1787	1787
q8	2558	2589	2479	2479
q9	7047	7176	7313	7176
q10	2552	2676	2334	2334
q11	566	489	462	462
q12	738	758	614	614
q13	3655	4105	3110	3110
q14	267	278	260	260
q15	519	490	481	481
q16	626	665	594	594
q17	1096	1263	1269	1263
q18	7715	7536	7272	7272
q19	844	818	802	802
q20	1911	1963	1822	1822
q21	4472	4135	4042	4042
q22	1114	1094	1011	1011
Total cold run time: 50808 ms
Total hot run time: 48431 ms

@doris-robot

TPC-DS: Total hot run time: 175943 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6f2fa906a4999030ec266448e4782a41c4913cb6, data reload: false

query5	4915	662	493	493
query6	350	244	215	215
query7	4229	475	270	270
query8	354	286	246	246
query9	8742	2946	2924	2924
query10	556	384	340	340
query11	15235	15201	14817	14817
query12	190	121	113	113
query13	1250	492	378	378
query14	6679	3045	2795	2795
query14_1	2647	2633	2633	2633
query15	199	191	171	171
query16	987	422	476	422
query17	1023	644	545	545
query18	2699	422	331	331
query19	219	214	186	186
query20	125	113	118	113
query21	232	142	120	120
query22	4371	4387	4275	4275
query23	16272	15728	15448	15448
query23_1	15520	15658	15480	15480
query24	7058	1582	1190	1190
query24_1	1185	1187	1188	1187
query25	519	427	383	383
query26	1233	268	148	148
query27	2769	453	294	294
query28	4523	2254	2230	2230
query29	778	522	418	418
query30	319	228	208	208
query31	802	624	595	595
query32	120	70	69	69
query33	515	356	315	315
query34	922	893	542	542
query35	718	759	678	678
query36	875	911	867	867
query37	142	100	84	84
query38	2693	2731	2612	2612
query39	785	761	758	758
query39_1	729	716	708	708
query40	220	137	120	120
query41	72	66	62	62
query42	109	115	105	105
query43	501	441	428	428
query44	1368	778	775	775
query45	186	190	174	174
query46	845	969	591	591
query47	1444	1495	1480	1480
query48	322	337	240	240
query49	597	423	351	351
query50	671	286	207	207
query51	3821	3820	3760	3760
query52	109	108	97	97
query53	285	318	271	271
query54	289	263	258	258
query55	86	81	77	77
query56	306	312	311	311
query57	1099	1020	1010	1010
query58	275	267	267	267
query59	2164	2198	2071	2071
query60	354	343	329	329
query61	147	147	154	147
query62	402	366	313	313
query63	310	277	261	261
query64	5108	1387	1121	1121
query65	3891	3726	3644	3644
query66	1396	457	341	341
query67	15692	15614	15570	15570
query68	2406	1204	836	836
query69	479	394	355	355
query70	1105	1057	1018	1018
query71	343	320	300	300
query72	5496	3351	3453	3351
query73	663	753	349	349
query74	8743	8808	8673	8673
query75	2747	2826	2449	2449
query76	2290	1076	685	685
query77	381	398	319	319
query78	9893	9925	9218	9218
query79	2423	925	618	618
query80	1788	559	500	500
query81	552	268	231	231
query82	1002	149	115	115
query83	370	273	254	254
query84	261	127	105	105
query85	914	489	426	426
query86	445	294	322	294
query87	2844	2891	2742	2742
query88	3658	2660	2667	2660
query89	387	354	327	327
query90	1931	185	175	175
query91	175	168	144	144
query92	79	83	73	73
query93	1147	975	574	574
query94	660	322	300	300
query95	576	347	394	347
query96	703	522	251	251
query97	2380	2416	2337	2337
query98	217	204	202	202
query99	604	596	525	525
Total cold run time: 252268 ms
Total hot run time: 175943 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/9) 🎉
Increment coverage report
Complete coverage report

@doris-robot

ClickBench: Total hot run time: 26.71 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6f2fa906a4999030ec266448e4782a41c4913cb6, data reload: false

query1	0.05	0.04	0.04
query2	0.10	0.04	0.04
query3	0.26	0.08	0.09
query4	1.61	0.12	0.12
query5	0.29	0.26	0.27
query6	1.16	0.65	0.66
query7	0.04	0.03	0.02
query8	0.06	0.04	0.04
query9	0.57	0.49	0.49
query10	0.56	0.55	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.60	0.59	0.58
query14	0.95	0.94	0.94
query15	0.79	0.78	0.79
query16	0.40	0.43	0.39
query17	1.04	1.04	1.08
query18	0.23	0.21	0.21
query19	1.95	1.89	1.83
query20	0.02	0.03	0.02
query21	15.47	0.27	0.14
query22	5.21	0.06	0.05
query23	15.90	0.30	0.11
query24	1.04	0.25	0.19
query25	0.09	0.08	0.07
query26	0.15	0.13	0.14
query27	0.06	0.09	0.06
query28	3.12	1.09	0.88
query29	12.57	3.92	3.15
query30	0.28	0.14	0.13
query31	2.82	0.63	0.39
query32	3.26	0.57	0.46
query33	2.95	2.99	3.01
query34	16.17	5.06	4.39
query35	4.44	4.42	4.41
query36	0.67	0.49	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.04	0.03
query40	0.17	0.15	0.15
query41	0.09	0.03	0.04
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 95.74 s
Total hot run time: 26.71 s

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jan 16, 2026
@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 88.89% (8/9) 🎉
Increment coverage report
Complete coverage report

@morningman morningman merged commit e572788 into apache:master Jan 19, 2026
35 of 36 checks passed
@suxiaogang223 suxiaogang223 deleted the fix_static_partition_write branch February 13, 2026 16:24

Labels

approved, dev/4.1.x, reviewed

5 participants