@Yukang-Lian
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

This commit optimizes the compaction for sparse wide tables with:

1. Column batch replacement interface:
   - replace_column_data_range(): batch copy using memcpy for fixed-width types
   - support_replace_column_data_range(): runtime type check

2. Iterator batch processing:
   - RowBatch: represents a run of contiguous rows from the same segment
   - unique_key_next_batch(): returns multiple row batches per call instead of advancing row by row

3. Reader sparse optimization:
   - Pre-fill NULL for nullable columns
   - SIMD detection for all-NULL/all-non-NULL batches
   - Run-length processing for mixed NULL batches

4. Writer SIMD optimization:
   - Use simd::count_zero_num for fast NULL counting
   - Fast path for all-NULL and all-non-NULL cases
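
The reader-side fast paths and the range copy described above can be illustrated with a minimal sketch. It is not the code from this PR: `count_zero_bytes` is a scalar stand-in for `simd::count_zero_num`, `classify_batch` and `copy_sparse_batch` are hypothetical names, and the null-map convention (0 = value present, 1 = NULL) and the `int32_t` column type are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar stand-in for simd::count_zero_num (hypothetical helper).
// Assumed convention: null-map byte 0 = value present, 1 = NULL.
inline size_t count_zero_bytes(const uint8_t* data, size_t n) {
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i) {
        zeros += (data[i] == 0);
    }
    return zeros;
}

enum class BatchKind { AllNull, AllNonNull, Mixed };

// One counting pass decides which path a batch takes.
inline BatchKind classify_batch(const uint8_t* null_map, size_t n) {
    const size_t non_null = count_zero_bytes(null_map, n);
    if (non_null == 0) return BatchKind::AllNull;
    if (non_null == n) return BatchKind::AllNonNull;
    return BatchKind::Mixed;
}

// Copies a source batch into a destination column that was pre-filled with
// NULLs. Returns the number of memcpy calls issued (one per non-NULL run).
inline size_t copy_sparse_batch(const int32_t* src_values, const uint8_t* src_null_map,
                                size_t n, std::vector<int32_t>& dst_values,
                                std::vector<uint8_t>& dst_null_map, size_t dst_start) {
    assert(dst_values.size() == dst_null_map.size());
    assert(dst_start + n <= dst_values.size());

    switch (classify_batch(src_null_map, n)) {
    case BatchKind::AllNull:
        return 0;  // destination already holds NULLs, nothing to do
    case BatchKind::AllNonNull:
        std::memcpy(dst_values.data() + dst_start, src_values, n * sizeof(int32_t));
        std::memset(dst_null_map.data() + dst_start, 0, n);
        return 1;
    case BatchKind::Mixed:
        break;
    }

    // Mixed batch: walk the null map and copy each run of non-NULL values at once.
    size_t runs = 0;
    size_t i = 0;
    while (i < n) {
        if (src_null_map[i] != 0) {
            ++i;  // skip NULL rows, they are already pre-filled
            continue;
        }
        size_t j = i;
        while (j < n && src_null_map[j] == 0) ++j;
        std::memcpy(dst_values.data() + dst_start + i, src_values + i,
                    (j - i) * sizeof(int32_t));
        std::memset(dst_null_map.data() + dst_start + i, 0, j - i);
        ++runs;
        i = j;
    }
    return runs;
}
```

With a sparse batch, most calls return from the `AllNull` branch without touching the destination, which is where pre-filling NULLs is expected to pay off.
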

Add batch Put operation for RLE encoder:
- Put(value, run_length): write same value multiple times efficiently
- Fast path when already in repeated run mode
- Reduces loop iterations from N to 1 for repeated values
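
To illustrate the idea of a batch `Put`, here is a simplified sketch. `ToyRleEncoder` is a hypothetical class that stores (value, count) pairs in memory; it does not reproduce the bit-packed format or the interface of the RLE encoder in the Doris code base. It only shows why extending the current run takes one step instead of N.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy run-length encoder (hypothetical, for illustration only).
class ToyRleEncoder {
public:
    // Single-value form, expressed through the batch form.
    void Put(uint8_t value) { Put(value, 1); }

    // Batch form: append `run_length` copies of `value` in one step.
    void Put(uint8_t value, size_t run_length) {
        if (run_length == 0) return;
        if (!runs_.empty() && runs_.back().first == value) {
            // Fast path: already in a repeated run of this value,
            // so only the counter changes.
            runs_.back().second += run_length;
        } else {
            runs_.emplace_back(value, run_length);
        }
        total_ += run_length;
    }

    const std::vector<std::pair<uint8_t, size_t>>& runs() const { return runs_; }
    size_t size() const { return total_; }

private:
    std::vector<std::pair<uint8_t, size_t>> runs_;
    size_t total_ = 0;
};
```

Writing 1000 NULL flags then becomes `enc.Put(1, 1000)` rather than a loop of 1000 single `Put(1)` calls.
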

1. Unit tests (vertical_merge_iterator_test.cpp):
   - SparseColumnOptimizationTest::AllNullColumn
   - SparseColumnOptimizationTest::AllNonNullColumn
   - SparseColumnOptimizationTest::SparseMixedColumn
   - SparseColumnOptimizationTest::DenseMixedColumn
   - SparseColumnOptimizationTest::ReplaceColumnDataRange
   - SparseColumnOptimizationTest::CountZeroNumSIMD

2. Regression test (test_compaction_sparse_wide_table.groovy):
   - Create 500-column wide table
   - Insert sparse data (most columns are NULL)
   - Trigger compaction
   - Verify data correctness
…tion

Add sparsity estimation based on rowset metadata:
- Calculate data_ratio = actual_data_size / theoretical_full_size
- actual_data_size: sum of all rowsets' data_disk_size()
- theoretical_full_size: total_rows * tablet_schema.row_size()
- Enable sparse optimization when data_ratio <= threshold (default 0.1)
- Lower ratio means more sparse data (more NULLs)

Config options:
- enable_sparse_column_compaction_optimization: master switch
- sparse_column_compaction_threshold: ratio threshold (default 0.1)
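
The estimation rule above can be written out as a short sketch. `RowsetStats`, `estimate_data_ratio`, and `should_use_sparse_optimization` are hypothetical names, not the PR's actual types; the two parameters `enabled` and `threshold` stand for the `enable_sparse_column_compaction_optimization` and `sparse_column_compaction_threshold` options. Treating an empty input as dense is an assumption of this sketch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-rowset statistics, standing in for rowset metadata.
struct RowsetStats {
    int64_t num_rows;
    int64_t data_disk_size;  // bytes
};

// data_ratio = actual_data_size / theoretical_full_size
inline double estimate_data_ratio(const std::vector<RowsetStats>& rowsets,
                                  int64_t schema_row_size) {
    int64_t total_rows = 0;
    int64_t actual_data_size = 0;
    for (const RowsetStats& rs : rowsets) {
        total_rows += rs.num_rows;
        actual_data_size += rs.data_disk_size;
    }
    const double theoretical_full_size =
            static_cast<double>(total_rows) * static_cast<double>(schema_row_size);
    if (theoretical_full_size <= 0.0) {
        return 1.0;  // assumption: no rows or unknown row size counts as dense
    }
    return static_cast<double>(actual_data_size) / theoretical_full_size;
}

// A lower ratio means sparser data; the optimization applies at or below the threshold.
inline bool should_use_sparse_optimization(bool enabled, double threshold,
                                           const std::vector<RowsetStats>& rowsets,
                                           int64_t schema_row_size) {
    if (!enabled) return false;
    return estimate_data_ratio(rowsets, schema_row_size) <= threshold;
}
```

For example, two rowsets of 1000 rows each holding 4000 and 6000 bytes, with a schema row size of 100 bytes, give 10000 / 200000 = 0.05, which is below the default threshold of 0.1.
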
@Thearas
Contributor

Thearas commented Jan 16, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31886 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e30ca577d0a6379d4b02eff89cbfdd14f34f32a9, data reload: false

------ Round 1 ----------------------------------
query	run 1 (cold, ms)	run 2 (hot, ms)	run 3 (hot, ms)	best hot (ms)
q1	17618	4152	4003	4003
q2	2074	356	228	228
q3	10115	1289	727	727
q4	10202	814	307	307
q5	7525	2098	1843	1843
q6	187	169	140	140
q7	925	825	658	658
q8	9257	1414	1139	1139
q9	4920	4611	4596	4596
q10	6840	1818	1394	1394
q11	509	302	295	295
q12	746	763	595	595
q13	17857	3885	3082	3082
q14	295	296	278	278
q15	585	521	508	508
q16	708	679	636	636
q17	676	817	502	502
q18	6766	6461	7031	6461
q19	1220	1030	685	685
q20	420	415	262	262
q21	3326	2657	2517	2517
q22	1147	1080	1030	1030
Total cold run time: 103918 ms
Total hot run time: 31886 ms

----- Round 2, with runtime_filter_mode=off -----
query	run 1 (cold, ms)	run 2 (hot, ms)	run 3 (hot, ms)	best hot (ms)
q1	4337	4315	4279	4279
q2	326	406	342	342
q3	2290	2817	2432	2432
q4	1440	1904	1498	1498
q5	4361	4310	4418	4310
q6	220	168	136	136
q7	2058	1911	1818	1818
q8	2575	2576	2399	2399
q9	7210	7206	7140	7140
q10	2563	2771	2321	2321
q11	527	478	461	461
q12	682	762	585	585
q13	3674	3977	3070	3070
q14	275	271	251	251
q15	530	486	485	485
q16	628	669	609	609
q17	1092	1268	1323	1268
q18	7497	7152	7447	7152
q19	821	821	784	784
q20	1868	1939	1776	1776
q21	4509	4265	4074	4074
q22	1074	1050	1028	1028
Total cold run time: 50557 ms
Total hot run time: 48218 ms

@doris-robot

TPC-DS: Total hot run time: 173828 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e30ca577d0a6379d4b02eff89cbfdd14f34f32a9, data reload: false

query	run 1 (cold, ms)	run 2 (hot, ms)	run 3 (hot, ms)	best hot (ms)
query5	4371	620	481	481
query6	344	233	206	206
query7	4241	460	255	255
query8	362	251	235	235
query9	8740	2909	2871	2871
query10	496	399	333	333
query11	15293	15242	15001	15001
query12	179	117	114	114
query13	1248	487	386	386
query14	5896	3055	2788	2788
query14_1	2679	2671	2652	2652
query15	200	192	173	173
query16	999	499	464	464
query17	1094	682	570	570
query18	2442	438	336	336
query19	220	235	189	189
query20	120	119	120	119
query21	214	135	119	119
query22	4021	3956	4033	3956
query23	16158	15672	15216	15216
query23_1	15400	15542	15336	15336
query24	7183	1530	1171	1171
query24_1	1169	1163	1191	1163
query25	542	466	421	421
query26	1244	268	149	149
query27	2775	438	283	283
query28	4533	2193	2186	2186
query29	773	546	428	428
query30	313	241	210	210
query31	821	649	578	578
query32	87	74	73	73
query33	531	354	327	327
query34	910	882	528	528
query35	755	792	686	686
query36	891	911	790	790
query37	131	97	87	87
query38	2715	2731	2673	2673
query39	786	769	738	738
query39_1	713	709	709	709
query40	225	140	122	122
query41	73	68	68	68
query42	104	101	106	101
query43	453	438	424	424
query44	1317	803	753	753
query45	184	186	173	173
query46	831	953	560	560
query47	1458	1434	1298	1298
query48	320	318	249	249
query49	620	419	335	335
query50	618	263	201	201
query51	3813	3795	3853	3795
query52	104	109	93	93
query53	290	320	268	268
query54	296	272	266	266
query55	85	79	75	75
query56	318	311	304	304
query57	1011	1036	880	880
query58	262	261	260	260
query59	2151	2204	1962	1962
query60	326	333	308	308
query61	144	145	151	145
query62	412	353	313	313
query63	298	268	267	267
query64	4928	1242	928	928
query65	3866	3683	3816	3683
query66	1459	426	328	328
query67	15678	15662	15485	15485
query68	2397	1105	778	778
query69	449	351	320	320
query70	994	973	865	865
query71	324	303	282	282
query72	5268	3187	3222	3187
query73	586	721	312	312
query74	8720	8743	8613	8613
query75	2729	2782	2445	2445
query76	2269	1056	674	674
query77	360	379	305	305
query78	9856	9981	9232	9232
query79	1983	908	579	579
query80	1502	570	484	484
query81	567	264	239	239
query82	1001	148	117	117
query83	368	257	234	234
query84	251	116	95	95
query85	905	483	419	419
query86	421	294	297	294
query87	2864	2853	2810	2810
query88	3485	2594	2565	2565
query89	385	351	322	322
query90	1945	164	157	157
query91	170	154	132	132
query92	80	69	70	69
query93	1119	914	552	552
query94	766	309	287	287
query95	601	349	320	320
query96	639	502	231	231
query97	2358	2393	2326	2326
query98	220	202	195	195
query99	589	563	525	525
Total cold run time: 248669 ms
Total hot run time: 173828 ms

@doris-robot

ClickBench: Total hot run time: 26.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e30ca577d0a6379d4b02eff89cbfdd14f34f32a9, data reload: false

query	run 1 (cold, s)	run 2 (hot, s)	run 3 (hot, s)
query1	0.05	0.05	0.05
query2	0.10	0.04	0.04
query3	0.26	0.09	0.08
query4	1.61	0.12	0.11
query5	0.29	0.25	0.25
query6	1.15	0.67	0.65
query7	0.03	0.03	0.02
query8	0.06	0.04	0.04
query9	0.56	0.50	0.50
query10	0.55	0.55	0.57
query11	0.15	0.10	0.10
query12	0.15	0.11	0.11
query13	0.61	0.58	0.60
query14	0.96	0.93	0.96
query15	0.80	0.79	0.78
query16	0.40	0.40	0.41
query17	1.09	1.06	1.05
query18	0.22	0.21	0.21
query19	1.82	1.85	1.91
query20	0.02	0.01	0.02
query21	15.44	0.27	0.14
query22	5.21	0.05	0.04
query23	16.10	0.28	0.10
query24	1.43	0.57	0.17
query25	0.07	0.07	0.06
query26	0.13	0.13	0.13
query27	0.05	0.06	0.07
query28	4.15	1.08	0.88
query29	12.51	3.88	3.07
query30	0.28	0.15	0.13
query31	2.83	0.61	0.40
query32	3.24	0.55	0.45
query33	3.00	3.08	2.99
query34	15.93	5.02	4.44
query35	4.40	4.40	4.46
query36	0.64	0.50	0.49
query37	0.12	0.07	0.07
query38	0.07	0.04	0.03
query39	0.05	0.04	0.03
query40	0.18	0.14	0.13
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 96.89 s
Total hot run time: 26.62 s
