[DO NOT MERGE JUST TEST](compaction) sparse wide table compaction optimization #59967
Open
Yukang-Lian wants to merge 7 commits into apache:master from Yukang-Lian:feature/sparse-compaction-v2
+2,086
−10
This commit optimizes the compaction for sparse wide tables with:

1. Column batch replacement interface:
   - replace_column_data_range(): batch copy using memcpy for fixed-width types
   - support_replace_column_data_range(): runtime type check
2. Iterator batch processing:
   - RowBatch: represents continuous rows from same segment
   - unique_key_next_batch(): returns multiple batches instead of row-by-row
3. Reader sparse optimization:
   - Pre-fill NULL for nullable columns
   - SIMD detection for all-NULL/all-non-NULL batches
   - Run-length processing for mixed NULL batches
4. Writer SIMD optimization:
   - Use simd::count_zero_num for fast NULL counting
   - Fast path for all-NULL and all-non-NULL cases
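The reader-side idea described above (pre-fill NULL, classify each batch, then copy non-NULL runs in bulk) can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `NullPattern`, `count_zero_bytes`, `classify`, and `copy_non_null_runs` are hypothetical names, the null map is assumed to use one byte per row (1 = NULL, 0 = non-NULL), and the plain counting loop stands in for `simd::count_zero_num`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical classification of a null map (1 = NULL, 0 = non-NULL).
enum class NullPattern { AllNull, AllNonNull, Mixed };

// Stand-in for simd::count_zero_num: counts zero bytes, i.e. non-NULL rows.
// A simple loop like this is typically auto-vectorized by the compiler.
inline std::size_t count_zero_bytes(const std::uint8_t* data, std::size_t n) {
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < n; ++i) {
        zeros += (data[i] == 0);
    }
    return zeros;
}

// An empty batch is reported as AllNull, so callers skip it entirely.
inline NullPattern classify(const std::vector<std::uint8_t>& null_map) {
    const std::size_t zeros = count_zero_bytes(null_map.data(), null_map.size());
    if (zeros == 0) return NullPattern::AllNull;
    if (zeros == null_map.size()) return NullPattern::AllNonNull;
    return NullPattern::Mixed;
}

// Copies only the non-NULL runs of `src` into `dst`, starting at `dst_start`.
// Assumes `dst_nulls` was pre-filled with 1 (NULL), so NULL rows need no work.
// Returns the number of memcpy calls, one per contiguous non-NULL run.
inline std::size_t copy_non_null_runs(const std::vector<std::int32_t>& src,
                                      const std::vector<std::uint8_t>& src_nulls,
                                      std::vector<std::int32_t>& dst,
                                      std::vector<std::uint8_t>& dst_nulls,
                                      std::size_t dst_start) {
    assert(src.size() == src_nulls.size());
    assert(dst.size() == dst_nulls.size());
    assert(dst_start + src.size() <= dst.size());

    const std::size_t n = src.size();
    std::size_t copies = 0;
    std::size_t i = 0;
    while (i < n) {
        if (src_nulls[i] != 0) {  // NULL row: destination is already NULL
            ++i;
            continue;
        }
        std::size_t run_end = i;
        while (run_end < n && src_nulls[run_end] == 0) ++run_end;
        const std::size_t len = run_end - i;
        // One bulk copy for the whole run of fixed-width values.
        std::memcpy(dst.data() + dst_start + i, src.data() + i,
                    len * sizeof(std::int32_t));
        std::memset(dst_nulls.data() + dst_start + i, 0, len);
        ++copies;
        i = run_end;
    }
    return copies;
}
```

In a sparse column most batches would classify as `AllNull` and need no copy at all, which is presumably where most of the saving comes from; the run-based copy only matters for the mixed case.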
Add batch Put operation for RLE encoder:
- Put(value, run_length): write same value multiple times efficiently
- Fast path when already in repeated run mode
- Reduces loop iterations from N to 1 for repeated values
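To illustrate why a batch `Put(value, run_length)` reduces N calls to one, here is a deliberately simplified run-length encoder. It is a sketch under stated assumptions: `SimpleRleEncoder` is a hypothetical class that stores plain (value, count) pairs, whereas the encoder touched by this PR may use a bit-packed hybrid layout with different internals.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified RLE encoder: keeps (value, count) pairs in memory.
class SimpleRleEncoder {
public:
    // Row-by-row form: one call per value.
    void Put(std::uint8_t value) { Put(value, 1); }

    // Batch form: records `run_length` copies of `value` in O(1).
    void Put(std::uint8_t value, std::size_t run_length) {
        if (run_length == 0) return;
        // Fast path: already in a repeated run of the same value,
        // so just extend its counter.
        if (!runs_.empty() && runs_.back().first == value) {
            runs_.back().second += run_length;
            return;
        }
        runs_.emplace_back(value, run_length);
    }

    // Total number of logical values written so far.
    std::size_t num_values() const {
        std::size_t total = 0;
        for (const auto& run : runs_) total += run.second;
        return total;
    }

    const std::vector<std::pair<std::uint8_t, std::size_t>>& runs() const {
        return runs_;
    }

private:
    std::vector<std::pair<std::uint8_t, std::size_t>> runs_;
};
```

With this shape, writing the null bitmap of an all-NULL batch of 4096 rows becomes a single `Put(1, 4096)` instead of 4096 single-value calls.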
1. Unit tests (vertical_merge_iterator_test.cpp):
   - SparseColumnOptimizationTest::AllNullColumn
   - SparseColumnOptimizationTest::AllNonNullColumn
   - SparseColumnOptimizationTest::SparseMixedColumn
   - SparseColumnOptimizationTest::DenseMixedColumn
   - SparseColumnOptimizationTest::ReplaceColumnDataRange
   - SparseColumnOptimizationTest::CountZeroNumSIMD
2. Regression test (test_compaction_sparse_wide_table.groovy):
   - Create 500-column wide table
   - Insert sparse data (most columns are NULL)
   - Trigger compaction
   - Verify data correctness
…tion

Add sparsity estimation based on rowset metadata:
- Calculate data_ratio = actual_data_size / theoretical_full_size
- actual_data_size: sum of all rowsets' data_disk_size()
- theoretical_full_size: total_rows * tablet_schema.row_size()
- Enable sparse optimization when data_ratio <= threshold (default 0.1)
- Lower ratio means more sparse data (more NULLs)

Config options:
- enable_sparse_column_compaction_optimization: master switch
- sparse_column_compaction_threshold: ratio threshold (default 0.1)
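The ratio check described in this commit message can be sketched as below. The struct and function names are hypothetical; only the formula, the two config option names, and the 0.1 default come from the commit message. Treating an empty input as dense (ratio 1.0) is an assumption of this sketch, not something the PR states.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-rowset metadata; mirrors the quantities named in the
// commit message (row count and data_disk_size()).
struct RowsetStats {
    std::int64_t num_rows;
    std::int64_t data_disk_size;  // bytes on disk
};

// data_ratio = actual_data_size / theoretical_full_size, where
// theoretical_full_size = total_rows * row_size_bytes.
// Assumption: with no rows or a zero row size, report 1.0 (dense) so the
// optimization stays off.
inline double estimate_data_ratio(const std::vector<RowsetStats>& rowsets,
                                  std::int64_t row_size_bytes) {
    std::int64_t total_rows = 0;
    std::int64_t actual_size = 0;
    for (const auto& rs : rowsets) {
        total_rows += rs.num_rows;
        actual_size += rs.data_disk_size;
    }
    const double full_size =
            static_cast<double>(total_rows) * static_cast<double>(row_size_bytes);
    if (full_size <= 0.0) return 1.0;
    return static_cast<double>(actual_size) / full_size;
}

// `enabled` plays the role of enable_sparse_column_compaction_optimization,
// `threshold` the role of sparse_column_compaction_threshold (default 0.1).
inline bool should_use_sparse_optimization(const std::vector<RowsetStats>& rowsets,
                                           std::int64_t row_size_bytes,
                                           bool enabled,
                                           double threshold = 0.1) {
    if (!enabled) return false;
    return estimate_data_ratio(rowsets, row_size_bytes) <= threshold;
}
```

For example, two rowsets of 1000 rows each that occupy 4000 and 6000 bytes, against a 100-byte row size, give 10000 / 200000 = 0.05, which is below the default threshold. Because on-disk size also reflects compression and encoding, the ratio is best read as a heuristic rather than an exact NULL fraction.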
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:

Collaborator, Author
run buildall

TPC-H: Total hot run time: 31886 ms
TPC-DS: Total hot run time: 173828 ms
ClickBench: Total hot run time: 26.62 s
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)