[regression](load) add some p0 case for load strict mode#62478
[regression](load) add some p0 case for load strict mode#62478jacktengg wants to merge 2 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
review |
|
/review |
There was a problem hiding this comment.
Finding:
regression-test/suites/load_p0/insert/test_insert_strict_mode_and_filter_ratio.groovy: the new strict-modeinsert into selectcase forNULL -> NOT NULLis asserting the wrong failure contract. In FE, successfulinsert into selectjobs callAbstractInsertExecutor.checkStrictModeAndFilterRatio()after execution, and that path raisesInsert has filtered data in strict modewheneverenable_insert_strict=trueandfilteredRows > 0(fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/AbstractInsertExecutor.java:220-225). The paired non-strict case in this same test (insert_max_filter_ratio=0.2) already proves this scenario is counted as filtered rows rather than failing earlier with an executor error, so this strict assertion would lock in behavior that contradicts the current implementation.
Critical checkpoint conclusions:
- Goal of the task: add regression coverage for strict mode and filter-ratio behavior in stream load and
insert into select. The PR mostly does that, but the strictNULL -> NOT NULLinsert into selectassertion does not match the current code path, so the goal is not fully met yet. - Modification size and focus: yes, the PR is focused on regression coverage only.
- Concurrency: not applicable; these changes are test-only and do not introduce concurrent code.
- Lifecycle, static init, config, compatibility: not applicable; no new lifecycle, config, protocol, or storage-format changes.
- Parallel code paths: checked against existing
VALUES, TVF, and FE insert-executor behavior. Oneinsert into selectbranch is inconsistent with the FE path that governs filtered-row handling. - Special conditions: the strict-mode and error-message distinction depends on whether the failure happens during execution or during post-exec filtered-row validation. This case is in the latter category.
- Test coverage: coverage is expanded well overall, but this blocking assertion should be corrected before relying on the suite.
- Test result files: the new
.outupdates are consistent with the new cases and appear auto-generated. - Observability, transaction, persistence, data write safety, performance: not applicable for this regression-only PR.
- Other issues: none beyond the blocking assertion above.
| sql """ | ||
| insert into test_insert_select_strict_mode_and_filter_ratio_dst select * from test_insert_select_strict_mode_and_filter_ratio_src; | ||
| """ | ||
| exception """Insert has too many filtered data""" |
There was a problem hiding this comment.
This assertion looks inconsistent with the current insert into select implementation. For this branch, the non-strict sibling just above already shows the job completes and reports filtered rows, so the strict-mode failure should come from AbstractInsertExecutor.checkStrictModeAndFilterRatio() in FE. That code path raises Insert has filtered data in strict mode, not Insert has too many filtered data, and it does not append a url check either.
| exception """Insert has too many filtered data""" | |
| exception """Insert has filtered data in strict mode""" |
|
/review |
There was a problem hiding this comment.
No blocking findings from code review.
Critical checkpoint conclusions:
- Goal / correctness: The PR expands regression coverage for load strict-mode and filter-ratio behavior, especially
INSERT INTO SELECTcases ininsert, tinyint overflow instream_load, and expected error-message alignment intvf. The added/updated suites and.outfiles are consistent with that goal. I did not see a contradictory expectation in the changed files. - Scope / focus: The modification is small and focused to regression assets only (
regression-test/suites,.out, and one CSV input). No unrelated production-code churn is mixed in. - Concurrency: Not applicable; these are regression scripts and golden outputs only.
- Lifecycle / static init: Not applicable.
- Configuration: No new config items are added. The changes only exercise existing session variables and load properties.
- Compatibility: No incompatible API, storage-format, or protocol changes are introduced.
- Parallel code paths: The PR touches the relevant load-path regression suites in scope (
insert,stream_load,tvf). Within this scope I did not find an obvious missing companion assertion that would make coverage inconsistent. - Special conditions: The tests intentionally cover strict-mode and filter-ratio thresholds with explicit expected failures/successes; the conditions are understandable from the case names and comments.
- Test coverage: Coverage is materially improved for strict-mode/filter-ratio combinations. For this PR type, regression tests are the right level. I could not verify execution in this runner.
- Modified test results: The updated
.outfiles match the new query tags added in the suites. I did not find a mismatched or missing golden-result section in the modified files. - Observability: No observability changes are needed for this test-only PR.
- Transaction / persistence / data-write correctness: No production logic changed, so there is no direct risk in FE/BE transaction or persistence behavior from this patch itself.
- Performance: No production hot path changed.
- Other issues: None identified during review.
Residual risk: since I did not execute the regression suite here, the remaining risk is limited to environment-dependent result drift rather than a clear logic problem in the patch itself.
|
run buildall |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)