[fix](mtmv) Infer null-reject from INNER JoinEdge for multi-hop outer join MV rewrite#62492

Open
seawinde wants to merge 1 commit into apache:master from seawinde:fix-mtmv-2hop-null-reject

@seawinde
Member

What problem does this PR solve?

Issue Number: N/A

Related PR: #30374

Problem Summary:

In multi-hop LEFT JOIN materialized view transparent rewrite (e.g., fact LEFT JOIN dim1 LEFT JOIN dim2), when the query has a WHERE clause that null-rejects only the outermost dimension table (e.g., WHERE dim2.col = 'value'), the MV rewrite fails with "Predicate compensate fail".

Root cause: In AbstractMaterializedViewRule.containsNullRejectSlot(), the original code only checked filter predicates (queryPredicates) for NOT NULL evidence. After the Nereids rewrite pipeline runs:

  1. EliminateOuterJoin converts all eligible LEFT JOINs → INNER (cascading through InferJoinNotNull across multiple passes)
  2. EliminateNotNull unconditionally removes all generated NOT NULL predicates (isGeneratedIsNotNull=true)

By the time MV rewrite (exploration phase) runs, the query plan has INNER JOINs but zero NOT NULL filter predicates. The only surviving predicate is the user's WHERE clause (e.g., dim2.region_name = 'West'), which can only prove NOT NULL for outermost dim2 slots — leaving intermediate dim1 slots uncovered.
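
The loss of evidence described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FilterOnlySketch`, `Predicate`, and the example predicate list are hypothetical names invented for illustration and are not Doris classes, and the two generated predicates shown are an assumed example of what the inference pass might have produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; none of these types are Doris classes.
class FilterOnlySketch {
    // A filter predicate: its text, the slots it proves NOT NULL, and whether
    // the optimizer generated it (playing the role of isGeneratedIsNotNull).
    static final class Predicate {
        final String text;
        final boolean generated;
        final Set<String> notNullSlots;

        Predicate(String text, boolean generated, String... notNullSlots) {
            this.text = text;
            this.generated = generated;
            this.notNullSlots = new TreeSet<>(Arrays.asList(notNullSlots));
        }
    }

    // Assumed predicate set after outer-join elimination, before cleanup.
    static List<Predicate> examplePredicates() {
        return Arrays.asList(
                new Predicate("d.id IS NOT NULL", true, "d.id"),
                new Predicate("r.store_id IS NOT NULL", true, "r.store_id"),
                new Predicate("r.region_name = 'West'", false, "r.region_name"));
    }

    // Stand-in for EliminateNotNull: drop every generated IS NOT NULL predicate.
    static List<Predicate> dropGenerated(List<Predicate> predicates) {
        List<Predicate> kept = new ArrayList<>();
        for (Predicate p : predicates) {
            if (!p.generated) {
                kept.add(p);
            }
        }
        return kept;
    }

    // Stand-in for the old check: only filter predicates supply NOT NULL evidence.
    static Set<String> nullRejectSlots(List<Predicate> predicates) {
        Set<String> slots = new TreeSet<>();
        for (Predicate p : predicates) {
            slots.addAll(p.notNullSlots);
        }
        return slots;
    }

    public static void main(String[] args) {
        Set<String> slots = nullRejectSlots(dropGenerated(examplePredicates()));
        System.out.println(slots); // prints [r.region_name]
    }
}
```

In this model no dim1 slot (`d.id`) survives, which matches the uncovered intermediate table described in the root cause.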

Fix: Read INNER JoinEdge conditions directly from the query HyperGraph. After EliminateOuterJoin converts LEFT→INNER, JoinEdge objects retain their INNER type and join condition expressions even though EliminateNotNull removes filter-level NOT NULL predicates. ExpressionUtils.inferNotNullSlots() extracts NOT NULL slots from these INNER join conditions, covering all intermediate join tables.
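
A minimal sketch of the idea behind the fix, assuming each join edge carries a single equality condition. `JoinEdgeSketch`, `JoinEdge`, and `collectNullRejectSlots` are hypothetical names for illustration; they are not the HyperGraph or `ExpressionUtils` API. The reasoning it encodes is that an INNER join on `a = b` can only emit rows where both `a` and `b` are non-null, while a LEFT OUTER edge gives no such guarantee.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; these are not the Doris HyperGraph classes.
class JoinEdgeSketch {
    enum JoinType { INNER, LEFT_OUTER }

    // Simplified edge: a join type plus one equality condition "left = right".
    static final class JoinEdge {
        final JoinType type;
        final String left;
        final String right;

        JoinEdge(JoinType type, String left, String right) {
            this.type = type;
            this.left = left;
            this.right = right;
        }
    }

    // Start from the filter-level evidence, then add both sides of every
    // INNER equality condition. Outer edges contribute nothing.
    static Set<String> collectNullRejectSlots(List<JoinEdge> edges, Set<String> filterSlots) {
        Set<String> slots = new TreeSet<>(filterSlots);
        for (JoinEdge edge : edges) {
            if (edge.type == JoinType.INNER) {
                slots.add(edge.left);
                slots.add(edge.right);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        List<JoinEdge> edges = Arrays.asList(
                new JoinEdge(JoinType.INNER, "o.store_id", "d.id"),
                new JoinEdge(JoinType.INNER, "d.id", "r.store_id"));
        Set<String> filterSlots = Collections.singleton("r.region_name");
        // prints [d.id, o.store_id, r.region_name, r.store_id]
        System.out.println(collectNullRejectSlots(edges, filterSlots));
    }
}
```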

| File | Change Description |
| --- | --- |
| AbstractMaterializedViewRule.java | containsNullRejectSlot(): Add loop over INNER JoinEdges to collect NOT NULL slots from join conditions via inferNotNullSlots. Also add shuttleExpressionWithLineage for correct slot-level mapping. |
| NullRejectInferenceTest.java (new) | FE unit test: query=2-hop INNER JOIN vs view=2-hop LEFT JOIN, verifies predicatesCompensate succeeds |
| outer_join_two_hop_null_reject.groovy (new) | Regression test: 3 tables, async MV with 2-hop LEFT JOIN + WHERE + aggregate rollup, verifies rewrite success and result correctness |

2-hop example walkthrough:

Query HyperGraph (after EliminateOuterJoin):
  JoinEdge 1 (INNER): o.store_id = d.id    → {o.store_id, d.id} NOT NULL
  JoinEdge 2 (INNER): d.id = r.store_id    → {d.id, r.store_id} NOT NULL
  FilterEdge:         r.region_name = 'West' → {r.region_name} NOT NULL

queryNullRejectSlots = {o.store_id, d.id, r.store_id, r.region_name}

requireNoNullableViewSlot (view has LEFT JOINs):
  Set 1: {d.id, d.store_name} ∩ queryNullRejectSlots → {d.id} ≠ ∅ ✓
  Set 2: {r.store_id, r.region_name} ∩ queryNullRejectSlots → {r.store_id, r.region_name} ≠ ∅ ✓
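
The check in the walkthrough above can be sketched as a set-intersection test: every slot set required by the view must share at least one slot with the query's null-reject slots. `NullableCoverageSketch`, `allCovered`, and `setOf` are hypothetical helpers written for this illustration, not the actual rule code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; not the AbstractMaterializedViewRule implementation.
class NullableCoverageSketch {
    // True when every required set intersects the query's null-reject slots.
    static boolean allCovered(List<Set<String>> requireNoNullableViewSlot,
                              Set<String> queryNullRejectSlots) {
        for (Set<String> viewSlots : requireNoNullableViewSlot) {
            if (Collections.disjoint(viewSlots, queryNullRejectSlots)) {
                return false; // no NOT NULL evidence for this nullable side
            }
        }
        return true;
    }

    // Small helper to build a mutable set from literals.
    static Set<String> setOf(String... slots) {
        return new HashSet<>(Arrays.asList(slots));
    }

    public static void main(String[] args) {
        List<Set<String>> required = Arrays.asList(
                setOf("d.id", "d.store_name"),
                setOf("r.store_id", "r.region_name"));

        // With slots inferred from INNER join edges plus the filter.
        Set<String> withJoinEdges = setOf("o.store_id", "d.id", "r.store_id", "r.region_name");
        System.out.println(allCovered(required, withJoinEdges)); // prints true

        // With filter-level evidence only, Set 1 has an empty intersection.
        Set<String> filterOnly = setOf("r.region_name");
        System.out.println(allCovered(required, filterOnly)); // prints false
    }
}
```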

Release note

Fix multi-hop LEFT JOIN materialized view transparent rewrite failure when the WHERE clause only references the outermost dimension table.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
  • Behavior changed:

    • No.
  • Does this need documentation?

    • No.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde
Member Author

run buildall

@seawinde changed the title from "[fix](fe) Infer null-reject from INNER JoinEdge for multi-hop outer join MV rewrite" to "[fix](mtmv) Infer null-reject from INNER JoinEdge for multi-hop outer join MV rewrite" on Apr 14, 2026
…oin MV rewrite

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: In multi-hop LEFT JOIN MV rewrite (e.g.,
fact LEFT JOIN dim1 LEFT JOIN dim2), when the query has a WHERE clause
that null-rejects the outermost table (dim2), EliminateOuterJoin
converts all LEFT JOINs to INNER. However, containsNullRejectSlot only
checked filter predicates for NOT NULL proof, which only covers the
outermost table slots. The intermediate table (dim1) slots had no
NOT NULL evidence, causing "Predicate compensate fail".

The fix reads INNER JoinEdge conditions from the query HyperGraph.
After EliminateOuterJoin converts LEFT→INNER, JoinEdge objects retain
their INNER type and join condition expressions even though
EliminateNotNull removes filter-level NOT NULL predicates.
ExpressionUtils.inferNotNullSlots extracts NOT NULL slots from these
INNER join conditions, covering all intermediate join tables.

### Release note

Fix multi-hop LEFT JOIN materialized view transparent rewrite failure
when WHERE clause only references the outermost dimension table.

### Check List (For Author)

- Test: Unit Test (NullRejectInferenceTest) / Regression test (outer_join_two_hop_null_reject)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@seawinde force-pushed the fix-mtmv-2hop-null-reject branch from f2f6c8a to 488c34d on April 14, 2026 14:45
@seawinde
Member Author

run buildall

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (20/20) 🎉
Increment coverage report
Complete coverage report
