fix: Fix the core correctness issues with windowed aggregate queries #3191
Closed
andygrove wants to merge 18 commits into apache:main from
Conversation
andygrove
commented
Jan 15, 2026
Comment on lines +1607 to +1618
```rust
// Ensure input is properly sorted when ORDER BY is present
// BoundedWindowAggExec requires InputOrderMode::Sorted
let needs_explicit_sort = !sort_exprs.is_empty();
let sorted_child: Arc<dyn ExecutionPlan> = if needs_explicit_sort {
    // Insert explicit sort to ensure data ordering
    Arc::new(SortExec::new(
        LexOrdering::new(sort_exprs.to_vec()).unwrap(),
        Arc::clone(&child.native_plan),
    ))
} else {
    Arc::clone(&child.native_plan)
};
```
Member
Author
This seems to be the main fix
…omet into fix-windowed-aggs
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3191 +/- ##
============================================
+ Coverage 56.12% 60.67% +4.54%
- Complexity 976 1437 +461
============================================
Files 119 170 +51
Lines 11743 15733 +3990
Branches 2251 2596 +345
============================================
+ Hits 6591 9546 +2955
- Misses 4012 4842 +830
- Partials 1140 1345 +205
☔ View full report in Codecov by Sentry.
Member
Author
I need to update the golden files
- Restore original ignored tests for ROWS BETWEEN with PARTITION BY + ORDER BY
and convert them to use checkSparkAnswerAndFallbackReason to verify the
correct fallback message: "Partition expressions must be a subset of order
expressions for native window"
- Fix documentation to accurately reflect what's supported:
- Remove AVG from supported list (has native implementation issues)
- Clarify PARTITION BY/ORDER BY restriction (partition must be subset of order)
- Clarify ROWS BETWEEN limitations
- Fix misleading test names ("COUNT with ROWS frame" -> "COUNT with PARTITION BY only")
- Convert several ignored tests to active tests that verify fallback behavior:
- ROWS BETWEEN tests (COUNT, SUM, AVG, MAX)
- ORDER BY DESC test
- Multiple PARTITION BY/ORDER BY columns tests
- RANGE BETWEEN tests
Co-Authored-By: Claude Opus 4.5 <[email protected]>
52ab85c to 11b615d
…c results
Window function tests with ROWS BETWEEN frames were failing due to non-deterministic sort order when using ORDER BY with non-unique values. When CometSort encounters rows with identical sort keys, the relative order of those rows is undefined, causing window function results to vary between Spark and Comet executions.
Updated all window fallback tests to include column 'c' (which contains unique values) as a tiebreaker in ORDER BY clauses, ensuring deterministic ordering and consistent test results.
Tests updated:
- ROWS BETWEEN tests (COUNT, SUM, AVG, MAX)
- ORDER BY DESC test
- Multiple PARTITION BY/ORDER BY columns tests
- RANGE BETWEEN tests
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
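To illustrate the tiebreaker change this commit describes, here is a minimal sketch of the before/after query shapes. This is not the exact test query from the PR; the table name tbl and columns a, b, c are hypothetical, with c assumed to hold unique values.
```scala
// Before: ORDER BY b alone is non-deterministic when b contains duplicate values,
// so the ROWS BETWEEN frame can produce different (but individually valid) results
// in Spark and Comet.
val nonDeterministic =
  """SELECT a, b,
    |       SUM(b) OVER (PARTITION BY a ORDER BY b
    |                    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS s
    |FROM tbl""".stripMargin

// After: adding the unique column c as a tiebreaker makes the sort order, and
// therefore the window result, deterministic.
val deterministic =
  """SELECT a, b,
    |       SUM(b) OVER (PARTITION BY a ORDER BY b, c
    |                    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS s
    |FROM tbl""".stripMargin
```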
RANGE window frames with value boundaries cannot have multiple ORDER BY columns in Spark SQL. Reverted the two RANGE BETWEEN tests back to a single ORDER BY column. These tests use b=i (unique values) instead of b=i%5, so ordering is already deterministic.
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Member
Author
@comphead this is ready for another look. I am not 100% happy with this, but it does fix some tests and the feature is still disabled by default, so it seems like some progress. Some of the tests were updated, so we need to review and make sure that these are valid changes and not ignoring issues.
# Conflicts:
#   docs/source/user-guide/latest/configs.md
Member
Author
I am going to break this down into some smaller PRs
Summary
This PR enables a subset of window aggregate functions (COUNT, SUM, MIN, MAX) for native execution in Comet. Window functions were previously disabled with "Native WindowExec has known correctness issues" - this PR addresses the core issues for simple cases.
Tracking Issue: #2721
What's Now Supported
- COUNT, SUM, MIN, MAX
- PARTITION BY only (no ORDER BY)
- ORDER BY only (no PARTITION BY)
- PARTITION BY with ORDER BY when partition columns are a subset of order columns (e.g., PARTITION BY a ORDER BY a, b works)
Example Supported Queries
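For illustration, queries of the following shapes fall under the rules above and should now run natively. These are sketches rather than the exact examples from the PR description; the table tbl and columns a, b are hypothetical, and spark is assumed to be an active SparkSession.
```scala
Seq(
  // PARTITION BY only
  "SELECT a, b, COUNT(b) OVER (PARTITION BY a) FROM tbl",
  // ORDER BY only
  "SELECT a, b, SUM(b) OVER (ORDER BY a) FROM tbl",
  // PARTITION BY columns are a subset of ORDER BY columns
  "SELECT a, b, MAX(b) OVER (PARTITION BY a ORDER BY a, b) FROM tbl"
).foreach(q => spark.sql(q).show())
```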
What's NOT Yet Supported (Falls Back to Spark)
- AVG window aggregate (native implementation has known issues)
- PARTITION BY a ORDER BY b where partition columns differ from order columns
- ROWS BETWEEN frames with PARTITION BY and ORDER BY on different columns
- ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, NTILE, CUME_DIST
- LAG, LEAD
- FIRST_VALUE, LAST_VALUE, NTH_VALUE
- RANGE BETWEEN with numeric/temporal expressions (Invalid argument error: Invalid arithmetic operation: Int32 - Int64 #1246)
Changes
1. Change Support Level from Blanket Incompatible
File: spark/src/main/scala/org/apache/spark/sql/comet/CometWindowExec.scala
Changed getSupportLevel from a blanket Incompatible("Native WindowExec has known correctness issues") to:
- Compatible() when partition expressions are a subset of order expressions (or when one is empty)
- Unsupported() when DataFusion's constraint is violated
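A simplified sketch of the decision described above, not the code from the PR: the method name, signature, and the exact SupportLevel types in CometWindowExec.scala may differ.
```scala
// Assumed shapes: partitionSpec holds the window's partition expressions and
// orderSpec its SortOrder expressions; Compatible/Unsupported are Comet's
// support-level values named in the PR description. Signatures are illustrative.
def windowSupportLevel(
    partitionSpec: Seq[Expression],
    orderSpec: Seq[SortOrder]): SupportLevel = {
  val orderChildren = orderSpec.map(_.child)
  val partitionIsSubsetOfOrder =
    partitionSpec.isEmpty || orderSpec.isEmpty ||
      partitionSpec.forall(p => orderChildren.exists(_.semanticEquals(p)))
  if (partitionIsSubsetOfOrder) {
    Compatible()
  } else {
    // DataFusion's BoundedWindowAggExec constraint would be violated
    Unsupported()
  }
}
```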
2. Remove Overly Restrictive Validation
File: spark/src/main/scala/org/apache/spark/sql/comet/CometWindowExec.scala
Removed validatePartitionAndSortSpecsForWindowFunc, which required partition columns to exactly match order columns. This was more restrictive than necessary.
3. Add Explicit Sort for ORDER BY
File: native/core/src/execution/planner.rs
Added an explicit SortExec before BoundedWindowAggExec when ORDER BY is present to ensure correct input ordering.
4. Update Tests
File: spark/src/test/scala/org/apache/comet/exec/CometWindowExecSuite.scala
Converted several previously ignored tests into active tests that verify fallback behavior, and kept some marked ignore to document what still doesn't work.
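A hedged sketch of the fallback-verification pattern described in the commits above. The exact test bodies differ; checkSparkAnswerAndFallbackReason is the helper named in the commit message, withParquetTable is a common Comet test helper, and their exact signatures (and the generated data) are assumptions here.
```scala
// Illustrative only: PARTITION BY _1 with ORDER BY _2, _3 means the partition
// column is not a subset of the order columns, so Comet should fall back to
// Spark with the message quoted in the commit message above.
test("window falls back when partition columns are not a subset of order columns") {
  withParquetTable((0 until 100).map(i => (i % 5, i % 7, i)), "tbl") {
    checkSparkAnswerAndFallbackReason(
      "SELECT _1, _2, SUM(_2) OVER (PARTITION BY _1 ORDER BY _2, _3) FROM tbl",
      "Partition expressions must be a subset of order expressions for native window")
  }
}
```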
5. Update Documentation
File: docs/source/user-guide/latest/compatibility.md
Updated the Window Functions section to accurately document supported and unsupported functionality.
Related Issues