@andygrove

Summary

  • Adds native Comet support for Spark's array_position function
  • Returns the 1-based position of an element in an array, or 0 if not found

Implementation Details

This required a custom Rust implementation because DataFusion's array_position returns UInt64 and yields null when the element is not found, whereas Spark returns Int64 (LongType) and yields 0 in that case.
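
The two return conventions can be contrasted on plain Rust values. This is a minimal sketch and not the code from this PR: `datafusion_style` and `spark_style` are hypothetical helpers that model only the not-found case on a non-null slice, and they do not call any Arrow or DataFusion API.

```rust
/// Hypothetical model of the DataFusion-style convention described above:
/// 1-based position as u64, `None` (null) when the element is not found.
fn datafusion_style(haystack: &[i32], needle: i32) -> Option<u64> {
    haystack
        .iter()
        .position(|&v| v == needle)
        .map(|i| i as u64 + 1)
}

/// Hypothetical model of the Spark-style convention described above:
/// 1-based position as i64 (LongType), 0 when the element is not found.
fn spark_style(haystack: &[i32], needle: i32) -> i64 {
    haystack
        .iter()
        .position(|&v| v == needle)
        .map_or(0, |i| i as i64 + 1)
}

fn main() {
    let data = [10, 20, 30];
    // Element present: both conventions report position 2.
    println!("{:?} vs {}", datafusion_style(&data, 20), spark_style(&data, 20));
    // Element absent: null-like `None` versus 0.
    println!("{:?} vs {}", datafusion_style(&data, 99), spark_style(&data, 99));
}
```

Both the result type and the not-found value differ, so a plain cast of the unsigned result would not by itself reproduce the Spark behavior.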

Key implementation details:

  • Returns Int64 to match Spark's LongType
  • Returns 0 when element is not found (Spark behavior)
  • Returns null when array is null or search element is null
  • Supports both List and LargeList array types
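
The null-handling rules listed above can be sketched as a small generic function. This is an illustration under stated assumptions, not the Arrow-based implementation in this PR: `None` stands in for SQL NULL, the hypothetical `array_position` helper works on plain slices rather than List or LargeList arrays, and null elements inside the array are assumed to be skipped during the search.

```rust
/// Sketch of the semantics listed above, on plain Rust values.
/// `None` models SQL NULL; this is not the code from the PR.
fn array_position<T: PartialEq>(
    array: Option<&[Option<T>]>,
    element: Option<&T>,
) -> Option<i64> {
    // Null array or null search element yields null.
    let array = array?;
    let element = element?;
    // Assumption: null elements inside the array never match.
    let pos = array.iter().position(|v| v.as_ref() == Some(element));
    // 1-based position, or 0 when the element is not found.
    Some(pos.map_or(0, |i| i as i64 + 1))
}

fn main() {
    let ints = [Some(1), None, Some(3)];
    let strs = [Some("a"), Some("b")];

    println!("{:?}", array_position(Some(&ints[..]), Some(&3)));
    println!("{:?}", array_position(Some(&strs[..]), Some(&"b")));
    println!("{:?}", array_position(Some(&ints[..]), Some(&7)));
    println!("{:?}", array_position(Some(&ints[..]), None));
    println!("{:?}", array_position::<i32>(None, Some(&1)));
}
```

The return type `Option<i64>` keeps the two outcomes separate: `Some(0)` means the element was searched for and not found, while `None` means one of the inputs was null.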

Test Plan

  • Added unit tests for array_position in CometArrayExpressionSuite
  • Tests cover:
    • Finding elements in integer arrays
    • Finding elements in string arrays
    • Element not found (returns 0)
    • Arrays with null elements
    • Column-based queries (not just literals)
  • All existing tests pass

Note: This PR was generated with AI assistance.

Closes #3153

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@andygrove marked this pull request as draft on January 15, 2026, 02:28
@codecov-commenter

codecov-commenter commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.88%. Comparing base (f09f8af) to head (8cb27ec).
⚠️ Report is 849 commits behind head on main.

Files with missing lines | Patch % | Lines
...src/main/scala/org/apache/comet/serde/arrays.scala | 75.00% | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3172      +/-   ##
============================================
+ Coverage     56.12%   59.88%   +3.75%     
- Complexity      976     1414     +438     
============================================
  Files           119      168      +49     
  Lines         11743    15598    +3855     
  Branches       2251     2591     +340     
============================================
+ Hits           6591     9341    +2750     
- Misses         4012     4948     +936     
- Partials       1140     1309     +169     

@andygrove marked this pull request as ready for review on January 15, 2026, 23:36

Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: length_of_json_array