perf: refactor sum int with specialized implementations for each eval_mode #3054

andygrove · 2026-01-08T17:36:36Z

Which issue does this PR close?

Closes #.

Rationale for this change

The current implementation of accumulators for sum_int handle all three eval modes, making the code complex to understand and also adds overhead due to matching on eval mode within loops.

What changes are included in this PR?

Refactored sum_int.rs from a single accumulator with eval mode branches inside loops into 6 separate accumulator implementations:

Single-row accumulators:

SumIntegerAccumulatorLegacy - uses wrapping arithmetic (no overflow checking)
SumIntegerAccumulatorAnsi - uses checked arithmetic, returns error on overflow
SumIntegerAccumulatorTry - uses checked arithmetic, returns None on overflow, tracks has_all_nulls

Groups accumulators:

SumIntGroupsAccumulatorLegacy
SumIntGroupsAccumulatorAnsi
SumIntGroupsAccumulatorTry

Key changes:

SumInteger::accumulator() and create_groups_accumulator() now use a match on eval_mode to instantiate the appropriate accumulator type
Each accumulator has a dedicated inner update_sum function without any eval mode branching
The Legacy/Ansi accumulators don't carry the has_all_nulls field since they don't need it
Added helper methods like overflowed() and group_overflowed() for the Try variants to improve readability

How are these changes tested?

Existing tests.

Benchmark results show some improvement for group accumulators but not for row accumulators.

sum_integer_accumulator/groups_legacy
                        time:   [53.969 µs 54.108 µs 54.283 µs]
                        change: [−14.477% −12.994% −11.194%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
sum_integer_accumulator/groups_ansi
                        time:   [59.963 µs 60.076 µs 60.189 µs]
                        change: [−30.722% −30.522% −30.325%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
Benchmarking sum_integer_accumulator/groups_try: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 500.0ms. You may wish to increase target time to 520.5ms, enable flat sampling, or reduce sample count to 60.
sum_integer_accumulator/groups_try
                        time:   [100.99 µs 101.23 µs 101.45 µs]
                        change: [−7.5070% −6.2309% −5.2479%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

codecov-commenter · 2026-01-08T18:18:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.99%. Comparing base (f09f8af) to head (e8a3a50).
⚠️ Report is 918 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #3054      +/-   ##
============================================
+ Coverage     56.12%   59.99%   +3.86%     
- Complexity      976     1475     +499     
============================================
  Files           119      175      +56     
  Lines         11743    16165    +4422     
  Branches       2251     2681     +430     
============================================
+ Hits           6591     9698    +3107     
- Misses         4012     5114    +1102     
- Partials       1140     1353     +213

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

andygrove · 2026-01-15T14:55:56Z

@coderfender could you review?

coderfender

Reviewed the individual accumulators and code LGTM . I will continue to review grups accumulator and provide an update soon

coderfender

LGTM

andygrove · 2026-01-26T20:17:19Z

@comphead could you review when you have time

parthchandra · 2026-02-03T01:09:29Z

This approach creates a lot of code duplication, the difference being mostly in overflow handling. How about we have a trait say, OverflowStrategy, and then implementations of the overflow strategy for each eval mode. Then the implementation can just use the appropriate implementation while most of the code remains the same.

andygrove · 2026-02-03T15:20:48Z

This approach creates a lot of code duplication, the difference being mostly in overflow handling. How about we have a trait say, OverflowStrategy, and then implementations of the overflow strategy for each eval mode. Then the implementation can just use the appropriate implementation while most of the code remains the same.

If I understand correctly, you're suggesting a common implementation that invokes a dyn trait per row for overflow handling?

coderfender · 2026-02-04T01:30:03Z

minor : I wonder if we could reuse Add in checked_arithmetic.rs like Spark to remove redundancy

parthchandra · 2026-02-04T20:45:57Z

This approach creates a lot of code duplication, the difference being mostly in overflow handling. How about we have a trait say, OverflowStrategy, and then implementations of the overflow strategy for each eval mode. Then the implementation can just use the appropriate implementation while most of the code remains the same.

If I understand correctly, you're suggesting a common implementation that invokes a dyn trait per row for overflow handling?

Good point. I also considered a macro based approach but that just makes the code very unreadable. I guess this approach should be okay.

andygrove added 3 commits January 8, 2026 10:35

refactor sum int with specialized implementations for each eval_mode

4350813

remove comments separating implementations

a963938

improve benchmarks

26b852d

andygrove marked this pull request as ready for review January 8, 2026 20:11

andygrove changed the title ~~perf: refactor sum int with specialized implementations for each eval_mode [WIP]~~ perf: refactor sum int with specialized implementations for each eval_mode Jan 8, 2026

coderfender reviewed Jan 16, 2026

View reviewed changes

coderfender approved these changes Jan 25, 2026

View reviewed changes

Merge remote-tracking branch 'apache/main' into sum-int-refactor

e8a3a50

andygrove requested review from comphead, mbutrovich and parthchandra February 2, 2026 23:26

parthchandra approved these changes Feb 3, 2026

View reviewed changes

andygrove merged commit fd47751 into apache:main Feb 4, 2026
124 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: refactor sum int with specialized implementations for each eval_mode #3054

perf: refactor sum int with specialized implementations for each eval_mode #3054

Uh oh!

andygrove commented Jan 8, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Jan 8, 2026 •

edited

Loading

Uh oh!

andygrove commented Jan 15, 2026

Uh oh!

coderfender left a comment

Uh oh!

coderfender left a comment

Uh oh!

andygrove commented Jan 26, 2026

Uh oh!

parthchandra commented Feb 3, 2026

Uh oh!

andygrove commented Feb 3, 2026

Uh oh!

coderfender commented Feb 4, 2026

Uh oh!

Uh oh!

parthchandra commented Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

perf: refactor sum int with specialized implementations for each eval_mode #3054

perf: refactor sum int with specialized implementations for each eval_mode #3054

Uh oh!

Conversation

andygrove commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

codecov-commenter commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

andygrove commented Jan 15, 2026

Uh oh!

coderfender left a comment

Choose a reason for hiding this comment

Uh oh!

coderfender left a comment

Choose a reason for hiding this comment

Uh oh!

andygrove commented Jan 26, 2026

Uh oh!

parthchandra commented Feb 3, 2026

Uh oh!

andygrove commented Feb 3, 2026

Uh oh!

coderfender commented Feb 4, 2026

Uh oh!

Uh oh!

parthchandra commented Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

andygrove commented Jan 8, 2026 •

edited

Loading

codecov-commenter commented Jan 8, 2026 •

edited

Loading