
perf: optimize array comparison with pre-Eval reference equality #732

Closed
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/array-compare-opt


He-Pin (Contributor) commented Apr 10, 2026

Motivation

When two arrays share element references (e.g., long_array = std.range(1, N), then long_array + [x] < long_array + [y]), both compare() and equal() still call .value on every Eval element to force lazy thunks. On Scala Native (AOT compilation), each such virtual dispatch costs ~4-5ns. For two 1M-element arrays sharing 999,999 references, this adds up to ~10ms of unnecessary vtable overhead.
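The ~10ms figure is the per-element forcing cost multiplied out, assuming one virtual .value call per element in each of the two arrays and the ~4-5ns per-dispatch cost quoted above:

$$
2 \times 10^{6}\ \text{calls} \times (4\text{--}5)\ \text{ns/call} \approx 8\text{--}10\ \text{ms}
$$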

Key Design Decision

Three-layer optimization that preserves error semantics:

  1. Val.Arr.value(i) instanceof fast-path: if (e.isInstanceOf[Val]) e.asInstanceOf[Val] else e.value — avoids vtable dispatch for strict values globally
  2. Pre-Eval reference equality in compare(): Check ex eq ey before forcing. For shared strict Val references, skip entirely. For shared lazy thunks, force once to preserve error observation.
  3. Pre-Eval reference equality in equal(): Mirror the compare() structure — separate paths for shared strict (skip) and shared lazy (force once).
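A minimal Scala sketch of layers 1 and 2, for illustration only. The Eval, Val, Num, Lazy and Arr classes and the ArrayCompare object are hypothetical simplifications, not the project's actual Val.scala/Evaluator.scala code; only the isInstanceOf fast path and the ex eq ey check before forcing mirror the description above.

```scala
// Hypothetical, simplified model of strict and lazy values. This is NOT the
// project's real Val/Eval hierarchy; it only illustrates the technique.
trait Eval { def value: Val }

abstract class Val extends Eval {
  // A strict value is already forced: it is its own value.
  def value: Val = this
}

final case class Num(n: Double) extends Val

// A lazy thunk that computes its value on first access and caches it.
final class Lazy(compute: () => Val) extends Eval {
  private var cached: Val = null
  def value: Val = {
    if (cached == null) cached = compute()
    cached
  }
}

// Hypothetical array wrapper holding possibly-unforced elements.
final class Arr(val elems: Array[Eval]) {
  def length: Int = elems.length

  // Layer 1: a type test instead of a virtual call for strict elements.
  def value(i: Int): Val = {
    val e = elems(i)
    if (e.isInstanceOf[Val]) e.asInstanceOf[Val] else e.value
  }

  // Concatenation copies element references, so arrays built from a
  // common prefix share those Eval objects.
  def ++(other: Arr): Arr = new Arr(elems ++ other.elems)
}

object ArrayCompare {
  // Layer 2: check reference equality on the unforced Eval before forcing.
  def compare(xs: Arr, ys: Arr): Int = {
    val len = math.min(xs.length, ys.length)
    var i = 0
    while (i < len) {
      val ex = xs.elems(i)
      val ey = ys.elems(i)
      if (ex eq ey) {
        // Same object on both sides, so the elements compare equal.
        // A shared strict Val is skipped entirely; a shared lazy thunk is
        // forced once so that an error it raises is still observed.
        if (!ex.isInstanceOf[Val]) ex.value
      } else {
        val c = compareVal(xs.value(i), ys.value(i))
        if (c != 0) return c
      }
      i += 1
    }
    Integer.compare(xs.length, ys.length)
  }

  // The sketch only orders numbers; other value types are out of scope.
  private def compareVal(a: Val, b: Val): Int = (a, b) match {
    case (Num(x), Num(y)) => java.lang.Double.compare(x, y)
    case _                => throw new IllegalArgumentException("sketch compares numbers only")
  }
}
```

In this sketch a shared prefix of strict Num elements costs one reference comparison per element, while a shared lazy element costs exactly one .value call instead of two.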

Modification

  • Val.scala: Arr.value(i) — added isInstanceOf[Val] fast-path
  • Evaluator.scala: compare() array branch — pre-Eval reference equality with lazy error preservation
  • Evaluator.scala: equal() array branch — mirrored compare() optimization structure
  • New test: array_compare_shared_elements.jsonnet — regression test covering shared strict/lazy element comparison semantics
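The equal() side can be sketched the same way. As before, the classes and the ArrayEqual object are hypothetical stand-ins, not the project's code. The sketch shows the branch structure shared with compare(), and the kind of semantics the new regression test is meant to pin down: a shared lazy element that throws must still throw, even though reference equality already proves the two elements equal.

```scala
// Hypothetical, simplified model; not the project's actual Val/Eval classes.
trait Eval { def value: Val }
abstract class Val extends Eval { def value: Val = this }
final case class Num(n: Double) extends Val

// A lazy thunk that computes its value on first access and caches it.
final class Lazy(compute: () => Val) extends Eval {
  private var cached: Val = null
  def value: Val = { if (cached == null) cached = compute(); cached }
}

object ArrayEqual {
  // Mirrors the compare() structure: reference equality on the unforced Eval
  // first, with separate paths for shared strict and shared lazy elements.
  def equal(xs: Array[Eval], ys: Array[Eval]): Boolean = {
    if (xs.length != ys.length) return false
    var i = 0
    while (i < xs.length) {
      val ex = xs(i)
      val ey = ys(i)
      if (ex eq ey) {
        // Shared strict Val: nothing to do. Shared lazy thunk: force once
        // so that an error inside it still propagates to the caller.
        if (!ex.isInstanceOf[Val]) ex.value
      } else if (ex.value != ey.value) {
        return false
      }
      i += 1
    }
    true
  }
}
```

Skipping the shared lazy element outright would be faster still, but it would silently turn an erroring comparison into a successful one, which is why the description keeps the "force once" path.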

Benchmark Results

JMH (JVM, Scala 3.3.7)

| Benchmark | Before (ms/op) | After (ms/op) | Change |
|---|---|---|---|
| comparison | 21.344 | 20.848 | -2.3% |
| comparison2 | ~35.9 | ~35.8 | neutral |
| realistic2 | 60.993 | 61.0 | neutral |
| reverse | 8.464 | 8.5 | neutral |

Full JMH regression suite (35 benchmarks): no regressions detected.

Hyperfine (Scala Native vs jrsonnet)

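| Benchmark | Before (ms) | After (ms) | Change | jrsonnet (ms) | Gap vs jrsonnet (after) |
|---|---|---|---|---|---|
| comparison | 30.0 | 28.6 | -4.7% | 12.8 | 2.23x slower |
| comparison2 | 84.9 | 84.9 | neutral | 222.4 | 2.62x faster (win) |
| realistic2 | 255.2 | 255.0 | neutral | ~101 | 2.52x slower |
| reverse | 36.6 | 36.6 | neutral | ~22.6 | 1.62x slower |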

Analysis

  • JVM improvement is modest (-2.3%) because the JIT already devirtualizes Val.value in hot loops
  • Native improvement is larger (-4.7%) because AOT compilation cannot eliminate virtual dispatch — the isInstanceOf check provides real savings
  • Theoretical limit: For 1M shared elements at 4GHz, saving two vtable dispatches per element ≈ 1.2ms, consistent with measured 1.4ms improvement
  • No regressions across the full benchmark suite — the isInstanceOf overhead for non-shared elements is negligible (branch prediction + zero-cost on JVM)

References

  • Benchmark suite: jrsonnet/docs/benchmarks.adoc
  • Comparison benchmark: bench/resources/go_suite/comparison.jsonnet

Result

Narrows the comparison benchmark gap vs jrsonnet from 2.34x to 2.23x. No semantic changes: all 141 test suites pass, including the new regression test for shared-element comparison correctness.
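The 2.34x and 2.23x figures are ratios of the Scala Native wall time to the jrsonnet wall time for the comparison benchmark in the hyperfine table:

$$
\text{gap} = \frac{t_{\text{Scala Native}}}{t_{\text{jrsonnet}}}:\qquad \frac{30.0}{12.8} \approx 2.34\ (\text{before}),\qquad \frac{28.6}{12.8} \approx 2.23\ (\text{after})
$$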

He-Pin marked this pull request as ready for review April 10, 2026 19:52
He-Pin closed this Apr 10, 2026
He-Pin reopened this Apr 10, 2026
He-Pin marked this pull request as draft April 10, 2026 20:16
He-Pin force-pushed the perf/array-compare-opt branch 3 times, most recently from 56ebc83 to 45a6c81, April 10, 2026 23:17
When arrays share element references (e.g., from concatenation of a
common base), we can skip forcing entirely for strict Val elements
and avoid redundant double-forcing for shared lazy thunks.

Three-layer optimization:
- Val.Arr.value(i): instanceof fast-path to skip virtual dispatch for
  strict values, reducing vtable overhead on Scala Native (~4-5ns/call)
- compare(): pre-Eval reference equality check before forcing elements,
  with separate paths for shared strict (skip) and shared lazy (force once)
- equal(): mirror the compare() structure for consistent optimization

Includes regression test for shared-element comparison semantics.

JMH: comparison -2.3% (21.34 -> 20.85 ms/op)
Native: comparison -4.7% (30.0ms -> 28.6ms)
He-Pin force-pushed the perf/array-compare-opt branch from 45a6c81 to 4a49ee5, April 10, 2026 23:26
He-Pin (Contributor, Author) commented Apr 11, 2026

Closing: benchmark testing confirmed that this optimization provides no meaningful improvement for comparison_for_array. JMH measured 20.783ms against a 20.139ms baseline (slightly worse), and native hyperfine measured 27.1ms → 26.0ms (~4%, within noise). The benchmark is allocation-dominated (std.range creates 1M Val.Num objects), and pre-Eval reference equality checks don't address that bottleneck.

He-Pin closed this Apr 11, 2026