perf: optimize array comparison with pre-Eval reference equality by He-Pin · Pull Request #732 · databricks/sjsonnet

He-Pin · 2026-04-10T17:37:05Z

Motivation

When arrays share element references (e.g., std.range(1,N) followed by long_array + [x] < long_array + [y]), both compare() and equal() call .value on every Eval element to force lazy thunks. On Scala Native (AOT compilation), each virtual dispatch costs ~4-5ns. For two 1M-element arrays sharing 999,999 references, that is about 2M calls, or roughly 8-10ms of unnecessary vtable overhead.
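
The estimate above is plain arithmetic, sketched below as a rough check. The object and method names are hypothetical and exist only for illustration; the inputs (999,999 shared references per array, ~4-5ns per call) are the figures quoted in the paragraph above.

```scala
object DispatchOverheadEstimate {
  /** Total cost in milliseconds of `calls` dispatches at `nsPerCall` nanoseconds each. */
  def overheadMs(calls: Long, nsPerCall: Double): Double =
    calls * nsPerCall / 1e6

  def main(args: Array[String]): Unit = {
    val sharedPerArray = 999999L        // shared references, from the example above
    val calls = 2 * sharedPerArray      // one .value call per side of the comparison
    val low = overheadMs(calls, 4.0)    // lower bound of the quoted ~4-5ns per call
    val high = overheadMs(calls, 5.0)   // upper bound of the quoted ~4-5ns per call
    println(s"estimated overhead: $low ms to $high ms")
  }
}
```

With these inputs the range comes out at roughly 8 to 10 ms.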

Key Design Decision

Three-layer optimization that preserves error semantics:

1. Val.Arr.value(i) instanceof fast-path: if (e.isInstanceOf[Val]) e.asInstanceOf[Val] else e.value. This avoids vtable dispatch for strict values globally.
2. Pre-Eval reference equality in compare(): check ex eq ey before forcing. For shared strict Val references, skip the element entirely. For shared lazy thunks, force once to preserve error observation.
3. Pre-Eval reference equality in equal(): mirror the compare() structure, with separate paths for shared strict (skip) and shared lazy (force once).
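
The three layers can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the code from this PR: Eval, Val, Num, Lazy, force, compareVal and the array-based signatures are hypothetical stand-ins, and only numbers are modelled.

```scala
object SharedElementCompare {
  // Hypothetical stand-ins for the evaluator's types; not sjsonnet's real classes.
  trait Eval { def value: Val }
  sealed trait Val extends Eval { def value: Val = this }
  final case class Num(n: Double) extends Val

  /** A lazy thunk that caches its result and counts how often it is computed. */
  final class Lazy(compute: () => Val) extends Eval {
    var forceCount = 0
    private var cached: Val = null
    def value: Val = {
      if (cached eq null) { forceCount += 1; cached = compute() }
      cached
    }
  }

  // Layer 1: a type test instead of a virtual call when the element is already strict.
  def force(e: Eval): Val =
    if (e.isInstanceOf[Val]) e.asInstanceOf[Val] else e.value

  def compareVal(a: Val, b: Val): Int = (a, b) match {
    case (Num(x), Num(y)) => java.lang.Double.compare(x, y)
  }

  // Layer 2: check reference equality before forcing either side.
  def compare(xs: Array[Eval], ys: Array[Eval]): Int = {
    val n = math.min(xs.length, ys.length)
    var i = 0
    while (i < n) {
      val ex = xs(i); val ey = ys(i)
      if (ex eq ey) {
        // Shared strict Val: identical, nothing to force.
        // Shared lazy thunk: force once so an error inside it still surfaces.
        if (!ex.isInstanceOf[Val]) { ex.value; () }
      } else {
        val c = compareVal(force(ex), force(ey))
        if (c != 0) return c
      }
      i += 1
    }
    Integer.compare(xs.length, ys.length)
  }

  // Layer 3: the same structure for equality.
  def equal(xs: Array[Eval], ys: Array[Eval]): Boolean = {
    if (xs.length != ys.length) return false
    var i = 0
    while (i < xs.length) {
      val ex = xs(i); val ey = ys(i)
      if (ex eq ey) {
        if (!ex.isInstanceOf[Val]) { ex.value; () }
      } else if (force(ex) != force(ey)) return false
      i += 1
    }
    true
  }
}
```

In this sketch, comparing base :+ Num(3) with base :+ Num(4) only forces the final pair, and a shared thunk that throws still throws from compare() or equal(), which is the error-observation property described in the list above.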

Modification

- Val.scala, Arr.value(i): added isInstanceOf[Val] fast-path
- Evaluator.scala, compare() array branch: pre-Eval reference equality with lazy error preservation
- Evaluator.scala, equal() array branch: mirrored compare() optimization structure
- New test, array_compare_shared_elements.jsonnet: regression test covering shared strict/lazy element comparison semantics

Benchmark Results

JMH (JVM, Scala 3.3.7)

| Benchmark | Before (ms/op) | After (ms/op) | Change |
|---|---|---|---|
| comparison | 21.344 | 20.848 | -2.3% |
| comparison2 | ~35.9 | ~35.8 | neutral |
| realistic2 | 60.993 | 61.0 | neutral |
| reverse | 8.464 | 8.5 | neutral |

Full JMH regression suite (35 benchmarks): no regressions detected.

Hyperfine (Scala Native vs jrsonnet)

| Benchmark | Before (ms) | After (ms) | Change | jrsonnet (ms) | Gap |
|---|---|---|---|---|---|
| comparison | 30.0 | 28.6 | -4.7% | 12.8 | 2.23x |
| comparison2 | 84.9 | 84.9 | neutral | 222.4 | WIN 2.62x |
| realistic2 | 255.2 | 255.0 | neutral | ~101 | 2.52x |
| reverse | 36.6 | 36.6 | neutral | ~22.6 | 1.62x |
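
The Change and Gap columns appear to follow from the timings by simple ratios. The sketch below reproduces them under that assumption; the object, case class and method names are hypothetical, and the reading of "WIN" as "sjsonnet is the faster side" is inferred from the comparison2 row rather than stated in the PR.

```scala
object BenchmarkGap {
  /** Ratio of the slower time to the faster one, and whether sjsonnet is the faster side. */
  final case class Gap(ratio: Double, sjsonnetWins: Boolean)

  def gap(sjsonnetMs: Double, jrsonnetMs: Double): Gap =
    if (sjsonnetMs <= jrsonnetMs) Gap(jrsonnetMs / sjsonnetMs, sjsonnetWins = true)
    else Gap(sjsonnetMs / jrsonnetMs, sjsonnetWins = false)

  /** Relative change in percent; negative means the run got faster. */
  def percentChange(beforeMs: Double, afterMs: Double): Double =
    (afterMs - beforeMs) / beforeMs * 100.0

  def main(args: Array[String]): Unit = {
    println(gap(28.6, 12.8))          // comparison row, after the change
    println(gap(84.9, 222.4))         // comparison2 row, sjsonnet is faster
    println(percentChange(30.0, 28.6)) // comparison row, Change column
  }
}
```

For the comparison row this gives a ratio of about 2.23 after the change and about 2.34 before it (30.0 / 12.8).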

Analysis

- JVM improvement is modest (-2.3%) because the JIT already devirtualizes Val.value in hot loops.
- Native improvement is larger (-4.7%) because AOT compilation cannot eliminate virtual dispatch, so the isInstanceOf check provides real savings.
- Theoretical limit: for 1M shared elements at 4GHz, saving two vtable dispatches per element ≈ 1.2ms, consistent with the measured 1.4ms improvement.
- No regressions across the full benchmark suite: the isInstanceOf overhead for non-shared elements is negligible (branch prediction, and zero-cost on JVM).
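
The theoretical-limit figure can be written as elements × dispatches per element × cycles per dispatch / clock frequency. The PR text does not state the cycles-per-dispatch value it assumed, so the sketch below back-solves it from the quoted 1.2ms. All names are hypothetical, and the resulting cycle count is a derived assumption, not a measured number.

```scala
object TheoreticalLimit {
  /** Time saved in milliseconds for a given per-dispatch cost in cycles. */
  def savedMs(elements: Long, dispatchesPerElement: Int,
              cyclesPerDispatch: Double, clockHz: Double): Double =
    elements * dispatchesPerElement * cyclesPerDispatch / clockHz * 1e3

  /** Inverse of savedMs: the per-dispatch cycle cost implied by a given saving. */
  def impliedCycles(savedMillis: Double, elements: Long,
                    dispatchesPerElement: Int, clockHz: Double): Double =
    savedMillis / 1e3 * clockHz / (elements * dispatchesPerElement)

  def main(args: Array[String]): Unit = {
    // Inputs from the bullet above: 1M elements, two dispatches each, 4GHz, ~1.2ms.
    val cycles = impliedCycles(1.2, 1000000L, 2, 4e9)
    println(s"implied cycles per dispatch: $cycles")
    println(s"round trip: ${savedMs(1000000L, 2, cycles, 4e9)} ms")
  }
}
```

Under these inputs the implied cost is about 2.4 cycles per dispatch.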

References

Benchmark suite: jrsonnet/docs/benchmarks.adoc
Comparison benchmark: bench/resources/go_suite/comparison.jsonnet

Result

Narrows the comparison benchmark gap vs jrsonnet from 2.34x to 2.23x. No semantic changes: all 141 test suites pass, including the new regression test for shared-element comparison correctness.

When arrays share element references (e.g., from concatenation of a common base), we can skip forcing entirely for strict Val elements and avoid redundant double-forcing for shared lazy thunks.

Three-layer optimization:
- Val.Arr.value(i): instanceof fast-path to skip virtual dispatch for strict values, reducing vtable overhead on Scala Native (~4-5ns/call)
- compare(): pre-Eval reference equality check before forcing elements, with separate paths for shared strict (skip) and shared lazy (force once)
- equal(): mirror the compare() structure for consistent optimization

Includes regression test for shared-element comparison semantics.

JMH: comparison -2.3% (21.34 -> 20.85 ms/op)
Native: comparison -4.7% (30.0ms -> 28.6ms)

He-Pin · 2026-04-11T08:31:39Z

Closing: benchmark testing confirmed this optimization provides no meaningful improvement. Both JMH (20.783ms vs 20.139ms baseline, slightly worse) and native hyperfine (27.1ms → 26.0ms, ~4% within noise) showed no significant gain for comparison_for_array. The benchmark is allocation-dominated (std.range creates 1M Val.Num objects), and pre-Eval reference equality checks don't address that bottleneck.

He-Pin marked this pull request as ready for review April 10, 2026 19:52

He-Pin closed this Apr 10, 2026

He-Pin reopened this Apr 10, 2026

He-Pin marked this pull request as draft April 10, 2026 20:16

He-Pin force-pushed the perf/array-compare-opt branch 3 times, most recently from 56ebc83 to 45a6c81 Compare April 10, 2026 23:17

He-Pin force-pushed the perf/array-compare-opt branch from 45a6c81 to 4a49ee5 Compare April 10, 2026 23:26

He-Pin closed this Apr 11, 2026

He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/array-compare-opt
