perf: optimize array comparison with pre-Eval reference equality #732
Closed
He-Pin wants to merge 1 commit into databricks:master from
Force-pushed from 56ebc83 to 45a6c81.
When arrays share element references (e.g., from concatenation of a common base), we can skip forcing entirely for strict Val elements and avoid redundant double-forcing for shared lazy thunks.

Three-layer optimization:
- Val.Arr.value(i): instanceof fast path to skip virtual dispatch for strict values, reducing vtable overhead on Scala Native (~4-5ns/call)
- compare(): pre-Eval reference equality check before forcing elements, with separate paths for shared strict (skip) and shared lazy (force once)
- equal(): mirrors the compare() structure for consistent optimization

Includes a regression test for shared-element comparison semantics.

JMH: comparison -2.3% (21.34 -> 20.85 ms/op)
Native: comparison -4.7% (30.0ms -> 28.6ms)
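The three layers above can be illustrated with a minimal sketch. It assumes a toy value hierarchy: `Eval`, `Val`, `Lazy`, `Num`, `Str`, `Arr` and `Cmp` are hypothetical stand-ins that only echo sjsonnet's names, and `compareArr`/`equalArr` are illustrative, not the PR's code. It shows the `isInstanceOf` fast path, the `eq` check on the raw `Eval` before forcing, and the "force a shared thunk once" rule.

```scala
// Hypothetical, simplified model of the value hierarchy described above.
// Eval, Val, Lazy, Num, Str, Arr and Cmp are illustrative stand-ins, not sjsonnet's classes.
trait Eval { def value: Val }

// A strict (already evaluated) value: forcing it just returns itself.
sealed abstract class Val extends Eval { def value: Val = this }
final case class Num(n: Double) extends Val
final case class Str(s: String) extends Val

// A lazy thunk: computed on first access, then cached.
final class Lazy(compute: () => Val) extends Eval {
  private var cached: Val = null
  var forced = 0 // number of evaluation attempts, kept only for illustration
  def value: Val = {
    if (cached == null) { forced += 1; cached = compute() }
    cached
  }
}

final class Arr(val elems: Array[Eval]) {
  def length: Int = elems.length
  // Layer 1: strict elements are returned directly; only thunks go through `value`.
  def value(i: Int): Val = elems(i) match {
    case v: Val => v
    case e      => e.value
  }
}

object Cmp {
  // Hypothetical scalar ordering; every Val in this model compares equal to itself.
  def compareVal(x: Val, y: Val): Int = (x, y) match {
    case (Num(a), Num(b)) => java.lang.Double.compare(a, b)
    case (Str(a), Str(b)) => a.compareTo(b)
    case _                => throw new IllegalArgumentException("cannot compare a number with a string")
  }

  // Layer 2: lexicographic comparison with a reference check before forcing.
  def compareArr(a: Arr, b: Arr): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val ex = a.elems(i)
      val ey = b.elems(i)
      if (ex eq ey) {
        // Shared reference: the element equals itself, so it cannot decide the order.
        // A shared thunk is still forced once so that any error it raises surfaces.
        if (!ex.isInstanceOf[Val]) ex.value
      } else {
        val c = compareVal(a.value(i), b.value(i))
        if (c != 0) return c
      }
      i += 1
    }
    Integer.compare(a.length, b.length)
  }

  // Layer 3: equality mirrors the same shared-strict / shared-lazy split.
  def equalArr(a: Arr, b: Arr): Boolean = {
    if (a.length != b.length) return false
    var i = 0
    while (i < a.length) {
      val ex = a.elems(i)
      val ey = b.elems(i)
      if (ex eq ey) {
        if (!ex.isInstanceOf[Val]) ex.value
      } else if (a.value(i) != b.value(i)) {
        return false
      }
      i += 1
    }
    true
  }
}
```

For example, if one `Lazy` thunk sits at index 0 of two arrays that differ at index 1, `compareArr` forces that thunk exactly once and the order is decided by the unshared elements; if the shared thunk throws, the comparison still throws. Skipping shared strict values is only sound here because every `Val` in this toy model compares equal to itself; a full evaluator would need that guarantee for every value type it skips.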
Force-pushed from 45a6c81 to 4a49ee5.
He-Pin (Contributor, Author):

Closing: benchmark testing confirmed this optimization provides no meaningful improvement. Both JMH (20.783ms vs 20.139ms baseline, slightly worse) and native hyperfine (27.1ms → 26.0ms, ~4%, within noise) showed no significant gain for comparison_for_array. The benchmark is allocation-dominated (std.range creates 1M Val.Num objects), and pre-Eval reference equality checks don't address that bottleneck.
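The closing comment's diagnosis can be illustrated with a minimal sketch. `BoxedNum` and `RangeSharing` are hypothetical names, and a `Vector` stands in for a Jsonnet array. Building the range allocates one object per element before any comparison runs, and the two concatenated arrays then share every one of those references except the last. A reference-equality check can skip the shared positions, but it cannot remove the allocations that already happened.

```scala
// Hypothetical stand-in for a boxed Jsonnet number (not sjsonnet's Val.Num).
final class BoxedNum(val n: Double)

object RangeSharing {
  // Model of std.range(1, n): one freshly allocated box per element.
  def range(n: Int): Vector[BoxedNum] =
    Vector.tabulate(n)(i => new BoxedNum((i + 1).toDouble))

  // Number of positions where both arrays hold the very same object.
  def sharedCount(a: Vector[BoxedNum], b: Vector[BoxedNum]): Int =
    a.iterator.zip(b.iterator).count { case (x, y) => x eq y }

  def main(args: Array[String]): Unit = {
    val base  = range(1000000)            // the 1M allocations happen here
    val left  = base :+ new BoxedNum(1.0) // models long_array + [x]
    val right = base :+ new BoxedNum(2.0) // models long_array + [y]
    println(sharedCount(left, right))     // prints 1000000
  }
}
```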
Motivation
When arrays share element references (e.g., `std.range(1,N)` followed by `long_array + [x] < long_array + [y]`), both `compare()` and `equal()` call `.value` on every `Eval` element to force lazy thunks. On Scala Native (AOT compilation), each virtual dispatch costs ~4-5ns. For two 1M-element arrays sharing 999,999 references, that is roughly 2M `.value` calls, or ~8-10ms of unnecessary vtable overhead.

Key Design Decision
Three-layer optimization that preserves error semantics:
- `Val.Arr.value(i)` instanceof fast path: `if (e.isInstanceOf[Val]) e.asInstanceOf[Val] else e.value` avoids vtable dispatch for strict values globally.
- `compare()`: check `ex eq ey` before forcing. For shared strict `Val` references, skip entirely. For shared lazy thunks, force once to preserve error observation.
- `equal()`: mirror the `compare()` structure, with separate paths for shared strict (skip) and shared lazy (force once).

Modification
- `Val.scala`, `Arr.value(i)`: added an `isInstanceOf[Val]` fast path
- `Evaluator.scala`, `compare()` array branch: pre-Eval reference equality with lazy error preservation
- `Evaluator.scala`, `equal()` array branch: mirrored the `compare()` optimization structure
- `array_compare_shared_elements.jsonnet`: regression test covering shared strict/lazy element comparison semantics

Benchmark Results
JMH (JVM, Scala 3.3.7)
Full JMH regression suite (35 benchmarks): no regressions detected.
Hyperfine (Scala Native vs jrsonnet)
Analysis
- `Val.value` in hot loops: the `isInstanceOf` check provides real savings.
- The `isInstanceOf` overhead for non-shared elements is negligible (well-predicted branch; effectively zero-cost on the JVM).

References
- jrsonnet/docs/benchmarks.adoc
- bench/resources/go_suite/comparison.jsonnet

Result
Narrows the comparison benchmark gap vs jrsonnet from 2.34x to 2.23x. No semantic changes: all 141 test suites pass, including the new regression test for shared-element comparison correctness.