perf: inline object materialization fast path + sorted key cache + skip-cache for no-self-ref objects #736

Open
He-Pin wants to merge 2 commits into databricks:master from He-Pin:perf/inline-object-materializer

@He-Pin He-Pin commented Apr 10, 2026

Motivation

realistic2.jsonnet creates ~62,500 objects via array comprehensions, each with ~5 fields. The generic materializeRecursiveObj code path was unnecessarily expensive for these objects because it handles inheritance chains, visibility checks, and hash-map lookups — none of which apply to inline objects that have no super chain.

Additionally, when thousands of objects are created from the same MemberList expression (array comprehension body), each object was independently computing its sorted field order during materialization.

Finally, for objects whose fields never reference self, super, or $, the field-value pre-caching during materialization is provably unnecessary — yet we were still allocating HashMaps for these objects.

Key Design Decisions

  1. Inline object fast path — Objects with canDirectIterate=true (no super chain) use materializeSortedInlineObj / materializeInlineObj which directly iterate the inline members array, skipping hash lookups, visibility checks, and allMembers computation.

  2. Sorted key cache — The sorted field order (Array[Int] mapping sorted index → member index) is computed once on first object creation from a MemberList and cached as a @volatile field on the AST node. All subsequent objects from the same expression reuse the cached order. For realistic2 this eliminates 62,499 redundant sort operations.

  3. Skip field cache for no-self-ref objects — A two-mode AST scanner (hasSelfRefExpr) detects whether an object's fields, binds, and asserts reference self, super, or $. When they don't, the materializer skips cacheFieldValue calls entirely, eliminating ~125K HashMap allocations on realistic2.

    • At current scope: Self, Super, $, SelectSuper, InSuper, LookupSuper → has ref
    • Inside nested objects: only $ propagates (self/super bind to inner object)
    • Scans field rhs, dynamic field names, method args, binds, asserts, and function parameter defaults
    • Result cached on MemberList._noSelfRef (volatile, benign-race safe)
  4. Propagation to Val.Obj — Each Val.Obj stores _sortedInlineOrder and _skipFieldCache flags set at construction time in visitMemberList.
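
The sorted key cache in decision 2 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `MemberListNode`, `sortedOrder`, and `sortCount` are hypothetical names, and the real cache lives on `ObjBody.MemberList`.

```scala
// Hypothetical stand-in for an AST node such as ObjBody.MemberList.
final class MemberListNode(val fieldNames: Array[String]) {
  // Cached permutation: sorted position -> index into fieldNames.
  // Racing threads may each compute it, but they compute equal arrays,
  // so whichever write wins is still correct (a benign race).
  @volatile private var cachedSortedOrder: Array[Int] = null

  // Instrumentation for this example only (not thread-safe).
  var sortCount: Int = 0

  def sortedOrder: Array[Int] = {
    var order = cachedSortedOrder
    if (order == null) {
      sortCount += 1
      order = fieldNames.indices.toArray.sortBy(i => fieldNames(i))
      cachedSortedOrder = order
    }
    order
  }
}

object SortedKeyCacheDemo {
  // Simulates 1000 objects created from the same expression.
  def run(): (Seq[Int], Int) = {
    val node = new MemberListNode(Array("name", "age", "id"))
    var i = 0
    while (i < 1000) { node.sortedOrder; i += 1 }
    (node.sortedOrder.toSeq, node.sortCount)
  }
}
```

In this sketch every object built from the same node shares one permutation array, so the sort runs once no matter how many objects the comprehension produces.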

Modifications

  • Expr.scala: Added _cachedSortedOrder: Array[Int] and _noSelfRef: java.lang.Boolean volatile fields to ObjBody.MemberList
  • Val.scala: Added _sortedInlineOrder: Array[Int] and _skipFieldCache: Boolean fields to Val.Obj
  • Evaluator.scala: In visitMemberList, compute sorted order and no-self-ref flag on first call, cache on MemberList, assign to each Val.Obj
  • Materializer.scala:
    • Added computeSortedInlineOrder() helper
    • Added computeNoSelfRef() + hasSelfRefInMemberList() + hasSelfRefExpr() — complete two-mode scanner
    • Refactored materializeSortedInlineObj to use cached order
    • Guarded 4 cacheFieldValue call sites with if (!obj._skipFieldCache)
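
The guard on the `cacheFieldValue` call sites might look roughly like the sketch below. `SketchObj` and `SkipCacheSketch` are hypothetical stand-ins, and the cache here is a simplified lazily allocated map rather than the actual `Val.Obj` internals.

```scala
import scala.collection.mutable

// Hypothetical simplified object: parallel arrays of field names and values.
final class SketchObj(
    val names: Array[String],
    val values: Array[Int],
    val skipFieldCache: Boolean) {
  // Allocated lazily, so skipping every cacheFieldValue call avoids the map entirely.
  private var cache: mutable.HashMap[String, Int] = null

  def cacheAllocated: Boolean = cache != null

  def cacheFieldValue(k: String, v: Int): Unit = {
    if (cache == null) cache = new mutable.HashMap[String, Int]()
    cache.update(k, v)
  }
}

object SkipCacheSketch {
  def materialize(obj: SketchObj): String = {
    val sb = new java.lang.StringBuilder("{")
    var i = 0
    while (i < obj.names.length) {
      val v = obj.values(i)
      // The guard: only pre-populate the cache when a field could read it back.
      if (!obj.skipFieldCache) obj.cacheFieldValue(obj.names(i), v)
      if (i > 0) sb.append(", ")
      sb.append('"').append(obj.names(i)).append("\": ").append(v)
      i += 1
    }
    sb.append('}').toString
  }
}
```

The output is identical in both modes; only the allocation behaviour differs.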

Benchmark Results

JMH Regression Suite (JVM, -f1 -wi 1 -i 1)

| Benchmark | Master (ms/op) | This PR (ms/op) | Change |
|---|---|---|---|
| realistic2 | 63.451 | 49.496 | -22.0% |
| realistic1 | 1.969 | 2.032 | neutral |
| bench.02 (foldl) | 33.596 | 35.175 | neutral |
| bench.03 | 9.628 | 10.466 | neutral |
| gen_big_object | 0.928 | 0.934 | neutral |
| comparison | 23.759 | 22.777 | neutral |
| comparison2 | 39.282 | 37.927 | neutral |
| reverse | 8.480 | 8.749 | neutral |
| base64DecodeBytes | 7.620 | 7.616 | neutral |
| All others | | | no regression |

Scala Native (hyperfine --warmup 3 --runs 10)

| Binary | realistic_2 (ms) | vs jrsonnet |
|---|---|---|
| Master (147da82) | 263.7 ± 4.1 | 2.58x slower |
| This PR | 210.6 ± 2.8 | 2.06x slower |
| jrsonnet (latest) | 102.4 ± 3.3 | 1.00x |

Native improvement: -20.1% (263.7ms → 210.6ms)
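
For reference, the percentages quoted here follow from the usual relative-change formula. The helper below is a hypothetical illustration (the name `RelativeChange.percent` is not part of the PR), applied to the mean timings reported above.

```scala
object RelativeChange {
  // Negative result means the "after" measurement is faster.
  def percent(before: Double, after: Double): Double =
    (after - before) / before * 100.0
}

object RelativeChangeDemo {
  val native: Double = RelativeChange.percent(263.7, 210.6)  // about -20.1
  val jvm: Double = RelativeChange.percent(63.451, 49.496)   // about -22.0
}
```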

Scala Native — No regressions on other benchmarks

| Benchmark | Master (ms) | This PR (ms) | jrsonnet (ms) |
|---|---|---|---|
| realistic_1 | 12.1 | 12.1 | 13.9 |
| big_object | 11.7 | 11.6 | 13.3 |

Analysis

The combined optimization is effective because:

  1. Sorted key cache: Array comprehensions create thousands of objects from the same AST MemberList node. The sort order is identical for all — caching eliminates 62K redundant sort operations.

  2. Skip field cache: When fields don't reference self/super/$, no code path during materialization calls obj.value() on the current object, making cache pre-population wasted work. The two-mode scanner correctly distinguishes self/super at current scope (which binds to the object being materialized) from self/super inside nested objects (which binds to the inner object).

  3. Allocation reduction matters on Native: GC pressure is the main bottleneck on Scala Native with LTO. Eliminating ~125K HashMap allocations directly reduces GC work.
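
The two-mode scan described in point 2 can be illustrated on a toy AST. This is a sketch under simplifying assumptions: the node types below are hypothetical and far smaller than sjsonnet's real `Expr` hierarchy, and binds, asserts, method arguments, and parameter defaults are omitted.

```scala
// Hypothetical miniature AST, for illustration only.
sealed trait Expr
case object Self extends Expr
case object Super extends Expr
case object Dollar extends Expr
final case class Num(value: Double) extends Expr
final case class Ident(name: String) extends Expr
final case class BinOp(lhs: Expr, rhs: Expr) extends Expr
final case class ObjLit(fields: List[(String, Expr)]) extends Expr

object SelfRefScan {
  // inNested = false: scanning the target object's own field bodies.
  // inNested = true: inside a nested object literal, where self/super
  // rebind to the inner object and therefore no longer count.
  def hasSelfRef(e: Expr, inNested: Boolean): Boolean = e match {
    case Self | Super      => !inNested
    case Dollar            => true // $ can still reach the outer object
    case Num(_) | Ident(_) => false
    case BinOp(l, r)       => hasSelfRef(l, inNested) || hasSelfRef(r, inNested)
    case ObjLit(fields) =>
      fields.exists { case (_, rhs) => hasSelfRef(rhs, inNested = true) }
  }

  // True when no field of `obj` can observe `obj` itself.
  def noSelfRef(obj: ObjLit): Boolean =
    !obj.fields.exists { case (_, rhs) => hasSelfRef(rhs, inNested = false) }
}
```

For example, an object like `{ a: { b: self } }` counts as having no self-reference at the outer level, while `{ a: { b: $ } }` does not.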

References

  • Sorted key cache concept from jit branch: 119b9a93
  • Inline materialization from original PR iteration
  • Self-reference detection design validated by rubber-duck critique

Result

All 55 JVM test suites pass. No regressions on any benchmark. 22% JVM improvement and 20% native improvement on realistic2.

@He-Pin He-Pin force-pushed the perf/inline-object-materializer branch 3 times, most recently from b668d90 to 6f113e7 on April 10, 2026 23:17
Optimize materialization of inline objects (those created from comprehensions
and literal declarations) by bypassing the generic recursive materialization
path that handles inheritance chains.

Key optimizations:
1. Inline object fast path - objects with canDirectIterate=true skip the
   generic materializeRecursiveObj code path, avoiding unnecessary hash
   lookups and visibility checks since inline objects have no super chain.

2. Sorted key cache for comprehension objects - when multiple objects are
   created from the same MemberList expression (e.g. in array comprehensions),
   the sorted field order (Array[Int] mapping sorted index to member index)
   is computed once on first object creation and cached on the MemberList
   AST node. Subsequent objects from the same expression reuse this cached
   order, eliminating repeated sort computations. For realistic2.jsonnet
   which creates ~62,500 objects from comprehensions, this eliminates 62,499
   redundant sort operations per materialization pass.

3. Pre-sorted inline materialization - materializeSortedInlineObj uses the
   cached sort order to directly iterate fields in sorted order without
   allocating intermediate arrays or performing key comparisons.

The cache is stored as a volatile field on MemberList (expression-level
cache shared across all objects from the same expression) and propagated
to each Val.Obj._sortedInlineOrder at construction time. Cache is only
used when sup==null (no super chain) since inheritance could alter the
field structure.
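
As a rough sketch of the pre-sorted iteration in optimization 3 (hypothetical names, with field values already rendered to strings for brevity), the materializer can walk the cached permutation instead of sorting keys per object:

```scala
object SortedInlineSketch {
  // order(k) is the declaration-order index of the k-th field in sorted order.
  def render(names: Array[String], values: Array[String], order: Array[Int]): String = {
    val sb = new java.lang.StringBuilder("{")
    var k = 0
    while (k < order.length) {
      val i = order(k) // no key comparison and no intermediate array here
      if (k > 0) sb.append(", ")
      sb.append('"').append(names(i)).append("\": ").append(values(i))
      k += 1
    }
    sb.append('}').toString
  }
}
```

With fields declared as `name`, `age`, `id` and the cached order `[1, 2, 0]`, the fields are emitted as `age`, `id`, `name`.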
@He-Pin He-Pin force-pushed the perf/inline-object-materializer branch from 6f113e7 to db20ca0 on April 10, 2026 23:23
@He-Pin He-Pin changed the title from "perf: inline object materialization fast path" to "perf: inline object materialization fast path + sorted key cache" on Apr 10, 2026
@He-Pin He-Pin marked this pull request as ready for review April 11, 2026 01:37
For inline-materialized objects whose fields, binds, and asserts never
reference self, super, or $, the pre-population of the field value
cache during materialization is provably unnecessary. This eliminates
~125K HashMap allocations on realistic_2.

Implementation:
- Two-mode AST scanner (hasSelfRefExpr) detects self/super/$ usage:
  - At current scope: Self, Super, $, SelectSuper, InSuper, LookupSuper
  - Inside nested objects: only $ propagates (self/super bind to inner)
- Scans field rhs, dynamic field names, method args, binds, asserts,
  and function parameter defaults
- Result cached on MemberList._noSelfRef (volatile, benign-race safe)
- Evaluator propagates flag to Val.Obj._skipFieldCache
- Materializer skips cacheFieldValue when flag is set

Regression tests:
- skip_cache_no_self_ref.jsonnet: objects without self-refs
- skip_cache_self_ref_correctness.jsonnet: objects WITH self/super/$
@He-Pin He-Pin changed the title from "perf: inline object materialization fast path + sorted key cache" to "perf: inline object materialization fast path + sorted key cache + skip-cache for no-self-ref objects" on Apr 11, 2026