perf: inline object materialization fast path + sorted key cache + skip-cache for no-self-ref objects by He-Pin · Pull Request #736 · databricks/sjsonnet

He-Pin · 2026-04-10T20:49:56Z

Motivation

realistic2.jsonnet creates ~62,500 objects via array comprehensions, each with ~5 fields. The generic materializeRecursiveObj code path was unnecessarily expensive for these objects because it handles inheritance chains, visibility checks, and hash-map lookups — none of which apply to inline objects that have no super chain.

Additionally, when thousands of objects are created from the same MemberList expression (array comprehension body), each object was independently computing its sorted field order during materialization.

Finally, for objects whose fields never reference self, super, or $, the field-value pre-caching during materialization is provably unnecessary — yet we were still allocating HashMaps for these objects.

Key Design Decisions

1. Inline object fast path: Objects with canDirectIterate=true (no super chain) use materializeSortedInlineObj / materializeInlineObj, which directly iterate the inline members array, skipping hash lookups, visibility checks, and allMembers computation.
2. Sorted key cache: The sorted field order (Array[Int] mapping sorted index → member index) is computed once on first object creation from a MemberList and cached as a @volatile field on the AST node. All subsequent objects from the same expression reuse the cached order. For realistic2 this eliminates 62,499 redundant sort operations.
3. Skip field cache for no-self-ref objects: A two-mode AST scanner (hasSelfRefExpr) detects whether an object's fields, binds, and asserts reference self, super, or $. When they don't, the materializer skips cacheFieldValue calls entirely, eliminating ~125K HashMap allocations on realistic2.
   - At current scope: Self, Super, $, SelectSuper, InSuper, LookupSuper → has ref
   - Inside nested objects: only $ propagates (self/super bind to inner object)
   - Scans field rhs, dynamic field names, method args, binds, asserts, and function parameter defaults
   - Result cached on MemberList._noSelfRef (volatile, benign-race safe)
4. Propagation to Val.Obj: Each Val.Obj stores _sortedInlineOrder and _skipFieldCache flags set at construction time in visitMemberList.
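
The sorted key cache can be illustrated with a minimal sketch. This is not the code from the PR: `MemberListNode`, `sortedOrderFor`, `materialize`, and `sortCount` are hypothetical names, field values are plain `Int`s, and plain `String` ordering is assumed for the sort. It only shows the pattern of computing the order once per AST node and reusing it for every object built from that node.

```scala
object SortedOrderCacheSketch {
  // Hypothetical stand-in for ObjBody.MemberList: one instance per object
  // expression in the AST, shared by every object created from it.
  final class MemberListNode(val fieldNames: Array[String]) {
    // Benign race: two threads may both compute the order, but they would
    // compute equal arrays, so whichever write wins is fine.
    @volatile var cachedSortedOrder: Array[Int] = null
    var sortCount: Int = 0 // instrumentation for this demo only
  }

  // Maps sorted position -> index into fieldNames (assumes String ordering).
  def computeSortedOrder(names: Array[String]): Array[Int] =
    names.indices.toArray.sortBy(i => names(i))

  // Returns the cached order, computing it only on the first call per node.
  def sortedOrderFor(node: MemberListNode): Array[Int] = {
    var order = node.cachedSortedOrder
    if (order == null) {
      order = computeSortedOrder(node.fieldNames)
      node.sortCount += 1
      node.cachedSortedOrder = order
    }
    order
  }

  // Emits (name, value) pairs in sorted key order without re-sorting.
  def materialize(node: MemberListNode, values: Array[Int]): Seq[(String, Int)] = {
    val order = sortedOrderFor(node)
    order.toSeq.map(i => node.fieldNames(i) -> values(i))
  }
}
```

In this sketch, materializing any number of objects that share one `MemberListNode` sorts the field names only once, which mirrors the 62,500 objects / 62,499 skipped sorts described for realistic2.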

Modification

Expr.scala: Added _cachedSortedOrder: Array[Int] and _noSelfRef: java.lang.Boolean volatile fields to ObjBody.MemberList
Val.scala: Added _sortedInlineOrder: Array[Int] and _skipFieldCache: Boolean fields to Val.Obj
Evaluator.scala: In visitMemberList, compute sorted order and no-self-ref flag on first call, cache on MemberList, assign to each Val.Obj
Materializer.scala:
- Added computeSortedInlineOrder() helper
- Added computeNoSelfRef() + hasSelfRefInMemberList() + hasSelfRefExpr() — complete two-mode scanner
- Refactored materializeSortedInlineObj to use cached order
- Guarded 4 cacheFieldValue call sites with if (!obj._skipFieldCache)
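
The two-mode scan can be sketched on a toy AST. The node types below (`Num`, `Add`, `Select`, `ObjLit`, `Dollar`) and the functions `hasSelfRef` / `noSelfRef` are hypothetical simplifications for illustration, not sjsonnet's `Expr` hierarchy or the PR's `hasSelfRefExpr`. The sketch covers only field right-hand sides and omits binds, asserts, dynamic field names, and parameter defaults.

```scala
object SelfRefScanSketch {
  // Toy expression tree; assumed for illustration only.
  sealed trait Expr
  case object Self extends Expr
  case object Super extends Expr
  case object Dollar extends Expr // stands for `$`
  final case class Num(value: Double) extends Expr
  final case class Add(lhs: Expr, rhs: Expr) extends Expr
  final case class Select(target: Expr, name: String) extends Expr
  final case class ObjLit(fields: List[(String, Expr)]) extends Expr

  // nested = false: scanning the members of the object being classified.
  // nested = true: inside a nested object literal, where self/super rebind
  // to the inner object, so only `$` still counts as a reference.
  def hasSelfRef(e: Expr, nested: Boolean): Boolean = e match {
    case Dollar         => true
    case Self | Super   => !nested
    case Num(_)         => false
    case Add(l, r)      => hasSelfRef(l, nested) || hasSelfRef(r, nested)
    case Select(t, _)   => hasSelfRef(t, nested)
    case ObjLit(fields) => fields.exists { case (_, rhs) => hasSelfRef(rhs, nested = true) }
  }

  // True when no field of `obj` can observe `obj` through self, super, or `$`.
  def noSelfRef(obj: ObjLit): Boolean =
    !obj.fields.exists { case (_, rhs) => hasSelfRef(rhs, nested = false) }
}
```

Under these assumptions, `{ inner: { x: 1, y: self.x } }` is classified as having no self-reference at the outer level, while `{ inner: { y: $.a } }` is not, because `$` still points at the outermost object.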

Benchmark Results

JMH Regression Suite (JVM, -f1 -wi 1 -i 1)

| Benchmark | Master (ms/op) | This PR (ms/op) | Change |
|---|---|---|---|
| realistic2 | 63.451 | 49.496 | -22.0% ✅ |
| realistic1 | 1.969 | 2.032 | neutral |
| bench.02 (foldl) | 33.596 | 35.175 | neutral |
| bench.03 | 9.628 | 10.466 | neutral |
| gen_big_object | 0.928 | 0.934 | neutral |
| comparison | 23.759 | 22.777 | neutral |
| comparison2 | 39.282 | 37.927 | neutral |
| reverse | 8.480 | 8.749 | neutral |
| base64DecodeBytes | 7.620 | 7.616 | neutral |
| All others | — | — | no regression |

Scala Native (hyperfine --warmup 3 --runs 10)

| Binary | realistic_2 (ms) | vs jrsonnet |
|---|---|---|
| Master (`147da82`) | 263.7 ± 4.1 | 2.58x slower |
| This PR | 210.6 ± 2.8 | 2.06x slower |
| jrsonnet (latest) | 102.4 ± 3.3 | 1.00x |

Native improvement: -20.1% (263.7ms → 210.6ms)

Scala Native — No regressions on other benchmarks

| Benchmark | Master (ms) | This PR (ms) | jrsonnet (ms) |
|---|---|---|---|
| realistic_1 | 12.1 | 12.1 | 13.9 |
| big_object | 11.7 | 11.6 | 13.3 |

Analysis

The combined optimization is effective because:

Sorted key cache: Array comprehensions create thousands of objects from the same AST MemberList node. The sort order is identical for all — caching eliminates 62K redundant sort operations.
Skip field cache: When fields don't reference self/super/$, no code path during materialization calls obj.value() on the current object, making cache pre-population wasted work. The two-mode scanner correctly distinguishes self/super at current scope (which binds to the object being materialized) from self/super inside nested objects (which binds to the inner object).
Allocation reduction matters on Native: GC pressure is the main bottleneck on Scala Native with LTO. Eliminating ~125K HashMap allocations directly reduces GC work.
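
The allocation argument can be made concrete with a small sketch. It assumes, purely for illustration, that the per-object cache map is allocated on the first `cacheFieldValue` call. `Obj`, `cacheAllocated`, and `materialize` here are hypothetical stand-ins, not `Val.Obj` or the real materializer, and field values are `Int` thunks.

```scala
import scala.collection.mutable

object SkipFieldCacheSketch {
  // Hypothetical stand-in for Val.Obj with only what this illustration needs.
  final class Obj(
      val names: Array[String],
      val thunks: Array[() => Int],
      val skipFieldCache: Boolean) {
    // Assumption of this sketch: the map is created on first use, so an
    // object that never caches never allocates it.
    private var cache: mutable.HashMap[String, Int] = null
    def cacheAllocated: Boolean = cache != null
    def cacheFieldValue(name: String, v: Int): Unit = {
      if (cache == null) cache = mutable.HashMap.empty[String, Int]
      cache.update(name, v)
    }
  }

  // Evaluates every field once; pre-populates the cache only when some
  // field might read the object back through self, super, or `$`.
  def materialize(obj: Obj): Map[String, Int] = {
    val out = Map.newBuilder[String, Int]
    var i = 0
    while (i < obj.names.length) {
      val v = obj.thunks(i)()
      if (!obj.skipFieldCache) obj.cacheFieldValue(obj.names(i), v)
      out += (obj.names(i) -> v)
      i += 1
    }
    out.result()
  }
}
```

Both modes produce the same output in this sketch; the only difference is whether the map is ever allocated, which is the cost the skip flag is meant to remove.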

References

Sorted key cache concept from jit branch: 119b9a93
Inline materialization from original PR iteration
Self-reference detection design validated by rubber-duck critique

Result

All 55 JVM test suites pass. No regressions on any benchmark. 22% JVM improvement and 20% native improvement on realistic2.

Optimize materialization of inline objects (those created from comprehensions and literal declarations) by bypassing the generic recursive materialization path that handles inheritance chains.

Key optimizations:

1. Inline object fast path - objects with canDirectIterate=true skip the generic materializeRecursiveObj code path, avoiding unnecessary hash lookups and visibility checks since inline objects have no super chain.
2. Sorted key cache for comprehension objects - when multiple objects are created from the same MemberList expression (e.g. in array comprehensions), the sorted field order (Array[Int] mapping sorted index to member index) is computed once on first object creation and cached on the MemberList AST node. Subsequent objects from the same expression reuse this cached order, eliminating repeated sort computations. For realistic2.jsonnet, which creates ~62,500 objects from comprehensions, this eliminates 62,499 redundant sort operations per materialization pass.
3. Pre-sorted inline materialization - materializeSortedInlineObj uses the cached sort order to directly iterate fields in sorted order without allocating intermediate arrays or performing key comparisons.

The cache is stored as a volatile field on MemberList (expression-level cache shared across all objects from the same expression) and propagated to each Val.Obj._sortedInlineOrder at construction time. Cache is only used when sup==null (no super chain) since inheritance could alter the field structure.

For inline-materialized objects whose fields, binds, and asserts never reference self, super, or $, the pre-population of the field value cache during materialization is provably unnecessary. This eliminates ~125K HashMap allocations on realistic_2.

Implementation:
- Two-mode AST scanner (hasSelfRefExpr) detects self/super/$ usage:
  - At current scope: Self, Super, $, SelectSuper, InSuper, LookupSuper
  - Inside nested objects: only $ propagates (self/super bind to inner)
- Scans field rhs, dynamic field names, method args, binds, asserts, and function parameter defaults
- Result cached on MemberList._noSelfRef (volatile, benign-race safe)
- Evaluator propagates flag to Val.Obj._skipFieldCache
- Materializer skips cacheFieldValue when flag is set

Regression tests:
- skip_cache_no_self_ref.jsonnet: objects without self-refs
- skip_cache_self_ref_correctness.jsonnet: objects WITH self/super/$

He-Pin force-pushed the perf/inline-object-materializer branch 3 times, most recently from b668d90 to 6f113e7, on April 10, 2026 23:17

He-Pin force-pushed the perf/inline-object-materializer branch from 6f113e7 to db20ca0 on April 10, 2026 23:23

He-Pin changed the title ~~perf: inline object materialization fast path~~ perf: inline object materialization fast path + sorted key cache Apr 10, 2026

He-Pin marked this pull request as ready for review April 11, 2026 01:37

He-Pin changed the title ~~perf: inline object materialization fast path + sorted key cache~~ perf: inline object materialization fast path + sorted key cache + skip-cache for no-self-ref objects Apr 11, 2026

He-Pin wants to merge 2 commits into databricks:master from He-Pin:perf/inline-object-materializer
