perf: inline object materialization fast path + sorted key cache + skip-cache for no-self-ref objects #736
Open
He-Pin wants to merge 2 commits into databricks:master from
Optimize materialization of inline objects (those created from comprehensions and literal declarations) by bypassing the generic recursive materialization path that handles inheritance chains.

Key optimizations:

1. Inline object fast path - objects with canDirectIterate=true skip the generic materializeRecursiveObj code path, avoiding unnecessary hash lookups and visibility checks since inline objects have no super chain.

2. Sorted key cache for comprehension objects - when multiple objects are created from the same MemberList expression (e.g. in array comprehensions), the sorted field order (Array[Int] mapping sorted index to member index) is computed once on first object creation and cached on the MemberList AST node. Subsequent objects from the same expression reuse this cached order, eliminating repeated sort computations. For realistic2.jsonnet, which creates ~62,500 objects from comprehensions, this eliminates 62,499 redundant sort operations per materialization pass.

3. Pre-sorted inline materialization - materializeSortedInlineObj uses the cached sort order to directly iterate fields in sorted order without allocating intermediate arrays or performing key comparisons.

The cache is stored as a volatile field on MemberList (expression-level cache shared across all objects from the same expression) and propagated to each Val.Obj._sortedInlineOrder at construction time. The cache is only used when sup==null (no super chain), since inheritance could alter the field structure.
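The sorted key cache described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: MemberListNode, InlineObj, orderFor, and sortCount are hypothetical stand-ins for ObjBody.MemberList, Val.Obj, and the real helpers, and the sketch ignores super chains entirely.

```scala
object SortedKeyCacheSketch {
  // Hypothetical stand-in for the MemberList AST node shared by every
  // object created from the same comprehension body.
  final class MemberListNode(val fieldNames: Array[String]) {
    // Expression-level cache: sorted position -> index into fieldNames.
    @volatile var cachedSortedOrder: Array[Int] = null
    // Demonstration-only counter of how many times the sort really ran.
    var sortCount: Int = 0
  }

  // Hypothetical stand-in for an inline object without a super chain.
  final class InlineObj(val node: MemberListNode, val values: Array[Any]) {
    // Copied from the node at construction time.
    val sortedOrder: Array[Int] = orderFor(node)
  }

  // Compute the sorted order on first use, then reuse it. Two threads
  // racing here would compute equal arrays, so the last write is harmless.
  def orderFor(node: MemberListNode): Array[Int] = {
    val cached = node.cachedSortedOrder
    if (cached != null) cached
    else {
      val order = node.fieldNames.indices.toArray.sortBy(i => node.fieldNames(i))
      node.sortCount += 1
      node.cachedSortedOrder = order
      order
    }
  }

  // Walk the fields in sorted order using only the cached index array:
  // no key comparisons happen during materialization.
  def materialize(obj: InlineObj): Vector[(String, Any)] = {
    val out = Vector.newBuilder[(String, Any)]
    var i = 0
    while (i < obj.sortedOrder.length) {
      val idx = obj.sortedOrder(i)
      out += ((obj.node.fieldNames(idx), obj.values(idx)))
      i += 1
    }
    out.result()
  }
}
```

With one MemberListNode shared by many InlineObj instances, sortCount stays at 1 regardless of how many objects are built, which is the effect the commit message attributes to the cache.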
For inline-materialized objects whose fields, binds, and asserts never reference self, super, or $, the pre-population of the field value cache during materialization is provably unnecessary. This eliminates ~125K HashMap allocations on realistic2.

Implementation:
- Two-mode AST scanner (hasSelfRefExpr) detects self/super/$ usage:
  - At current scope: Self, Super, $, SelectSuper, InSuper, LookupSuper
  - Inside nested objects: only $ propagates (self/super bind to inner)
- Scans field rhs, dynamic field names, method args, binds, asserts, and function parameter defaults
- Result cached on MemberList._noSelfRef (volatile, benign-race safe)
- Evaluator propagates flag to Val.Obj._skipFieldCache
- Materializer skips cacheFieldValue when flag is set

Regression tests:
- skip_cache_no_self_ref.jsonnet: objects without self-refs
- skip_cache_self_ref_correctness.jsonnet: objects WITH self/super/$
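To make the two-mode rule concrete, here is a small sketch over a toy AST. The Expr hierarchy and the names hasSelfRef and noSelfRef are invented for illustration and are much smaller than sjsonnet's real AST (no SelectSuper, InSuper, LookupSuper, binds, asserts, or function defaults); only the scoping rule itself is taken from the description above.

```scala
object SelfRefScanSketch {
  // Toy AST, assumed for illustration only.
  sealed trait Expr
  case object Self extends Expr
  case object Super extends Expr
  case object Dollar extends Expr
  final case class Num(value: Double) extends Expr
  final case class Select(target: Expr, name: String) extends Expr
  final case class BinaryOp(lhs: Expr, rhs: Expr) extends Expr
  final case class ObjLit(fields: List[(String, Expr)]) extends Expr

  // nested = false: scanning the members of the object being checked.
  // nested = true: inside an inner object literal, where self/super bind
  // to that inner object, so only $ still counts as a reference.
  def hasSelfRef(e: Expr, nested: Boolean): Boolean = e match {
    case Dollar         => true
    case Self | Super   => !nested
    case Num(_)         => false
    case Select(t, _)   => hasSelfRef(t, nested)
    case BinaryOp(l, r) => hasSelfRef(l, nested) || hasSelfRef(r, nested)
    case ObjLit(fields) =>
      fields.exists { case (_, rhs) => hasSelfRef(rhs, nested = true) }
  }

  // True when no field of the object needs self, super, or $.
  def noSelfRef(fields: List[(String, Expr)]): Boolean =
    !fields.exists { case (_, rhs) => hasSelfRef(rhs, nested = false) }
}
```

Under this rule a field such as `inner: { x: self.y }` leaves the outer object eligible for the skip, while `inner: { x: $.y }` does not.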
Motivation
realistic2.jsonnet creates ~62,500 objects via array comprehensions, each with ~5 fields. The generic materializeRecursiveObj code path was unnecessarily expensive for these objects because it handles inheritance chains, visibility checks, and hash-map lookups, none of which apply to inline objects that have no super chain.

Additionally, when thousands of objects are created from the same MemberList expression (array comprehension body), each object was independently computing its sorted field order during materialization.

Finally, for objects whose fields never reference self, super, or $, the field-value pre-caching during materialization is provably unnecessary, yet we were still allocating HashMaps for these objects.

Key Design Decisions
- Inline object fast path: Objects with canDirectIterate=true (no super chain) use materializeSortedInlineObj / materializeInlineObj, which directly iterate the inline members array, skipping hash lookups, visibility checks, and allMembers computation.
- Sorted key cache: The sorted field order (Array[Int] mapping sorted index → member index) is computed once on first object creation from a MemberList and cached as a @volatile field on the AST node. All subsequent objects from the same expression reuse the cached order. For realistic2 this eliminates 62,499 redundant sort operations.
- Skip field cache for no-self-ref objects: A two-mode AST scanner (hasSelfRefExpr) detects whether an object's fields, binds, and asserts reference self, super, or $. When they don't, the materializer skips cacheFieldValue calls entirely, eliminating ~125K HashMap allocations on realistic2.
  - At current scope: Self, Super, $, SelectSuper, InSuper, LookupSuper → has ref
  - Inside nested objects: only $ propagates (self/super bind to inner object)
  - Result cached on MemberList._noSelfRef (volatile, benign-race safe)
- Propagation to Val.Obj: Each Val.Obj stores _sortedInlineOrder and _skipFieldCache flags set at construction time in visitMemberList.

Modification
- Expr.scala: Added _cachedSortedOrder: Array[Int] and _noSelfRef: java.lang.Boolean volatile fields to ObjBody.MemberList
- Val.scala: Added _sortedInlineOrder: Array[Int] and _skipFieldCache: Boolean fields to Val.Obj
- Evaluator.scala: In visitMemberList, compute sorted order and no-self-ref flag on first call, cache on MemberList, assign to each Val.Obj
- Materializer.scala:
  - computeSortedInlineOrder() helper
  - computeNoSelfRef() + hasSelfRefInMemberList() + hasSelfRefExpr(): complete two-mode scanner
  - materializeSortedInlineObj uses the cached order
  - cacheFieldValue call sites guarded with if (!obj._skipFieldCache)

Benchmark Results
JMH Regression Suite (JVM, -f1 -wi 1 -i 1)
Scala Native (hyperfine --warmup 3 --runs 10)
Native improvement: -20.1% (263.7ms → 210.6ms)
Scala Native — No regressions on other benchmarks
Analysis
The combined optimization is effective because:
- Sorted key cache: Array comprehensions create thousands of objects from the same AST MemberList node. The sort order is identical for all of them, so caching eliminates 62K redundant sort operations.
- Skip field cache: When fields don't reference self/super/$, no code path during materialization calls obj.value() on the current object, making cache pre-population wasted work. The two-mode scanner correctly distinguishes self/super at current scope (which binds to the object being materialized) from self/super inside nested objects (which binds to the inner object).
- Allocation reduction matters on Native: GC pressure is the main bottleneck on Scala Native with LTO. Eliminating ~125K HashMap allocations directly reduces GC work.
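The allocation argument can be illustrated with a deliberately simplified sketch of the guard. Everything here is assumed for illustration: Obj, cacheAllocated, and the lazily allocated HashMap are hypothetical, and the real Val.Obj and Materializer are considerably more involved. The point is only that gating cacheFieldValue on a per-object flag leaves the output unchanged while avoiding the map allocation.

```scala
object SkipFieldCacheSketch {
  import scala.collection.mutable

  // Hypothetical object with already-evaluated Int fields in sorted order.
  final class Obj(
      val names: Array[String],
      val values: Array[Int],
      val skipFieldCache: Boolean
  ) {
    // Assumption: the cache map is allocated on first write.
    private var cache: mutable.HashMap[String, Int] = null

    def cacheFieldValue(key: String, value: Int): Unit = {
      if (cache == null) cache = new mutable.HashMap[String, Int]()
      cache.update(key, value)
    }

    def cacheAllocated: Boolean = cache != null
  }

  // Render the object as JSON-like text; pre-populate the field cache
  // only when the flag says some field may read it back through self.
  def materialize(obj: Obj): String = {
    val sb = new StringBuilder("{")
    var i = 0
    while (i < obj.names.length) {
      if (i > 0) sb.append(", ")
      val v = obj.values(i)
      if (!obj.skipFieldCache) obj.cacheFieldValue(obj.names(i), v)
      sb.append('"').append(obj.names(i)).append("\": ").append(v)
      i += 1
    }
    sb.append("}").toString
  }
}
```

Both settings of the flag produce the same text; only the object with skipFieldCache = false ends up holding a HashMap.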
References
119b9a93

Result
All 55 JVM test suites pass. No regressions on any benchmark. 22% JVM improvement and 20% native improvement on realistic2.