perf: direct JSON renderer bypassing Visitor pattern for JVM CLI output #735
Closed
He-Pin wants to merge 2 commits into databricks:master
Add DirectJsonRenderer that produces JSON directly via StringBuilder,
eliminating the upickle Visitor/ObjVisitor/ArrVisitor overhead that
dominates materialization cost on the JVM (~3.3M virtual dispatch calls
on realistic2).
Key design decisions:
- JVM-only optimization via Platform.useDirectRenderer (final val).
  Native's LLVM LTO already devirtualizes the Visitor pattern, making
  the direct renderer counterproductive there.
- Falls back to Materializer+Renderer for:
  (a) Materializable custom Val types (preserving embedding API)
  (b) Subtrees beyond recursiveDepthLimit (128) to use the iterative
      ArrayDeque-based materializer, preventing stack overflow on
      deeply nested structures.
- Fallback Renderer overrides flushCharBuilder() with threshold 0 to
  prevent silent data loss: BaseCharRenderer uses threshold 1000 at
  depth >= 1, which would cause small outputs to stay in elemBuilder.
- Output is byte-identical to Renderer for all indent values.
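
The depth-limit fallback in (b) can be illustrated with a minimal sketch. This is not the PR's code: the Node/Leaf/Arr tree, the DepthLimitedRender object, and the work-item encoding are hypothetical stand-ins, and the output is plain minified text rather than sjsonnet's exact formatting. It only shows the general shape of recursing on the fast path and switching to an explicit java.util.ArrayDeque once the limit is reached.

```scala
// Hypothetical tree type for illustration; sjsonnet's real Val hierarchy is richer.
sealed trait Node
final case class Leaf(value: Int) extends Node
final case class Arr(items: List[Node]) extends Node

object DepthLimitedRender {
  // Stand-in for recursiveDepthLimit (128 in the PR).
  val RecursiveDepthLimit = 128

  // Fast path: plain recursion while the depth stays under the limit.
  def render(node: Node, sb: java.lang.StringBuilder, depth: Int = 0): Unit =
    if (depth >= RecursiveDepthLimit) renderIterative(node, sb)
    else node match {
      case Leaf(v) => sb.append(v)
      case Arr(items) =>
        sb.append('[')
        var first = true
        items.foreach { item =>
          if (!first) sb.append(',')
          first = false
          render(item, sb, depth + 1)
        }
        sb.append(']')
    }

  // Fallback: an explicit stack of work items, so nesting depth consumes heap
  // instead of JVM stack frames. A work item is a Node to render or a String to emit.
  private def renderIterative(root: Node, sb: java.lang.StringBuilder): Unit = {
    val stack = new java.util.ArrayDeque[Any]()
    stack.push(root)
    while (!stack.isEmpty) {
      stack.pop() match {
        case s: String => sb.append(s)
        case Leaf(v)   => sb.append(v)
        case Arr(items) =>
          sb.append('[')
          stack.push("]")
          // Push in reverse so that items pop in their original order,
          // with a comma between consecutive items.
          var first = true
          items.reverse.foreach { item =>
            if (!first) stack.push(",")
            first = false
            stack.push(item)
          }
        case other => throw new IllegalStateException(s"unexpected work item: $other")
      }
    }
  }
}
```

In this sketch the fallback handles the whole subtree below the limit, so the recursive path never goes deeper than RecursiveDepthLimit frames regardless of input nesting.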
JMH results (JVM, ms/op, lower is better):
  realistic2: 62.0 → 55.9 (-9.8%)
  realistic1: 2.0 → 1.8 (-6.1%)
  large_string_template: 1.8 → 1.7 (-8.3%)
  bench.02: 33.0 → 32.6 (-1.3%)
Hyperfine Native (realistic2 > /dev/null):
  master: 251.8ms (no regression after Platform flag)
  jrsonnet: 100.1ms (2.52x faster; gap unchanged on Native)
Motivation
The upickle Visitor/ObjVisitor/ArrVisitor pattern is the primary materialization bottleneck on the JVM. On realistic2, Visitor-based rendering dispatches ~3.3 million virtual method calls (per-element visitValue, visitKey, visitString, etc.). The JIT cannot fully devirtualize these because multiple Visitor implementations exist on the classpath.
Key Design Decisions
- JVM-only optimization: Platform.useDirectRenderer is a final val (true on JVM, false on Native). Native's LLVM LTO already devirtualizes the Visitor pattern at link time, making the direct renderer counterproductive there (measured: 6.8% regression → neutral after flag).
- Deep nesting safety: Falls back to the Materializer's hybrid recursive/iterative path (ArrayDeque-based materializeStackless) for subtrees beyond recursiveDepthLimit (128). This prevents stack overflow on deeply nested structures while keeping the fast path for normal depths.
- Materializable compatibility: Falls back to the Visitor-based Renderer for custom Materializer.Materializable values, preserving the embedding API. The fallback Renderer overrides flushCharBuilder() with threshold 0 to prevent silent data loss (BaseCharRenderer uses threshold 1000 at depth ≥ 1).
- Output equivalence: Produces byte-identical JSON output to Renderer for all indent values (minified, indent=0, indent>0). Empty containers render as { } / [ ], matching Renderer behavior.
Modification
- DirectJsonRenderer.scala (~320 lines): final class with StringBuilder-based JSON rendering, single-pass string escaping, pre-computed indent cache, valTag-based O(1) dispatch.
- interpretStringify/materializeStringify for the direct-to-string pipeline; extracted createMaterializer() to share between old and new paths.
- renderNormal when Platform.useDirectRenderer && !yamlOut && !expectString && outputFile.isEmpty.
- useDirectRenderer flag.
Benchmark Results
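JMH (JVM, ms/op, lower is better)
  realistic2: 62.0 → 55.9 (-9.8%)
  realistic1: 2.0 → 1.8 (-6.1%)
  large_string_template: 1.8 → 1.7 (-8.3%)
  bench.02: 33.0 → 32.6 (-1.3%)
Hyperfine (Scala Native, realistic2 > /dev/null)
  master: 251.8ms (no regression after Platform flag)
  jrsonnet: 100.1ms (2.52x faster; gap unchanged on Native)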
Native is neutral by design (DirectRenderer disabled via Platform.useDirectRenderer = false).
Analysis
The optimization targets JVM-specific overhead.
Result
JVM materialization throughput improved by ~8-10% on output-heavy benchmarks. No regression on any benchmark. Native performance unchanged (by design).
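
As a rough illustration of the direct-rendering idea described above (not the code in DirectJsonRenderer.scala), the sketch below writes minified JSON straight into a StringBuilder with single-pass string escaping. The Json ADT and the DirectRenderSketch class are hypothetical; the real renderer works on sjsonnet values, supports indentation, and matches Renderer's output byte for byte, none of which is attempted here. In particular, this sketch emits control characters other than \n, \r, and \t as \uXXXX, which may differ from the real escaping rules.

```scala
// Hypothetical minimal JSON tree for illustration; not sjsonnet's Val hierarchy.
sealed trait Json
case object JNull extends Json
final case class JBool(b: Boolean) extends Json
final case class JNum(n: Long) extends Json
final case class JStr(s: String) extends Json
final case class JArr(items: Vector[Json]) extends Json
final case class JObj(fields: Vector[(String, Json)]) extends Json

final class DirectRenderSketch {
  private val sb = new java.lang.StringBuilder

  def render(j: Json): String = { sb.setLength(0); write(j); sb.toString }

  // One pattern match per node; no Visitor objects are created or called.
  private def write(j: Json): Unit = j match {
    case JNull    => sb.append("null")
    case JBool(b) => sb.append(b)
    case JNum(n)  => sb.append(n)
    case JStr(s)  => writeString(s)
    case JArr(items) =>
      sb.append('[')
      var i = 0
      while (i < items.length) {
        if (i > 0) sb.append(',')
        write(items(i))
        i += 1
      }
      sb.append(']')
    case JObj(fields) =>
      sb.append('{')
      var i = 0
      while (i < fields.length) {
        if (i > 0) sb.append(',')
        val (k, v) = fields(i)
        writeString(k)
        sb.append(':')
        write(v)
        i += 1
      }
      sb.append('}')
  }

  // Single pass over the string: runs of safe characters are copied in bulk,
  // and each character that needs escaping is handled where it is found.
  private def writeString(s: String): Unit = {
    sb.append('"')
    var start = 0
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c == '"' || c == '\\' || c < ' ') {
        sb.append(s, start, i) // flush the pending run of safe characters
        c match {
          case '"'  => sb.append("\\\"")
          case '\\' => sb.append("\\\\")
          case '\n' => sb.append("\\n")
          case '\r' => sb.append("\\r")
          case '\t' => sb.append("\\t")
          case _    => sb.append("\\u").append(String.format("%04x", Integer.valueOf(c.toInt)))
        }
        start = i + 1
      }
      i += 1
    }
    sb.append(s, start, s.length).append('"')
  }
}
```

A call such as new DirectRenderSketch().render(JArr(Vector(JNum(1), JStr("a")))) goes through write and writeString only, which is the kind of call pattern the PR relies on to avoid per-element virtual dispatch.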