perf: direct-write stdout with unsynchronized CompactByteArrayOutputStream #680
Open
He-Pin wants to merge 1 commit into databricks:master from
Contributor (Author):
Will use ByteArrayOutputStream too; Pekko has an optimization for this that can be used here as well.
Collaborator:
Workflows are working again.
…tream

Bypass StringWriter → toString → println overhead when writing to stdout by buffering rendered output in CompactByteArrayOutputStream and writing it directly via writeTo(stdout).

CompactByteArrayOutputStream is inspired by Apache Pekko's unsynchronized buffer approach:
- No synchronization on write ops (avoids pthread mutex on Scala Native)
- 1.5x growth factor (vs 2x in BAOS) reduces memory waste
- writeTo() provides zero-copy transfer to stdout
- On error, buffer is simply discarded (atomicity)

When outputting to a file (-o), the original StringWriter path is used.

Benchmark (Scala Native, hyperfine --warmup 3 --runs 10):
  realistic2: 258.7ms → 226.8ms (-12.3%)
  large_string_template: 23.6ms → 15.1ms (-36.0%)
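The buffer properties listed in the commit message (no locking on writes, roughly 1.5x growth, and a writeTo that avoids an intermediate copy) can be illustrated with a minimal sketch. This is not the class from the PR: the name CompactBufferSketch, the default capacity, and the exact growth arithmetic are assumptions made for illustration only.

```scala
import java.io.OutputStream
import java.util.Arrays

// Hypothetical sketch, not the PR's CompactByteArrayOutputStream.
// Not thread-safe by design: it assumes a single writer thread.
final class CompactBufferSketch(initialCapacity: Int = 8192) extends OutputStream {
  private var buf = new Array[Byte](math.max(initialCapacity, 16))
  private var count = 0

  // Grow by about 1.5x, or straight to the required size if that is larger.
  // Integer overflow for buffers near 2 GB is ignored in this sketch.
  private def ensureCapacity(minCapacity: Int): Unit =
    if (minCapacity > buf.length) {
      val grown = buf.length + (buf.length >> 1)
      buf = Arrays.copyOf(buf, math.max(grown, minCapacity))
    }

  // No `synchronized` here, unlike java.io.ByteArrayOutputStream.write.
  override def write(b: Int): Unit = {
    ensureCapacity(count + 1)
    buf(count) = b.toByte
    count += 1
  }

  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    if (off < 0 || len < 0 || len > b.length - off)
      throw new IndexOutOfBoundsException()
    ensureCapacity(count + len)
    System.arraycopy(b, off, buf, count, len)
    count += len
  }

  // Passes the internal array to `out` directly, with no toByteArray copy.
  def writeTo(out: OutputStream): Unit = out.write(buf, 0, count)

  def size: Int = count
}
```

If rendering fails partway, the caller simply drops the buffer without calling writeTo, so nothing partial reaches the destination stream.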
Contributor (Author):
Done! Updated to use a Pekko-inspired
Results on Scala Native (hyperfine):
Motivation
When outputting to stdout (no --output-file), sjsonnet renders JSON into a StringWriter backed by StringBuffer, calls toString() to create the full string, then println() to encode and write it. For realistic2 (28.6 MB JSON output), this creates three sources of overhead:
- Each write() call on StringBuffer is synchronized. On Scala Native, each synchronized block maps to a pthread mutex lock/unlock pair; across the thousands of write calls needed for a 28 MB output, this is significant.
- StringWriter.toString() creates a full copy of the 28 MB char array into a String. Combined with StringBuffer's 2x growth factor, peak memory reaches ~3x the output size.
- PrintStream.println(String) must encode the entire 28 MB String from UTF-16 chars to UTF-8 bytes for stdout.

Key Design Decision
Inspired by Apache Pekko's unsynchronized buffer approach, we render directly to bytes and write them to stdout in a single bulk operation:
- writeTo() for zero-copy transfer
- When stdout is null (library/programmatic use) or --output-file is specified, the original StringWriter path is used

Modification
sjsonnet/src-jvm-native/sjsonnet/SjsonnetMainBase.scala:
- CompactByteArrayOutputStream private inner class (unsynchronized, 1.5x growth, writeTo)
- stdout: PrintStream parameter to writeToFile, renderNormal, and mainConfigured
- stdout != null and no output file: renders through OutputStreamWriter → CompactByteArrayOutputStream → writeTo(stdout) → flush()

Benchmark Results
Environment: Apple M3 Max, macOS 15.4, Scala Native 0.5.8
Scala Native — hyperfine (warmup 3, runs 10)
JMH (JVM steady-state — I/O not measured, no change expected)
Analysis
- JMH exercises interpret(), which returns a String, not the CLI output path
- writeTo(stdout) transfer the bytes

References
- Apache Pekko ByteArrayOutputStream: https://github.com/apache/pekko/blob/main/actor/src/main/scala/org/apache/pekko/util/ByteArrayOutputStream.scala

Result
Eliminates StringWriter/StringBuffer synchronization overhead and intermediate String copy for stdout output. Improves realistic2 by 12.3% and large_string_template by 36% on Scala Native. No functional change — only affects the CLI I/O path.
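The stdout path described under Modification (OutputStreamWriter → byte buffer → writeTo(stdout) → flush(), with the StringWriter fallback when stdout is null) might look roughly like the sketch below. The object name StdoutPathSketch, the emit signature, and the render callback are hypothetical, and java.io.ByteArrayOutputStream stands in for the PR's unsynchronized buffer so the example stays self-contained.

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter, PrintStream, StringWriter, Writer}
import java.nio.charset.StandardCharsets

object StdoutPathSketch {
  // `render` stands in for whatever writes the JSON text (hypothetical signature).
  // Returns None when the bytes went to stdout, Some(text) on the fallback path.
  def emit(render: Writer => Unit, stdout: PrintStream): Option[String] =
    if (stdout != null) {
      // Assumption: the PR uses its own unsynchronized buffer here instead.
      val bytes = new ByteArrayOutputStream()
      val writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)
      render(writer)        // if this throws, nothing has reached stdout yet
      writer.flush()        // push any chars still held by the encoder into `bytes`
      bytes.writeTo(stdout) // one bulk write of the already-encoded UTF-8 bytes
      stdout.flush()
      None
    } else {
      // Fallback: the original StringWriter path for library/programmatic use.
      val sw = new StringWriter()
      render(sw)
      Some(sw.toString)
    }
}
```

Encoding to UTF-8 happens incrementally while rendering, so no full intermediate String is built on the stdout path; a failed render leaves stdout untouched because writeTo is only reached after render returns.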