Automatic tail-call optimization (auto-TCO) in StaticOptimizer by He-Pin · Pull Request #694 · databricks/sjsonnet

He-Pin · 2026-04-06T08:23:41Z

Motivation

Jsonnet programs often use recursive functions that are naturally tail-recursive (the recursive call is the last expression in a branch). Without tail-call optimization (TCO), these fail with StackOverflowError on deep recursion. sjsonnet currently supports TCO only through an explicit tailstrict annotation, which users must add by hand.
refs: #623

Key Design Decision

Implement automatic TCO detection in the StaticOptimizer. During the optimization pass, analyze function bodies to identify self-recursive tail-position calls and mark them with tailstrict=true, strict=false. The Evaluator's existing TailCall trampoline then handles them with lazy argument evaluation, which preserves Jsonnet semantics. The rewrite is transparent: no source code changes are needed.
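
To illustrate the trampoline idea, here is a minimal sketch. The types Result, Done, and TailCall and the functions sumStep and run are hypothetical stand-ins, not sjsonnet's actual classes. A tail call is returned as data and a loop drives it, while arguments travel as unevaluated thunks.

```scala
import scala.annotation.tailrec

object TrampolineSketch {
  // Arguments are thunks, so they stay unevaluated until the callee forces them.
  type Args = Seq[() => Long]

  // Hypothetical stand-ins for the evaluator's result types.
  sealed trait Result
  final case class Done(value: Long) extends Result
  final case class TailCall(args: Args) extends Result

  // One step of: sum(n, acc, unused) = if n == 0 then acc else sum(n - 1, acc + n, unused)
  // The self-call is returned as a TailCall value instead of being invoked.
  def sumStep(args: Args): Result = {
    val n = args(0)()
    val acc = args(1)()
    if (n == 0) Done(acc)
    else TailCall(Seq(() => n - 1, () => acc + n, args(2)))
  }

  // The trampoline: a loop that keeps resolving TailCall values in constant stack space.
  @tailrec
  def run(step: Args => Result, args: Args): Long =
    step(args) match {
      case Done(v)        => v
      case TailCall(next) => run(step, next)
    }

  // The third argument is never read, so its thunk is never forced.
  def sum(n: Long): Long =
    run(sumStep, Seq(() => n, () => 0L, () => sys.error("unused argument was forced")))
}
```

Calling TrampolineSketch.sum with a depth of 100000 completes without growing the JVM stack, and the failing third argument is never evaluated because nothing reads it.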

strict vs tailstrict

tailstrict=true, strict=true — explicit tailstrict annotation (eager args, existing behavior)
tailstrict=true, strict=false — auto-TCO detected (lazy args, new behavior)

This distinction matters because explicit tailstrict forces eager evaluation of all arguments (including unused ones), while auto-TCO must preserve Jsonnet's lazy semantics.
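
The difference between the two flag combinations can be sketched as follows. The names Arg, prepareArgs, first, and demo are illustrative assumptions; the real evaluator is structured differently.

```scala
object StrictFlagSketch {
  // Hypothetical model of a call argument that is computed on first access.
  final class Arg(compute: () => Int) {
    lazy val value: Int = compute()
  }

  // strict = true models explicit tailstrict: every argument is forced before the call.
  // strict = false models auto-TCO: arguments are handed over unevaluated.
  def prepareArgs(args: Seq[Arg], strict: Boolean): Seq[Arg] = {
    if (strict) args.foreach(_.value)
    args
  }

  // A callee that only ever reads its first argument.
  def first(args: Seq[Arg]): Int = args.head.value

  // Passes one good argument and one that throws when evaluated.
  def demo(strict: Boolean): Either[String, Int] = {
    val args = Seq(new Arg(() => 1), new Arg(() => throw new RuntimeException("boom")))
    try Right(first(prepareArgs(args, strict)))
    catch { case e: RuntimeException => Left(e.getMessage) }
  }
}
```

In this model, demo(strict = false) returns the first argument because the failing one is never touched, while demo(strict = true) surfaces the error from the unused argument.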

Modification

sjsonnet/src/sjsonnet/StaticOptimizer.scala:

markTailCalls(expr, funcName): walks function bodies to find self-recursive calls in tail position
hasNonRecursiveExit(expr, funcName): safety check — only marks functions that have at least one non-recursive exit path (prevents infinite trampolining)
Tail positions propagated through: IfElse (both branches), LocalExpr (returned), AssertExpr (returned), And (rhs), Or (rhs), Expr.Error (value)
Guard: skips calls already marked tailstrict (user's explicit annotation) to avoid double-marking
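
The two helpers described above can be sketched on a toy AST. Everything here (the Expr hierarchy, the Call fields, the optimize wrapper, and the treatment of And in hasNonRecursiveExit) is an assumption made for illustration; sjsonnet's real Expr and StaticOptimizer cover many more node types.

```scala
object TailMarkSketch {
  // A toy AST, far smaller than sjsonnet's Expr.
  sealed trait Expr
  final case class Ref(name: String) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class IfElse(cond: Expr, thenE: Expr, elseE: Expr) extends Expr
  final case class And(l: Expr, r: Expr) extends Expr
  final case class Call(
      fn: String,
      args: List[Expr],
      tailstrict: Boolean = false,
      strict: Boolean = true
  ) extends Expr

  // True when at least one path through the body ends without a self-call in tail position.
  def hasNonRecursiveExit(e: Expr, self: String): Boolean = e match {
    case IfElse(_, t, f)                 => hasNonRecursiveExit(t, self) || hasNonRecursiveExit(f, self)
    case And(_, _)                       => true // assumption: a false lhs exits without recursing
    case Call(fn, _, _, _) if fn == self => false
    case _                               => true
  }

  // Marks only self-calls in tail position; conditions and operands of Add are left alone.
  // Calls that already carry tailstrict = true do not match the Call pattern and are skipped.
  def markTailCalls(e: Expr, self: String): Expr = e match {
    case IfElse(c, t, f)                         => IfElse(c, markTailCalls(t, self), markTailCalls(f, self))
    case And(l, r)                               => And(l, markTailCalls(r, self))
    case c @ Call(fn, _, false, _) if fn == self => c.copy(tailstrict = true, strict = false)
    case other                                   => other
  }

  // Applies the safety check first, then the marking pass.
  def optimize(body: Expr, self: String): Expr =
    if (hasNonRecursiveExit(body, self)) markTailCalls(body, self) else body
}
```

A body whose only path is a self-call is returned unchanged, which mirrors the guard against infinite trampolining. A self-call nested inside an argument or an Add operand is not in tail position and keeps its default flags.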

sjsonnet/src/sjsonnet/Expr.scala:

Renamed autoTCO: Boolean → strict: Boolean (inverted polarity) on all Apply variants
strict=true (default) = eager args, strict=false = lazy args (auto-TCO)
Added isStrict accessor

sjsonnet/src/sjsonnet/ExprTransform.scala:

Mechanical rename autoTCO → strict in all Apply pattern matches

sjsonnet/src/sjsonnet/Evaluator.scala:

visitExprWithTailCallSupport: added And (rhs), Or (rhs), and Expr.Error as tail positions
Extracted visitAndRhsTailPos/visitOrRhsTailPos helpers to preserve @tailrec annotation
Renamed isAutoTCO/autoTCO → isStrict/strict throughout
TailCall sentinel constructed with strict = isStrict — lazy args when !isStrict

sjsonnet/src/sjsonnet/Val.scala:

TailCall: renamed autoTCO → strict with inverted polarity
TailCall.resolve: if (tc.strict) TailstrictModeEnabled else TailstrictModeAutoTCO

Tests:

auto_tco.jsonnet — basic deep recursion (>10,000 depth)
auto_tco_directional.jsonnet — comprehensive 10-section test suite:
1. Tail position patterns (if/else, local, assert, &&, ||, nested boolean chains)
2. Arity coverage (Apply0–Apply5+)
3. Argument passing modes (named args, defaults)
4. Explicit tailstrict interaction (no double-marking, eager vs lazy, mixed branches)
5. Object/container-returning tail recursion
6. Nested functions, comprehensions, stdlib callbacks
7. Negative tests (non-tail calls, mutual recursion, different-function calls, shadowing)
8. Lazy semantics verification (unused error args not evaluated)
9. Performance verification (depth 600, 5000 — above maxStack=500)
10. Edge cases (returns-function, object methods, Collatz sequence)
auto_tco_patterns.jsonnet — pattern-focused tests
error.auto_tco_bool_check.jsonnet — verifies && type check catches direct non-bool rhs

And/Or Semantic Note

When tail-calling through &&/||, the rhs type check allows TailCall sentinels to pass through without a boolean check. This is a deliberate trade-off:

Direct non-bool values (true && "hello") are still caught — the helper methods check for Val.Bool | TailCall and error on anything else
Recursive calls returning non-bool (true && f(n-1) where f returns a string) pass through — this aligns with:
- google/jsonnet: && is simply if a then b else false with no rhs type check
- Scala 2/3: @tailrec works through && because it desugars to if (lhs) rhs else false
- The Jsonnet spec: does not mandate rhs type checking for &&/||
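
The trade-off can be sketched like this. The Val hierarchy, the andTailPos helper, and the error message are hypothetical and only model the behaviour described above; they are not the code of visitAndRhsTailPos.

```scala
object AndRhsSketch {
  sealed trait Val
  final case class Bool(b: Boolean) extends Val
  final case class Str(s: String) extends Val
  // Placeholder for a deferred self-call that the trampoline resolves later.
  case object TailCall extends Val

  // Evaluates `lhs && rhs` with rhs in tail position.
  def andTailPos(lhs: Boolean, rhs: () => Val): Val = {
    if (!lhs) Bool(false) // short-circuit: rhs is never evaluated
    else {
      rhs() match {
        case b: Bool  => b
        case TailCall => TailCall // passed through; its eventual value is not inspected here
        case other    => throw new IllegalArgumentException(s"expected a boolean on the right of &&, got $other")
      }
    }
  }
}
```

A direct non-boolean rhs fails immediately, while a TailCall is handed back to the trampoline, so whatever the recursive call eventually returns is never checked at this point.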

Benchmark Results

Environment: Apple M-series, macOS, JDK 21 (Zulu 21.0.10), GraalVM native-image.

JVM (hyperfine, 15 runs, 5 warmup — steady-state)

All JMH outliers re-verified via hyperfine (15 runs, 5 warmup) — confirmed as JIT noise:

Benchmark	Master (ms)	Auto-TCO (ms)	Ratio
foldl	224.7	223.6	1.00x
base64_byte_array	297.1	301.9	1.02x
comparison2	299.3	305.6	1.02x
large_string_template	266.0	264.0	1.01x
realistic2	395.2	397.9	1.01x
manifestTomlEx	519.0	518.5	1.00x
manifestYamlDoc	518.7	518.5	1.00x
bench.01	223.9	221.1	1.01x
bench.03	262.8	260.8	1.01x
bench.09	224.7	232.9	~1.0x
realistic1	291.2	291.9	1.00x
gen_big_object	250.9	252.3	1.01x
large_string_join	240.5	245.4	1.00x

GraalVM Native (hyperfine, 12 runs, 3 warmup)

Compute-heavy benchmarks (>25ms) — reliable:

Benchmark	Master (ms)	Auto-TCO (ms)	Ratio
bench.02	68.7	70.0	0.98x
bench.03	28.6	27.7	1.03x
comparison	67.6	68.4	0.99x
comparison2	92.4	94.0	0.98x
base64DecodeBytes	27.2	26.8	1.02x
reverse	30.9	30.0	1.03x
realistic2	427.8	426.1	1.00x

Note: Native benchmarks <15ms are dominated by startup time and not reliable for performance comparison. All compute-heavy benchmarks show no regression.

Summary

Geometric mean ratio: ~1.00x (no regression)
No benchmark shows statistically significant degradation when properly controlled for JIT/startup noise
The StaticOptimizer markTailCalls pass adds negligible compile-time overhead
Auto-TCO enables deep recursion (>10,000 depth) that previously failed with StackOverflowError
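
As a worked check of the geometric-mean claim, the sketch below computes it over the Ratio column of the GraalVM native table only. Which benchmarks went into the reported figure is not stated, so this is an illustration rather than a reproduction; the object and method names are made up for the example.

```scala
object GeoMeanSketch {
  // Ratio column of the GraalVM native table above.
  val nativeRatios: Seq[Double] = Seq(0.98, 1.03, 0.99, 0.98, 1.02, 1.03, 1.00)

  // Geometric mean computed as exp of the mean of logarithms.
  def geometricMean(xs: Seq[Double]): Double =
    math.exp(xs.map(x => math.log(x)).sum / xs.size)
}
```

For these seven ratios the result stays within one percent of 1.0, consistent with the "~1.00x" summary.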

Analysis

Correctness: Only rewrites provably tail-position self-recursive calls. Non-tail, mutual-recursion, and self.method calls are untouched.
Safety: hasNonRecursiveExit prevents marking functions with no base case. Lazy args preserve Jsonnet semantics.
Interaction with explicit TCO: Auto-TCO skips already-tailstrict calls. Mixed branches (some explicit, some auto) work correctly.
Performance: Trampoline via existing TailCall.resolve — zero overhead for non-recursive calls.

Result

Automatic tail-call optimization for self-recursive Jsonnet functions. Eliminates StackOverflowError on deep recursion without source changes, while preserving lazy evaluation semantics. No performance regression on existing benchmarks.

@tailrec

Detect self-recursive calls in tail position during static optimization and mark them for the TailCall trampoline, eliminating JVM stack overflow on deep recursion without requiring users to annotate call sites with 'tailstrict'.

Key design: introduce TailstrictModeAutoTCO, a third TailstrictMode that enables the trampoline (like TailstrictModeEnabled) but does NOT force eager argument evaluation (unlike explicit tailstrict). This preserves Jsonnet's standard lazy evaluation semantics for auto-TCO'd calls.

Implementation:
- StaticOptimizer.transformBind: detects self-recursive function bindings
- hasNonRecursiveExit: safety check ensuring at least one non-recursive code path exists (prevents infinite trampoline on trivially infinite functions like f(x) = f(x))
- markTailCalls: walks the AST marking self-recursive tail calls with tailstrict=true, autoTCO=true
- Expr: adds isAutoTCO/autoTCO field to Apply0-3 and Apply case classes
- Val: adds TailstrictModeAutoTCO, TailCall.autoTCO flag, restores @tailrec on TailCall.resolve
- Evaluator: visitExprWithTailCallSupport uses visitAsLazy for auto-TCO args; visitApply* defensively handles auto-TCO for future-proofing

Test: auto_tco.jsonnet with 6 patterns including lazy semantics regression test (error in auto-TCO'd args is NOT eagerly evaluated).

Upstream: databricks#623

@tailrec

…rror tail propagation

Motivation: Code review identified several improvements to the auto-TCO implementation:
- The `autoTCO` boolean field had confusing polarity (true = lazy args)
- And/Or/Error expressions were not propagated as tail positions
- Test comments had inaccuracies about named args and Apply0

Modification:
- Rename `autoTCO: Boolean` → `strict: Boolean` with inverted polarity (strict=true → eager args, strict=false → lazy/auto-TCO args)
- Propagate tail positions through And (rhs), Or (rhs), and Expr.Error in both StaticOptimizer.markTailCalls and Evaluator.visitExprWithTailCallSupport
- Add hasNonRecursiveExit cases for And/Or/Error in StaticOptimizer
- Extract visitAndRhsTailPos/visitOrRhsTailPos helpers to preserve @tailrec
- Add error.auto_tco_bool_check test for direct non-bool rhs detection
- Fix test comments: Apply0 → Apply1, named args auto-TCO behavior

Result:
- Clearer field semantics (strict=true is the natural default)
- Tail calls through && and || chains are now optimized (aligns with Scala 2/3 @tailrec and google/jsonnet behavior)
- All 270 tests pass on Scala 3.3.7 and 2.13.18

The auto-TCO PR (databricks#694) added a 'strict' boolean parameter to Apply, Apply0, Apply1, Apply2, and Apply3 case classes, but hasSelfRefExpr patterns were not updated, causing compilation failure on all Scala versions.

## Motivation

The auto-TCO commit (ecdd0b6, PR #694) added a `strict: Boolean` field to `Apply`, `Apply0`, `Apply1`, `Apply2`, and `Apply3` case classes. The `hasSelfRefExpr` pattern matches in `Materializer` were not updated, causing:

1. **Compilation failure on Scala 2.13.18**: `wrong number of arguments for pattern sjsonnet.Expr.Apply`
2. **Runtime `MatchError` on Scala 3.3.7**: The pattern match silently fell through to the wildcard case, causing incorrect materialization paths for lazy reverse arrays (`lazy_reverse_correctness.jsonnet` failure)

## Modification

Added wildcard for the new `strict` field in all five Apply pattern matches in `Materializer.hasSelfRefExpr`:

- `Apply(_, v, args, _, _)` → `Apply(_, v, args, _, _, _)`
- `Apply0(_, v, _)` → `Apply0(_, v, _, _)`
- `Apply1(_, v, a1, _)` → `Apply1(_, v, a1, _, _)`
- `Apply2(_, v, a1, a2, _)` → `Apply2(_, v, a1, a2, _, _)`
- `Apply3(_, v, a1, a2, a3, _)` → `Apply3(_, v, a1, a2, a3, _, _)`

## Result

- Scala 2.13.18 compiles and all tests pass (including `lazy_reverse_correctness.jsonnet`)
- Scala 3.3.7 all tests pass (no regressions)
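
A minimal sketch of why the extra wildcard is needed, using a simplified, hypothetical Apply1 with fewer fields than the real case class (the hasSelfRef helper is likewise illustrative):

```scala
object ApplyAritySketch {
  sealed trait Expr
  final case class Id(name: String) extends Expr
  // Simplified stand-in for Apply1; `strict` plays the role of the newly added trailing field.
  final case class Apply1(fn: Expr, a1: Expr, tailstrict: Boolean, strict: Boolean) extends Expr

  // A constructor pattern needs one sub-pattern per field, so the Apply1 case
  // carries two wildcards: one for tailstrict and one for strict.
  def hasSelfRef(e: Expr): Boolean = e match {
    case Id(name)             => name == "self"
    case Apply1(fn, a1, _, _) => hasSelfRef(fn) || hasSelfRef(a1)
  }
}
```

With the pattern written as `Apply1(fn, a1, _)` against the four-field class, the match would no longer line up with the constructor, which is the class of error the fix addresses.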

He-Pin force-pushed the perf/auto-tco branch 3 times, most recently from 7adb221 to da38f7c, April 9, 2026 03:02

He-Pin mentioned this pull request Apr 9, 2026

perf: Evaluator, parser, sort, and strip optimizations (batch 4) #703

Closed

He-Pin force-pushed the perf/auto-tco branch 2 times, most recently from 3532dbf to 2755bea, April 10, 2026 09:29

He-Pin closed this Apr 10, 2026

He-Pin reopened this Apr 10, 2026

He-Pin force-pushed the perf/auto-tco branch 3 times, most recently from 4547413 to 5e74bf5, April 10, 2026 23:17

He-Pin force-pushed the perf/auto-tco branch from 5e74bf5 to edac202, April 10, 2026 23:27

He-Pin marked this pull request as ready for review April 11, 2026 05:27

stephenamar-db merged commit ecdd0b6 into databricks:master Apr 11, 2026
5 checks passed

He-Pin mentioned this pull request Apr 11, 2026

fix: update Apply pattern match arity for auto-TCO strict field #751

Merged

stephenamar-db merged 2 commits into databricks:master from He-Pin:perf/auto-tco