bench(gradle): harden JMH suite against run-to-run variance #9
not-matthias wants to merge 6 commits into main
Merging this PR will degrade performance by 14.35%
Enabling forceGC calls System.gc() before each measurement iteration so a concurrent collection cannot land inside the measurement window and show up as a spurious regression. Addresses part of COD-2519 (unchanged PRs regressing due to GC noise).
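As a rough illustration of the idea, the plain-Java sketch below requests a collection before each timed iteration, so garbage left over from the previous iteration is less likely to be collected inside the next measurement window. It is not how JMH implements forceGC internally; the class `ForcedGcTimer` and its `measure` method are hypothetical names, and `System.gc()` is only a hint that the JVM may ignore.

```java
import java.util.Arrays;

final class ForcedGcTimer {
    /** Times `iterations` runs of `work`, requesting a GC before each one. */
    static long[] measure(Runnable work, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            System.gc();                      // a hint only; issued outside the timed region
            long start = System.nanoTime();
            work.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        // Stand-in workload; a real benchmark body would go here.
        long[] samples = measure(() -> Arrays.fill(data, 7), 5);
        System.out.println(Arrays.toString(samples));
    }
}
```

The trade-off is extra wall-clock time per iteration, since every forced collection runs even when there is little garbage to reclaim.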
Force-pushed from c6d648e to 1d8ea65.
Pull request overview
This PR hardens the example Gradle JMH benchmark suite to reduce run-to-run variance in CI by adjusting JMH configuration, trimming parameter matrices, and removing allocation noise from timed regions.
Changes:
- Increase warmup/measurement iterations (10/10) and standardize on @Fork(1) across benchmarks.
- Trim benchmark @Param matrices to fewer representative values to reduce CI wall-clock and reduce noise.
- Remove per-invocation allocations from hot benchmark paths (notably SortBenchmark and parts of BacktrackingBenchmark) and add JMH JVM/GC stabilization knobs in Gradle.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| examples/example-gradle/build.gradle.kts | Enable forceGC and add -Xbatch to reduce GC/JIT jitter during measurement. |
| examples/example-gradle/src/jmh/java/com/thealgorithms/sorts/SortBenchmark.java | Reduce param sizes, drop redundant sort variants, and replace per-invocation Arrays.copyOf with System.arraycopy into a reusable buffer. |
| examples/example-gradle/src/jmh/java/bench/SleepBenchmark.java | Increase warmup/measurement iterations for stability. |
| examples/example-gradle/src/jmh/java/bench/RleBenchmark.java | Increase warmup/measurement iterations and reduce input-size params. |
| examples/example-gradle/src/jmh/java/bench/RegexBenchmark.java | Increase warmup/measurement iterations and reduce backtracking param matrix. |
| examples/example-gradle/src/jmh/java/bench/FibBenchmark.java | Increase warmup/measurement iterations. |
| examples/example-gradle/src/jmh/java/bench/DynamicProgrammingBenchmark.java | Increase warmup/measurement iterations and trim multiple DP benchmark params. |
| examples/example-gradle/src/jmh/java/bench/BitManipulationBenchmark.java | Increase warmup/measurement iterations and trim bit-value params. |
| examples/example-gradle/src/jmh/java/bench/BacktrackingBenchmark.java | Increase warmup/measurement iterations, trim params, and hoist allocations into @Setup. |
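-Xbatch disables background JIT compilation, so the benchmark thread waits for each compile to finish instead of running alongside a background compiler thread. Fixed heap size, pre-touched pages, serial GC, and disabled adaptive sizing eliminate heap resizes, page faults, and concurrent GC jitter during measurement.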
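One way to confirm that such flags actually reach the benchmark JVM is to read the arguments back at runtime. The sketch below is a minimal check under stated assumptions: the flag spellings in `main` are standard HotSpot options that match the description above, but the PR text does not show the exact strings used in build.gradle.kts, and heap-size flags are left out because no values are given. `JvmFlagCheck` and `missing` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class JvmFlagCheck {
    /** Returns the expected flags that are absent from the JVM's actual arguments. */
    static List<String> missing(List<String> actual, List<String> expected) {
        List<String> result = new ArrayList<>();
        for (String flag : expected) {
            if (!actual.contains(flag)) {
                result.add(flag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed spellings; adjust to whatever the Gradle jmh block really passes.
        List<String> expected = List.of(
            "-Xbatch",
            "-XX:+AlwaysPreTouch",
            "-XX:+UseSerialGC",
            "-XX:-UseAdaptiveSizePolicy");
        // Arguments the current JVM was started with (excludes the main class and program args).
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("missing flags: " + missing(actual, expected));
    }
}
```

Calling the same check from inside a forked benchmark process, rather than from the Gradle daemon, is what would show whether the fork inherited the flags.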
The DP, bit-manipulation, RLE and regex benchmarks each enumerated 2-5 input sizes purely to show scaling. For CodSpeed regression detection we only need one point on the curve per benchmark method — additional values multiply CI wall-clock without adding signal. Part of COD-2519.
…tion
Two related changes:
- Keep only 4 representative sort algorithms (quickSort, mergeSort, timSort, dualPivotQuickSort). The five dropped variants (heap, insertion, selection, shell, introspective) sort the same Integer[] input through similar code paths and don't exercise distinct regressions. Also narrows the param matrix to a single size (10000).
- Replace copyData()'s Arrays.copyOf, which allocated a fresh Integer[] on every invocation, with a System.arraycopy into a pre-allocated working buffer. For small sizes the allocation and its GC pressure dominated the sort work itself and were the primary source of the timSort[100] flakiness called out in COD-2519.
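The buffer-reuse change can be sketched in plain Java as follows. This is an illustration of the technique, not the code from SortBenchmark.java: the class `ReusableSortInput`, its fields, and the seed are hypothetical, and only the `copyData()` name comes from the commit message.

```java
import java.util.Arrays;
import java.util.Random;

final class ReusableSortInput {
    private final Integer[] source;   // pristine unsorted input, never mutated
    private final Integer[] working;  // allocated once, overwritten before every sort

    ReusableSortInput(int size, long seed) {
        Random random = new Random(seed);
        source = new Integer[size];
        for (int i = 0; i < size; i++) {
            source[i] = random.nextInt();  // boxing happens here, outside any timed region
        }
        working = new Integer[size];
    }

    /** Resets the working buffer to the unsorted input without allocating a new array. */
    Integer[] copyData() {
        System.arraycopy(source, 0, working, 0, source.length);
        return working;
    }

    public static void main(String[] args) {
        ReusableSortInput input = new ReusableSortInput(10_000, 42L);
        Integer[] first = input.copyData();
        Arrays.sort(first);                    // sorts the working buffer in place
        Integer[] second = input.copyData();   // same array instance, unsorted again
        System.out.println(first == second);
    }
}
```

The copy itself still runs on every invocation, so it remains part of the measured time; what disappears is the per-invocation array allocation and the garbage it produced.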
…setup
Reduce each Param to 2 representative values (was 3-5). The smallest values in the original sets were well below the JMH harness noise floor. Also pre-allocate the Integer[] / ArrayList inputs in Setup(Trial) for generateCombinations, permutations, and generateSubsequences. Previously each Benchmark method allocated these structures inline on every invocation, creating GC pressure that showed up as run-to-run variance. Part of COD-2519.
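The hoisting pattern looks roughly like the sketch below, written without the JMH dependency so it stays self-contained. The comments mark where the JMH annotations would sit; `HoistedInputs`, `setup`, and `countPairs` are hypothetical names, and the pair-counting loop is only a stand-in for the real backtracking workloads.

```java
import java.util.ArrayList;
import java.util.List;

// In JMH this class would be annotated with @State(Scope.Benchmark).
final class HoistedInputs {
    final int n;
    List<Integer> elements;  // built once in setup(), then only read

    HoistedInputs(int n) {
        this.n = n;
    }

    // In JMH this method would carry @Setup(Level.Trial), so it runs outside the timed region.
    void setup() {
        elements = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            elements.add(i);
        }
    }

    // In JMH this would be the @Benchmark method; it allocates no input of its own.
    int countPairs() {
        int pairs = 0;
        for (int i = 0; i < elements.size(); i++) {
            for (int j = i + 1; j < elements.size(); j++) {
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        HoistedInputs state = new HoistedInputs(5);
        state.setup();
        System.out.println(state.countPairs());
    }
}
```

Sharing one input across invocations is only safe when the benchmarked method treats it as read-only; a method that mutates its input needs a reset step instead.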
All 8 benchmark classes previously ran at Warmup(1), Measurement(3), Fork(1) — too little of everything to produce stable JMH numbers. In particular Fork(1) is the core of COD-2519: a single JVM launch per benchmark can't separate real regressions from JIT or ASLR luck-of-the-draw. Settle on Warmup(10), Measurement(10), Fork(1) for a ~14 min total CI budget across the ~40 combos. Generous warmup reaches C2 steady state; generous measurement iterations give JMH tight confidence intervals on the per-fork score. Fork(1) leaves cross-JVM variance sampling to the 6-distribution CI matrix, which already runs each benchmark in 6 independent JVMs.
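The ~14 min budget can be sanity-checked with simple arithmetic. The sketch below assumes each warmup and measurement iteration runs for about 1 second; that duration is an assumption, since the commit message does not state it, and JVM startup and harness overhead are ignored. `BenchBudget` and `estimateSeconds` are hypothetical names.

```java
final class BenchBudget {
    /** Total timed seconds: combos x forks x (warmup + measurement iterations) x seconds per iteration. */
    static long estimateSeconds(int combos, int warmupIters, int measurementIters,
                                int forks, int secondsPerIteration) {
        return (long) combos * forks * (warmupIters + measurementIters) * secondsPerIteration;
    }

    public static void main(String[] args) {
        // ~40 combos, Warmup(10), Measurement(10), Fork(1), assumed 1 s per iteration.
        long seconds = estimateSeconds(40, 10, 10, 1, 1);
        System.out.println(seconds + " s, about " + (seconds / 60) + " min before overhead");
    }
}
```

Under that assumption the timed portion comes to 800 seconds, a little over 13 minutes, which is consistent with the stated budget once per-benchmark overhead is added. The same formula shows why raising the fork count is expensive: each additional fork adds another 800 seconds.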
Force-pushed from af4a59a to dacf718.
GuillaumeLagrange left a comment
OLGTM but I have two remarks
- Do we have stats of the variance before/after the changes, to make sure we're not throwing things around randomly and hoping it improves stuff?
- The warmup and iterations required for our benchmarks to be less noisy should be somewhere in the docs as guidelines, because people ARE going to report unreliable results, as they should, if we encountered them ourselves
- forceGC = true in the Gradle jmh block: System.gc() between iterations so a GC pause can't land mid-measurement.
- -Xbatch JVM arg: synchronous JIT compilation so the C2 background thread can't steal cycles during measurement.
- DP, bit-manipulation, RLE, and regex benchmarks: single representative value each.
- SortBenchmark: keep one size, replace Arrays.copyOf (fresh Integer[] per invocation) with System.arraycopy into a pre-allocated working buffer. This is the direct fix for the timSort[100] flakiness called out in the ticket.
- BacktrackingBenchmark: trim each param to 2 representative values and move Integer[]/ArrayList allocations out of the timed region via Setup.
Generous warmup reaches JIT steady state; generous measurement iterations
give tight per-fork confidence intervals. Fork(1) leaves cross-JVM variance
sampling to the 6-distribution CI matrix.