bench(gradle): harden JMH suite against run-to-run variance by not-matthias · Pull Request #9 · CodSpeedHQ/codspeed-jvm

not-matthias · 2026-04-17T16:49:12Z

- forceGC = true in the Gradle jmh block: System.gc() between iterations so a GC pause can't land mid-measurement.
- -Xbatch JVM arg: synchronous JIT compilation so the C2 background thread can't steal cycles during measurement.
- Trim Param matrices across DP / Bit / Rle / Regex benchmarks to a single representative value each.
- SortBenchmark trim + allocation fix: drop 5 redundant sort variants, keep one size, replace Arrays.copyOf (fresh Integer[] per invocation) with System.arraycopy into a pre-allocated working buffer. This is the direct fix for the timSort[100] flakiness called out in the ticket.
- BacktrackingBenchmark trim + allocation hoist: reduce each Param to 2 values and move Integer[] / ArrayList allocations out of the timed region via Setup.
- Tune Warmup(10) / Measurement(10) / Fork(1) on all 8 benchmark classes. Generous warmup reaches JIT steady state; generous measurement iterations give tight per-fork confidence intervals. Fork(1) leaves cross-JVM variance sampling to the 6-distribution CI matrix.

codspeed-hq · 2026-04-17T17:23:03Z

Merging this PR will degrade performance by 14.35%

⚡ 12 improved benchmarks
❌ 2 regressed benchmarks
✅ 28 untouched benchmarks
⏩ 92 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	`generateCombinations[9]`	28.6 µs	24 µs	+19.3%
❌	`fib[30]`	8.4 ms	9.4 ms	-11.22%
⚡	`permutations[7]`	554.3 µs	451.8 µs	+22.7%
⚡	`mergeSort[10000]`	2.9 ms	2.2 ms	+35.3%
❌	`levenshteinDistance[saturday sunday]`	400 ns	467 ns	-14.35%
⚡	`dualPivotQuickSort[10000]`	4.9 ms	3.3 ms	+48.21%
⚡	`quickSort[10000]`	2.5 ms	2.2 ms	+13.72%
⚡	`generateCombinations[7]`	12.6 µs	11 µs	+14.35%
⚡	`fibonacciOptimized[30]`	31 ns	27 ns	+14.81%
⚡	`fibonacciBottomUp[30]`	2.7 µs	2.2 µs	+23.65%
⚡	`permutations[5]`	12.8 µs	11 µs	+16.9%
⚡	`multiPatternScan[24]`	6.4 ms	5.3 ms	+19.29%
⚡	`compileAndMatch[24]`	4 ms	3.6 ms	+11.27%
⚡	`timSort[10000]`	2.4 ms	1.6 ms	+48.37%

Comparing cod-2519-harden-codspeed-jvm-against-regressions (dacf718) with main (c7c19aa)
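
The Efficiency column appears to follow BASE / HEAD - 1. That formula is inferred from the rows whose inputs are exact (31 ns to 27 ns, 400 ns to 467 ns); it is not taken from CodSpeed documentation. A minimal sketch under that assumption, with a hypothetical class name:

```java
import java.util.Locale;

/** Sketch: relative change computed as BASE / HEAD - 1 (formula inferred from the table). */
final class EfficiencySketch {

    /** Signed percentage change; positive means HEAD is faster than BASE. Both times share one unit. */
    static double efficiencyPercent(double baseTime, double headTime) {
        if (baseTime <= 0 || headTime <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return (baseTime / headTime - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // fibonacciOptimized[30]: 31 ns -> 27 ns
        System.out.printf(Locale.ROOT, "%+.2f%%%n", efficiencyPercent(31, 27));   // prints +14.81%
        // levenshteinDistance[saturday sunday]: 400 ns -> 467 ns
        System.out.printf(Locale.ROOT, "%+.2f%%%n", efficiencyPercent(400, 467)); // prints -14.35%
    }
}
```

Rows reported in ms or µs with one decimal will not reproduce exactly from the displayed values, since those are rounded.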

¹ 92 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

Enabling forceGC calls System.gc() before each measurement iteration so a concurrent collection cannot land inside the measurement window and show up as a spurious regression. Addresses part of COD-2519 (unchanged PRs regressing due to GC noise).
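
The idea behind forcing a collection between iterations can be shown outside JMH. This is a plain-Java sketch, not JMH's implementation; the class name, method name, and workload are hypothetical.

```java
/** Sketch: request a GC before each timed window so pending garbage is less likely to be collected mid-measurement. */
final class GcBetweenIterationsSketch {

    /** Runs `work` once per iteration and returns the elapsed nanoseconds of each run. */
    static long[] measure(Runnable work, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            System.gc(); // only a request: the JVM is free to ignore it
            long start = System.nanoTime();
            work.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Hypothetical allocating workload, used purely for illustration.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        for (long sample : measure(work, 5)) {
            System.out.println(sample + " ns");
        }
    }
}
```

The collection happens outside the timed window, so its cost is not attributed to the workload. Garbage created during the window can still trigger a collection if the heap fills up.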

Copilot

Pull request overview

This PR hardens the example Gradle JMH benchmark suite to reduce run-to-run variance in CI by adjusting JMH configuration, trimming parameter matrices, and removing allocation noise from timed regions.

Changes:

- Increase warmup/measurement iterations (10/10) and standardize on @Fork(1) across benchmarks.
- Trim benchmark @Param matrices to fewer representative values to reduce CI wall-clock and reduce noise.
- Remove per-invocation allocations from hot benchmark paths (notably SortBenchmark and parts of BacktrackingBenchmark) and add JMH JVM/GC stabilization knobs in Gradle.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

File	Description
examples/example-gradle/build.gradle.kts	Enable `forceGC` and add `-Xbatch` to reduce GC/JIT jitter during measurement.
examples/example-gradle/src/jmh/java/com/thealgorithms/sorts/SortBenchmark.java	Reduce param sizes, drop redundant sort variants, and replace per-invocation `Arrays.copyOf` with `System.arraycopy` into a reusable buffer.
examples/example-gradle/src/jmh/java/bench/SleepBenchmark.java	Increase warmup/measurement iterations for stability.
examples/example-gradle/src/jmh/java/bench/RleBenchmark.java	Increase warmup/measurement iterations and reduce input-size params.
examples/example-gradle/src/jmh/java/bench/RegexBenchmark.java	Increase warmup/measurement iterations and reduce backtracking param matrix.
examples/example-gradle/src/jmh/java/bench/FibBenchmark.java	Increase warmup/measurement iterations.
examples/example-gradle/src/jmh/java/bench/DynamicProgrammingBenchmark.java	Increase warmup/measurement iterations and trim multiple DP benchmark params.
examples/example-gradle/src/jmh/java/bench/BitManipulationBenchmark.java	Increase warmup/measurement iterations and trim bit-value params.
examples/example-gradle/src/jmh/java/bench/BacktrackingBenchmark.java	Increase warmup/measurement iterations, trim params, and hoist allocations into `@Setup`.

-Xbatch serializes JIT work onto the benchmark thread. Fixed heap size, pre-touched pages, serial GC, and disabled adaptive sizing eliminate heap resizes, page faults, and concurrent GC jitter during measurement.
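
The commit message names the intent but not the exact flags. The sketch below assumes the usual HotSpot spellings (-Xms/-Xmx set to the same value, -XX:+AlwaysPreTouch, -XX:+UseSerialGC, -XX:-UseAdaptiveSizePolicy) and a 1g heap chosen only for illustration. It checks whether the running JVM was started with them, which may help confirm the Gradle configuration reaches the forked benchmark JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Sketch: report which of the assumed stabilization flags are absent from the JVM's input arguments. */
final class JvmFlagCheckSketch {

    // Assumed flag set; the 1g heap size is a placeholder, not a value from the PR.
    static final List<String> EXPECTED_FLAGS = List.of(
            "-Xbatch",
            "-Xms1g",
            "-Xmx1g",
            "-XX:+AlwaysPreTouch",
            "-XX:+UseSerialGC",
            "-XX:-UseAdaptiveSizePolicy");

    /** Returns the expected flags that do not appear verbatim in `actualArgs`, in declaration order. */
    static List<String> missingFlags(List<String> actualArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : EXPECTED_FLAGS) {
            if (!actualArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing stabilization flags: " + missingFlags(actual));
    }
}
```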

The DP, bit-manipulation, RLE and regex benchmarks each enumerated 2-5 input sizes purely to show scaling. For CodSpeed regression detection we only need one point on the curve per benchmark method — additional values multiply CI wall-clock without adding signal. Part of COD-2519.

…tion

Two related changes:

- Keep only 4 representative sort algorithms (quickSort, mergeSort, timSort, dualPivotQuickSort). The five dropped variants (heap, insertion, selection, shell, introspective) sort the same Integer[] input through similar code paths and don't exercise distinct regressions. Also narrows the param matrix to a single size (10000).
- Replace copyData()'s Arrays.copyOf, which allocated a fresh Integer[] on every invocation, with a System.arraycopy into a pre-allocated working buffer. For small sizes the allocation and its GC pressure dominated the sort work itself and was the primary source of the timSort[100] flakiness called out in COD-2519.
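
The difference between the two copy strategies can be sketched in plain Java. The class and method names below are hypothetical and do not mirror the actual SortBenchmark code.

```java
import java.util.Arrays;

/** Sketch: refill one pre-allocated buffer instead of allocating a fresh array per invocation. */
final class ReusableBufferSketch {
    private final Integer[] source;   // unsorted input, never mutated
    private final Integer[] working;  // allocated once, reused on every call

    ReusableBufferSketch(Integer[] source) {
        this.source = source.clone();
        this.working = new Integer[source.length];
    }

    /** Old approach: allocates a new Integer[] each time it is called. */
    Integer[] copyWithAllocation() {
        return Arrays.copyOf(source, source.length);
    }

    /** New approach: overwrites the same buffer, so no array is allocated per call. */
    Integer[] copyIntoBuffer() {
        System.arraycopy(source, 0, working, 0, source.length);
        return working;
    }

    public static void main(String[] args) {
        ReusableBufferSketch sketch = new ReusableBufferSketch(new Integer[] {3, 1, 2});
        Integer[] data = sketch.copyIntoBuffer();
        Arrays.sort(data);
        System.out.println(Arrays.toString(data));                    // prints [1, 2, 3]
        System.out.println(Arrays.toString(sketch.copyIntoBuffer())); // prints [3, 1, 2]
    }
}
```

The copy itself still runs inside the timed region, but it no longer produces garbage. The trade-off is that the buffer is shared, so the caller must be done with the previous contents before asking for the next copy.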

…setup

Reduce each Param to 2 representative values (was 3-5). The smallest values in the original sets were well below the JMH harness noise floor.

Also pre-allocate the Integer[] / ArrayList inputs in Setup(Trial) for generateCombinations, permutations, and generateSubsequences. Previously each Benchmark method allocated these structures inline on every invocation, creating GC pressure that showed up as run-to-run variance. Part of COD-2519.

All 8 benchmark classes previously ran at Warmup(1), Measurement(3), Fork(1) — too little of everything to produce stable JMH numbers. In particular Fork(1) is the core of COD-2519: a single JVM launch per benchmark can't separate real regressions from JIT or ASLR luck-of-the-draw. Settle on Warmup(10), Measurement(10), Fork(1) for a ~14 min total CI budget across the ~40 combos. Generous warmup reaches C2 steady state; generous measurement iterations give JMH tight confidence intervals on the per-fork score. Fork(1) leaves cross-JVM variance sampling to the 6-distribution CI matrix, which already runs each benchmark in 6 independent JVMs.
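
The ~14 min figure can be sanity-checked with simple arithmetic. The sketch below assumes the budget covers one pass over all ~40 combos and ignores JVM startup and setup time; the resulting per-iteration time is derived, not stated in the commit message.

```java
import java.util.Locale;

/** Sketch: derive the implied wall-clock time per JMH iteration from the stated CI budget. */
final class CiBudgetSketch {

    /** Total budget divided by the number of iterations across all combos and forks. */
    static double secondsPerIteration(double totalMinutes, int combos, int warmup, int measurement, int forks) {
        int iterations = combos * forks * (warmup + measurement);
        if (iterations <= 0) {
            throw new IllegalArgumentException("iteration count must be positive");
        }
        return totalMinutes * 60.0 / iterations;
    }

    public static void main(String[] args) {
        // ~14 min, ~40 combos, Warmup(10), Measurement(10), Fork(1)
        double seconds = secondsPerIteration(14, 40, 10, 10, 1);
        System.out.printf(Locale.ROOT, "%.2f s per iteration%n", seconds); // prints 1.05 s per iteration
    }
}
```

Under these assumptions each iteration has roughly one second, and raising the fork count to N would multiply the budget by N.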

GuillaumeLagrange

OLGTM but I have two remarks

1. Do we have stats on the variance before and after these changes, to make sure we're not throwing things around randomly and hoping it improves stuff?
2. The warmup and iteration counts our benchmarks need to be less noisy should be documented somewhere as guidelines, because people ARE going to report unreliable results, as they should, if we ran into them ourselves.
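
One way to answer the first remark would be to compare the coefficient of variation of repeated runs of the same benchmark on unchanged code, before and after the hardening. The sketch below shows only the computation; the sample values are placeholders, not measurements from this PR.

```java
import java.util.Locale;

/** Sketch: coefficient of variation (sample standard deviation / mean) as a run-to-run noise metric. */
final class VarianceSketch {

    static double coefficientOfVariation(double[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        double mean = sum / samples.length;
        double squaredDeviations = 0.0;
        for (double s : samples) {
            squaredDeviations += (s - mean) * (s - mean);
        }
        // Bessel's correction (n - 1) because the samples are a subset of all possible runs.
        double stdDev = Math.sqrt(squaredDeviations / (samples.length - 1));
        return stdDev / mean;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        double[] before = {90, 100, 110};
        double[] after = {99, 100, 101};
        System.out.printf(Locale.ROOT, "before: CV = %.3f%n", coefficientOfVariation(before));
        System.out.printf(Locale.ROOT, "after:  CV = %.3f%n", coefficientOfVariation(after));
    }
}
```

A lower value after the change would indicate tighter run-to-run agreement for that benchmark.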

not-matthias force-pushed the cod-2519-harden-codspeed-jvm-against-regressions branch from c6d648e to 1d8ea65 · April 17, 2026 18:01

not-matthias requested a review from Copilot April 17, 2026 18:04

Copilot started reviewing on behalf of not-matthias · April 17, 2026 18:05

Copilot AI reviewed Apr 17, 2026

not-matthias added 5 commits April 20, 2026 18:21

bench(gradle): add JVM flags for JIT and GC stability

e6a5677

-Xbatch serializes JIT work onto the benchmark thread. Fixed heap size, pre-touched pages, serial GC, and disabled adaptive sizing eliminate heap resizes, page faults, and concurrent GC jitter during measurement.

not-matthias force-pushed the cod-2519-harden-codspeed-jvm-against-regressions branch from af4a59a to dacf718 · April 20, 2026 16:24

not-matthias requested a review from GuillaumeLagrange April 20, 2026 16:25

GuillaumeLagrange approved these changes Apr 21, 2026

not-matthias wants to merge 6 commits into main from cod-2519-harden-codspeed-jvm-against-regressions

3 participants
