perf: fold LSB-test `i32.and X 1` into `i32.ctz` in boolean contexts by ggreif · Pull Request #8562 · WebAssembly/binaryen

ggreif · 2026-04-01T09:39:03Z

Summary

An if-else conditioned on (i32.and X (i32.const 1)) tests the least significant bit of X. Since i32.ctz X == 0 iff the LSB of X is set, we can replace the condition with i32.ctz X and swap the branches — saving one instruction.

The second commit extends this to the primary pattern from the issue — eqz(and X 1) as a boolean condition (used in br_if, if, select) — handled in optimizeBoolean so all three sites benefit from one insertion.

- Handles the constant on either side (left or right of `and`)
- `visitIf`: `(and X 1); if T E` → `(ctz X); if E T`
- `optimizeBoolean`: `eqz(and X 1)` → `ctz X`, covering the typical `br_if (eqz (and X 1))` pattern
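
The two rewrites above rest on a truthiness argument, not on value equality: `eqz(and X 1)` yields 0 or 1 while `ctz X` yields 0 to 32, so they agree only where the result is consumed as a condition. A minimal Python sketch that models the three i32 operations and checks both rules follows. The helper names (`i32_and`, `i32_eqz`, `i32_ctz`, `check`) are made up for illustration and are not Binaryen APIs.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers


def i32_and(a, b):
    return (a & b) & MASK32


def i32_eqz(a):
    return 1 if (a & MASK32) == 0 else 0


def i32_ctz(a):
    a &= MASK32
    if a == 0:
        return 32  # Wasm defines ctz of zero as the bit width
    # a & -a isolates the lowest set bit; its position is the trailing-zero count
    return (a & -a).bit_length() - 1


def truthy(v):
    # A Wasm condition is taken when the i32 value is non-zero
    return v != 0


def check(x):
    # visitIf rule: (and X 1) is truthy exactly when (ctz X) is falsy,
    # so the condition can be replaced if the two arms are swapped.
    if_rule_ok = truthy(i32_and(x, 1)) == (not truthy(i32_ctz(x)))
    # optimizeBoolean rule: eqz(and X 1) and ctz X have the same truthiness.
    bool_rule_ok = truthy(i32_eqz(i32_and(x, 1))) == truthy(i32_ctz(x))
    return if_rule_ok and bool_rule_ok


SAMPLES = [0, 1, 2, 3, 4, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
ALL_OK = all(check(x) for x in SAMPLES)
```

Note the `X == 0` case: `ctz` returns 32 (truthy) and `eqz(and 0 1)` returns 1 (truthy), so the rules still hold there.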

Motivation

Filed in #5752. The Motoko compiler already implements this in its own peephole optimizer (instrList.ml); the goal is to bring it to wasm-opt so that hand-written Wasm (e.g. the Motoko RTS, written in Rust) benefits too.

The optimizeBoolean rule alone fires 26–105 times across the three Motoko RTS variants (mo-rts-eop, mo-rts-incremental, mo-rts-non-incremental), targeting the is_skewed/is_scalar pointer-tagging checks in the GC hot path.

Applying wasm-opt --optimize-instructions to the Motoko RTS and running the benchmark suite shows the following gross effects (the submitted optimisation is a contributing factor alongside other rules triggered in the same pass):

| Benchmark | Before | After | Δ |
| --- | --- | --- | --- |
| `heap-32` (GC-heavy, run 1) | 1,153,792,735 instr | 1,151,398,207 instr | −2,394,528 (−0.21%) |
| `heap-32` (run 2) | 1,256,407,315 instr | 1,253,408,059 instr | −2,999,256 (−0.24%) |
| `heap-64` (run 1) | 1,324,057,357 instr | 1,321,855,449 instr | −2,201,908 (−0.17%) |
| `heap-64` (run 2) | 1,295,845,087 instr | 1,293,744,743 instr | −2,100,344 (−0.16%) |
| `bignum` | 2,504,499 cycles | 2,504,383 cycles | −116 |
| `candid-subtype-cost` | 1,115,011 cycles | 1,114,823 cycles | −188 |
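
The Δ column can be re-derived from the Before and After columns. The short Python sketch below does so using the numbers copied from the table; the `ROWS` list and `delta` helper are illustrative names only. It also gives relative changes for the two cycle-count rows, which the table reports only as absolute differences (both come out well under 0.1%).

```python
# (benchmark, before, after), copied from the table above
ROWS = [
    ("heap-32 run 1", 1_153_792_735, 1_151_398_207),
    ("heap-32 run 2", 1_256_407_315, 1_253_408_059),
    ("heap-64 run 1", 1_324_057_357, 1_321_855_449),
    ("heap-64 run 2", 1_295_845_087, 1_293_744_743),
    ("bignum", 2_504_499, 2_504_383),
    ("candid-subtype-cost", 1_115_011, 1_114_823),
]


def delta(before, after):
    """Return (absolute change, relative change in percent)."""
    diff = after - before
    return diff, 100.0 * diff / before


for name, before, after in ROWS:
    diff, pct = delta(before, after)
    print(f"{name:22s} {diff:+,d} ({pct:+.3f}%)")
```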

The GC-heavy heap benchmarks benefit most, consistent with the is_skewed check firing frequently during pointer traversal.

Test plan

- New lit test `test/lit/passes/optimize-instructions-lsb-if.wast` covers `if` (const left and right) and `br_if (eqz (and X 1))`
- All three test cases produce `i32.ctz` in the output

🤖 Generated with Claude Code

…X; if E T`

An if-else conditioned on `(i32.and X (i32.const 1))` tests the LSB of X. Since `i32.ctz X == 0` iff the LSB of X is set, we can replace the condition with `i32.ctz X` and swap the branches, saving one instruction. Handles the constant on either side (left or right of `and`).

Relates to: WebAssembly#5752

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…an context

In boolean contexts (if, br_if, select), `eqz(and X 1)` and `ctz X` have the same truthiness: both are truthy iff LSB(X) == 0. Replacing eqz+and with ctz saves one instruction and covers the primary pattern from WebAssembly#5752:

    i32.const 1; i32.and; i32.eqz; br_if N ==> i32.ctz; br_if N

This fires via `optimizeBoolean`, so it covers `if`, `br_if`, and `select` conditions in one place. Observed ~26–105 hits across Motoko RTS variants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
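
As a rough illustration of the stack-form pattern in this commit message, here is a toy Python peephole over a flat instruction list. This is a sketch under stated assumptions, not how Binaryen implements the rule: Binaryen rewrites its expression-tree IR inside `optimizeBoolean`, whereas the tuple encoding and the `fold_lsb_test` function below are hypothetical. The sketch also matches only the form where the constant is pushed after X.

```python
def fold_lsb_test(instrs):
    """Rewrite `i32.const 1; i32.and; i32.eqz; br_if N` to `i32.ctz; br_if N`.

    Instructions are tuples such as ("i32.const", 1) or ("i32.eqz",).
    """
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 4]
        if (len(window) == 4
                and window[0] == ("i32.const", 1)
                and window[1] == ("i32.and",)
                and window[2] == ("i32.eqz",)
                and window[3][0] == "br_if"):
            # X is already on the stack; test its trailing zeros instead
            out.append(("i32.ctz",))
            out.append(window[3])  # keep the original branch target
            i += 4
        else:
            out.append(instrs[i])
            i += 1
    return out


BEFORE = [
    ("local.get", 0),
    ("i32.const", 1),
    ("i32.and",),
    ("i32.eqz",),
    ("br_if", 0),
    ("nop",),
]
AFTER = fold_lsb_test(BEFORE)
```

Because the replacement only preserves truthiness, a rewrite like this is safe only when the `i32.eqz` result feeds a condition directly, which is why the sketch insists on the trailing `br_if`.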

Add ggreif/binaryen (branch gabor/lsb-if-ctz-flake) as a flake input, exposing a patched wasm-opt that folds LSB-test `i32.and X 1` patterns into `i32.ctz` (WebAssembly/binaryen#8562). Apply it to the non-debug RTS variants in installPhase, yielding ~0.2% instruction count reductions in GC-heavy benchmarks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kripken · 2026-04-01T16:31:52Z

Interesting. I worry this is not always faster, though: AND usually has a cost of 1, while TZCNT often has 2: https://www.agner.org/optimize/instruction_tables.pdf

Perhaps check what LLVM does here? They likely reasoned about this thoroughly.

ggreif · 2026-04-05T10:22:47Z

> Interesting. I worry this is not always faster, though: AND usually has a cost of 1, while TZCNT often has 2: https://www.agner.org/optimize/instruction_tables.pdf
>
> Perhaps check what LLVM does here? They likely reasoned about this thoroughly.

I have answered a similar question here.

The i32.ctz approach is also semantically cleaner: it captures the "is LSB set?" intent more directly than and 1; eqz.

kripken · 2026-04-06T14:56:02Z

I agree it might be cleaner in a way. I also agree that VMs could alter what they emit, as you wrote in the linked issue. However, if this would regress performance on major VMs right now, we'd want to wait for them to fix that before landing anything.

MaxGraey · 2026-04-07T16:19:02Z

Even if JIT compilers start optimizing similarly to wasmtime, it still won’t solve the performance issue in runtimes that use interpreters, for example some smart-contract platforms and embedded-oriented runtimes like wasm3. If such an optimization is to be done at all, in my opinion it should only apply when optimizing for size (-Os).

ggreif · 2026-04-07T21:37:37Z

> Even if JIT compilers start optimizing similarly to wasmtime, it still won’t solve the performance issue in runtimes that use interpreters, for example some smart-contract platforms and embedded-oriented runtimes like wasm3. If such an optimization is to be done at all, in my opinion it should only apply when optimizing for size (-Os).

That went through my thoughts too. I'll submit a revision soon.

ggreif requested a review from a team as a code owner April 1, 2026 09:39

ggreif requested review from tlively and removed request for a team April 1, 2026 09:39

ggreif changed the title ~~perf(OptimizeInstructions): fold i32.and X 1; if T E into i32.ctz X; if E T~~ perf: fold LSB-test i32.and X 1 into i32.ctz in boolean contexts Apr 1, 2026

ggreif mentioned this pull request Apr 1, 2026

nix: apply patched wasm-opt (LSB-test → ctz) to RTS wasm files caffeinelabs/motoko#5967

Draft

chore: remove nix files (not for upstream)

575cd27

ggreif mentioned this pull request Apr 1, 2026

optimise LSBit mask (followed by branch) to use ctz #5752

Open

ggreif wants to merge 3 commits into WebAssembly:main from ggreif:gabor/lsb-if-ctz

3 participants
