cat: avoid unnecessary allocation by oech3 · Pull Request #11675 · uutils/coreutils

oech3 · 2026-04-06T07:26:34Z

Allocate the buffer on the heap instead of the stack for the read()/write() slow path; the buffer is unnecessary if the splice() fast path succeeds.

$ echo 1 > /tmp/1
> taskset -c 0 hyperfine -N --runs 10000 "/tmp/coreutils/target/release/cat-stack /tmp/1" "target/release/cat-heap /tmp/1"
Benchmark 1: /tmp/coreutils/target/release/cat-stack /tmp/1
  Time (mean ± σ):     921.2 µs ±  84.4 µs    [User: 372.9 µs, System: 443.7 µs]
  Range (min … max):   843.0 µs … 3926.9 µs    10000 runs
Benchmark 2: target/release/cat-heap /tmp/1
  Time (mean ± σ):     908.6 µs ± 117.0 µs    [User: 380.6 µs, System: 424.1 µs]
  Range (min … max):   821.4 µs … 4337.6 µs    10000 runs 
Summary
  target/release/cat-heap /tmp/1 ran
    1.01 ± 0.16 times faster than /tmp/coreutils/target/release/cat-stack /tmp/1

related #10832

github-actions · 2026-04-06T07:33:47Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/basenc/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/dd/no-allocate is now being skipped but was previously passing.
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.
Congrats! The gnu test tests/cut/bounded-memory is now passing!

github-actions · 2026-04-06T08:16:09Z

GNU testsuite comparison:

Skip an intermittent issue tests/cut/bounded-memory (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/resolution (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/cut/cut-huge-range is now passing!

oech3 · 2026-04-06T08:28:02Z

hyperfine is flakey

xtqqczze · 2026-04-06T12:29:42Z

Switching from a stack allocation to a heap allocation doesn’t avoid allocation...

oech3 · 2026-04-06T12:38:05Z

buf is declared after the splice code.

oech3 · 2026-04-06T12:50:57Z

I saw a larger perf difference with 1024 * 1024 after switching to vec, so I think vec's allocation is deferred.

github-actions · 2026-04-08T04:27:45Z

GNU testsuite comparison:

Skipping an intermittent issue tests/cut/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/rm/many-dir-entries-vs-OOM is now being skipped but was previously passing.

github-actions · 2026-04-08T17:31:33Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/cut/cut-huge-range is now passing!

oech3 · 2026-04-12T14:39:22Z

We might use nightly fill_buf in the future to avoid the 0-fill here.

xtqqczze · 2026-04-12T15:22:32Z

> We might use nightly fill_buf in the future to avoid the 0-fill here.

Presumably you mean nightly-only Read::read_buf. Might be worth prototyping an implementation to validate this approach.

xtqqczze · 2026-04-19T17:18:39Z

> 1.01 ± 0.16 times faster

This doesn’t appear to be a statistically significant improvement; the reported uncertainty is large enough that the result is consistent with both a slowdown and a speedup.
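
The reported figure can be checked by hand. A minimal sketch, assuming the ± on the speed ratio comes from standard first-order error propagation for a ratio of two independent means (an assumption, though it reproduces the numbers in the description); `ratio_with_uncertainty` is a hypothetical helper, not part of hyperfine:

```rust
/// Ratio of two means with first-order propagated uncertainty.
/// Hypothetical helper for illustration only.
fn ratio_with_uncertainty(slow: (f64, f64), fast: (f64, f64)) -> (f64, f64) {
    let (slow_mean, slow_sd) = slow;
    let (fast_mean, fast_sd) = fast;
    let ratio = slow_mean / fast_mean;
    // Relative uncertainties add in quadrature for a quotient.
    let rel = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel)
}

fn main() {
    // (mean, standard deviation) in microseconds, from the PR description.
    let cat_stack = (921.2, 84.4);
    let cat_heap = (908.6, 117.0);
    let (ratio, unc) = ratio_with_uncertainty(cat_stack, cat_heap);
    println!("{ratio:.2} ± {unc:.2}");
    println!("interval: {:.2} to {:.2}", ratio - unc, ratio + unc);
}
```

Under that assumption the interval runs from about 0.85 to about 1.17, so it contains 1.0 (no change at all).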

oech3 · 2026-04-19T17:26:10Z

When I manually changed it to a large multi-MiB size, it caused a stack overflow without vec!. So I think Linux is at least saving RAM usage.

(But we should avoid using an N MiB pipe for small inputs.)

xtqqczze · 2026-04-19T18:14:10Z

> When I manually changed it with large MiB

But we’re talking about the 64 KiB stack allocation here.

oech3 · 2026-04-19T18:16:34Z

Linux can still save 64 KiB.

xtqqczze · 2026-04-19T18:27:29Z

The stack space is already reserved, so switching to a heap allocation actually increases overall memory usage, at least in theory.

oech3 · 2026-04-19T18:30:23Z

If the splice() fast path succeeds, cat does not take the code path that allocates buf.
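
The control flow being described can be sketched as follows. This is a minimal illustration, not the actual uutils code: `try_fast_path` is a hypothetical stand-in for the splice() attempt, and the 64 KiB size is the one mentioned elsewhere in this thread. The buffer is only created after the fast path has reported failure.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for the splice() fast path: returns Ok(true) if it
/// handled the whole copy. Here it always declines, so the slow path runs.
fn try_fast_path<R: Read, W: Write>(_r: &mut R, _w: &mut W) -> io::Result<bool> {
    Ok(false)
}

fn copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<()> {
    if try_fast_path(reader, writer)? {
        // Fast path succeeded: return before any buffer exists.
        return Ok(());
    }
    // Slow path only: the heap buffer is allocated after the early return.
    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(());
        }
        writer.write_all(&buf[..n])?;
    }
}

fn main() -> io::Result<()> {
    let mut input: &[u8] = b"1\n";
    let mut output = Vec::new();
    copy(&mut input, &mut output)?;
    assert_eq!(output, b"1\n");
    Ok(())
}
```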

oech3 · 2026-04-19T18:34:44Z

This is impossible to test on macOS, but changing buf to a large stack array causes a serious perf drop, while a vec does not, when splice() succeeds. So the allocation is omitted on Linux.

xtqqczze · 2026-04-19T18:45:16Z

This PR introduced a heap allocation on Linux where there wasn’t one previously. Based on the data in the description, there is no statistically significant improvement. Using a significantly larger stack array would risk stack overflow and violate clippy::large_stack_arrays.

oech3 · 2026-04-19T18:49:27Z

> changing buf to large stack

This is just to verify the allocation bypass. I don't intend to do this in production.

oech3 · 2026-04-19T18:51:27Z

In your view, how can we bypass the allocation completely when the splice() fast path succeeds?

xtqqczze · 2026-04-19T19:01:09Z

Reverting the PR would avoid the unnecessary heap allocation and allocate for free using existing stack space. Your observed improvement in hyperfine is likely just noise or an artifact of LLVM optimization.

oech3 · 2026-04-19T19:03:21Z

I want to completely stop allocating it when splice() succeeds. How can I do that? Who guarantees the "existing stack space"?

xtqqczze · 2026-04-19T19:14:43Z

There is typically a 2 MiB stack already reserved per thread; see https://doc.rust-lang.org/std/thread/#stack-size. Using a fixed-size stack buffer will not introduce an additional system allocation.
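
A small sketch of this point, assuming a thread spawned with an explicit 2 MiB stack via `std::thread::Builder::stack_size`. It models a spawned thread only; the main thread's stack size is chosen by the platform rather than by Rust. `sum_with_stack_buffer` is a hypothetical helper.

```rust
use std::thread;

/// Uses a 64 KiB stack array; no heap allocation is requested for it.
fn sum_with_stack_buffer() -> u64 {
    let mut buf = [0u8; 64 * 1024];
    for (i, b) in buf.iter_mut().enumerate() {
        *b = (i % 251) as u8;
    }
    buf.iter().map(|&b| u64::from(b)).sum()
}

fn main() {
    // Explicit 2 MiB stack; the 64 KiB array fits with plenty of room to spare.
    let handle = thread::Builder::new()
        .stack_size(2 * 1024 * 1024)
        .spawn(sum_with_stack_buffer)
        .expect("failed to spawn thread");
    let total = handle.join().expect("thread panicked");
    println!("sum = {total}");
}
```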

oech3 · 2026-04-19T19:16:42Z

Hmm. At least, a 1 MiB vec! on the pure splice path was clearly faster than a 1 MiB stack array.

xtqqczze · 2026-04-19T19:21:51Z

1 MiB is too large for a stack array and risks stack overflow. It also violates clippy::large_stack_arrays.

oech3 · 2026-04-19T19:24:46Z

Did you see #11675 (comment)? It is just for local verification.

If the 2 MiB stack were actually free, a 1 MiB stack array should not hurt perf. But it did.

xtqqczze · 2026-04-19T19:37:14Z

Ah, the likely reason for your performance drop is that a 1 MiB stack buffer must be zeroed at function entry. In our case we only use a 64 KiB buffer, so that overhead is negligible. If an uninitialized buffer could be used via Read::read_buf, this would not be a factor.
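
Read::read_buf is nightly-only, so it cannot be shown in stable code. As a rough stable approximation (an assumption about what might be acceptable, not what the PR does), `Read::take` combined with `read_to_end` lets the standard library fill a `Vec`'s spare capacity, so this code never writes zeros itself. Whether any initialization happens internally depends on the reader and the std version. Unlike a single read() call, it keeps reading until the chunk is full or EOF is reached. `copy_chunks` is a hypothetical name.

```rust
use std::io::{self, Read, Write};

/// Copy `reader` to `writer` in chunks of at most `chunk` bytes, without
/// zero-filling the buffer in this function.
fn copy_chunks<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    chunk: usize,
) -> io::Result<u64> {
    // Capacity is reserved but the length stays 0, so nothing is zeroed here.
    let mut buf: Vec<u8> = Vec::with_capacity(chunk);
    let mut total = 0u64;
    loop {
        buf.clear();
        // Read up to `chunk` bytes into the vector's spare capacity.
        let n = reader.by_ref().take(chunk as u64).read_to_end(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        writer.write_all(&buf)?;
        total += n as u64;
    }
}

fn main() -> io::Result<()> {
    let mut input: &[u8] = b"hello world";
    let mut output = Vec::new();
    let copied = copy_chunks(&mut input, &mut output, 4)?;
    assert_eq!(copied, 11);
    assert_eq!(output, b"hello world");
    Ok(())
}
```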

oech3 · 2026-04-19T19:43:14Z

I would split out the function containing the stack array and avoid that stack usage too.
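
One way to read this suggestion, sketched with hypothetical names (`copy`, `slow_copy`, `try_fast_path`): the stack array lives in a separate function marked `#[inline(never)]`, so its frame is only set up and zeroed when the slow path is actually called. Whether the compiler would otherwise merge the two frames is an assumption about optimizer behaviour, not something measured here.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for the splice() attempt; always declines here.
fn try_fast_path<R: Read, W: Write>(_r: &mut R, _w: &mut W) -> io::Result<bool> {
    Ok(false)
}

/// Slow path kept out of line so the 64 KiB array belongs to this frame only.
#[inline(never)]
fn slow_copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<()> {
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(());
        }
        writer.write_all(&buf[..n])?;
    }
}

fn copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<()> {
    if try_fast_path(reader, writer)? {
        // Fast path succeeded: slow_copy and its stack array are never entered.
        return Ok(());
    }
    slow_copy(reader, writer)
}

fn main() -> io::Result<()> {
    let mut input: &[u8] = b"1\n";
    let mut output = Vec::new();
    copy(&mut input, &mut output)?;
    assert_eq!(output, b"1\n");
    Ok(())
}
```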

xtqqczze · 2026-04-19T19:49:02Z

I guess the change made sense to avoid unnecessary zero-initialization, but the following would also have worked:

    // Use a small stack array to avoid unnecessary zero-initialization overhead when splice() was used
    #[cfg(any(target_os = "linux", target_os = "android"))]
    let mut buf = [0; 512];
    #[cfg(not(any(target_os = "linux", target_os = "android")))]
    let mut buf = [0; 1024 * 8];

oech3 · 2026-04-19T19:52:52Z

Of course. But I wanted to reduce the slow path's syscalls.

xtqqczze · 2026-04-19T19:53:49Z

I think your new approach in #11906 is much easier to understand.

oech3 marked this pull request as ready for review April 6, 2026 07:55

oech3 marked this pull request as draft April 6, 2026 08:04

oech3 force-pushed the cat-alloc branch from bac88ce to 01fa48b Compare April 6, 2026 08:04

oech3 marked this pull request as ready for review April 6, 2026 08:49

oech3 force-pushed the cat-alloc branch from 01fa48b to 0784b0a Compare April 8, 2026 04:16

cat: avoid unnecessary allocation

64288da

oech3 force-pushed the cat-alloc branch from 0784b0a to 64288da Compare April 8, 2026 17:03

sylvestre merged commit efd0f0c into uutils:main Apr 12, 2026
169 checks passed

oech3 deleted the cat-alloc branch April 12, 2026 14:38

oech3 mentioned this pull request Apr 19, 2026

cat: avoid unnecessary alloc on Linux #11906

Merged
