split: stream data using copy to not overwhelm memory with large files #10251
ChrisDryden wants to merge 4 commits into uutils:main
Conversation
Merging this PR will improve performance by ×4.9
@ChrisDryden what is your take on the perf regression?
I don't think the perf regression is acceptable; right now io::copy uses an 8kb buffer, which is causing the regression. I see head uses a 64kb buffer instead, so I'll try increasing the buffer size to see if that mitigates the perf regression.
GNU testsuite comparison:
Much better now; this solves the original issue, and there are huge memory and performance improvements. The original fix had an 8kb default buffer, and testing locally I found it hit diminishing returns at 128kb.
7d5b0c0 to 3d2d0ba
This should solve issue #10250, where split can run out of memory on large files. I was able to use the built-in integration test support to limit the resources available to the test, mocking a scenario where the file is larger than the available memory, to trigger the conditions for the bug.