
GH-46600: [C++][CI] Add job with ARROW_LARGE_MEMORY_TESTS enabled #49490

Draft
raulcd wants to merge 10 commits into apache:main from raulcd:GH-46600

Conversation

@raulcd
Member

@raulcd raulcd commented Mar 10, 2026

TBD

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

@github-actions

⚠️ GitHub issue #46600 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the "CI: Extra: C++" (Run extra C++ CI) and "awaiting committer review" (Awaiting committer review) labels Mar 10, 2026
@raulcd
Member Author

raulcd commented Mar 10, 2026

@rok I tried with /spot=capacity-optimized, with no /spot option at all, and also forcing /spot=false; all of them seem to fail due to quota:

Error: Failed to launch runner: failed to create fleet: [{"ErrorCode":"VcpuLimitExceeded","ErrorMessage":"You have requested more vCPU capacity than your current vCPU limit of 0 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit."

Should we request a quota increase for x8i.xlarge?

@rok
Member

rok commented Mar 10, 2026

Let me take a look.

@rok
Member

rok commented Mar 10, 2026

Requested 8 vcpus for spot and 8 for on-demand.

@raulcd
Member Author

raulcd commented Mar 10, 2026

Requested 8 vcpus for spot and 8 for on-demand.

nice! Thanks @rok

@rok
Member

rok commented Mar 10, 2026

Requests were approved. The change usually needs some 10 minutes to propagate.

@raulcd
Member Author

raulcd commented Mar 11, 2026

I tried with both 64GB and 128GB machines to validate that it wasn't a RAM issue. There are a couple of test failures due to timeout:

	 96 - parquet-arrow-reader-writer-test (Timeout)
	106 - gandiva-projector-test (Timeout)

And one due to what seems like a bug in Parquet (WriteLargeDictEncodedPage):

 [ RUN      ] TestColumnWriter.WriteLargeDictEncodedPage
/arrow/cpp/src/parquet/column_writer_test.cc:1100: Failure
Expected equality of these values:
  page_count
    Which is: 7501
  2
[  FAILED  ] TestColumnWriter.WriteLargeDictEncodedPage (19975 ms)

I'll see if I can reproduce the timeouts locally.

@raulcd
Member Author

raulcd commented Mar 11, 2026

I have 64GB of RAM locally.
TestHugeProjector.SimpleTestSumHuge from gandiva-projector-test takes more than 15 minutes locally for me with a Debug build.
With a release build it takes ~3 minutes but fails, see:

[----------] 1 test from TestHugeFilter
[ RUN      ] TestHugeFilter.TestSimpleHugeFilter
/home/raulcd/code/arrow/cpp/src/gandiva/tests/huge_table_test.cc:157: Failure
Value of: (exp)->Equals(selection_vector->ToArray(), arrow::EqualOptions().nans_equal(true))
  Actual: false
Expected: true
expected array: [
  4,
  5,
  9,
  11,
  12,
  13,
  19,
  21,
  25,
  26,
  ...
  2147483625,
  2147483627,
  2147483629,
  2147483630,
  2147483636,
  2147483637,
  2147483641,
  2147483643,
  2147483644,
  2147483645
] actual array: [
  0,
  1,
  2,
  3,
  6,
  7,
  8,
  10,
  14,
  15,
  ...
  2147483634,
  2147483635,
  2147483638,
  2147483639,
  2147483640,
  2147483642,
  2147483646,
  2147483647,
  2147483648,
  2147483649
]

[  FAILED  ] TestHugeFilter.TestSimpleHugeFilter (153849 ms)
[----------] 1 test from TestHugeFilter (153850 ms total)

For parquet-arrow-reader-writer-test the problem is TestArrowReaderAdHoc.LargeStringColumn; running locally on a release build it takes ~10 minutes (I haven't tested on debug, and I am not sure I want to :P)

[ RUN      ] TestArrowReaderAdHoc.LargeStringColumn
[       OK ] TestArrowReaderAdHoc.LargeStringColumn (602823 ms)

The parquet-writer-test TestColumnWriter.WriteLargeDictEncodedPage and TestColumnWriter.ThrowsOnDictIndicesTooLarge also fail locally for me:

[ RUN      ] TestColumnWriter.WriteLargeDictEncodedPage
/home/raulcd/code/arrow/cpp/src/parquet/column_writer_test.cc:1100: Failure
Expected equality of these values:
  page_count
    Which is: 7501
  2

[  FAILED  ] TestColumnWriter.WriteLargeDictEncodedPage (2190 ms)
[ RUN      ] TestColumnWriter.ThrowsOnDictIndicesTooLarge
/home/raulcd/code/arrow/cpp/src/parquet/column_writer_test.cc:1147: Failure
Expected: try { ([&]() { file_writer->Close(); })(); } catch (const ParquetException& err) { switch (0) case 0: default: if (const ::testing::AssertionResult gtest_ar = (::testing::internal::MakePredicateFormatterFromMatcher((::testing::Property(&ParquetException::what, ::testing::HasSubstr("exceeds maximum int value"))))("err", err))) ; else ::testing::internal::AssertHelper(::testing::TestPartResult::kNonFatalFailure, "/home/raulcd/code/arrow/cpp/src/parquet/column_writer_test.cc", 1147, gtest_ar.failure_message()) = ::testing::Message(); throw; } throws an exception of type ParquetException.
  Actual: it throws nothing.

[  FAILED  ] TestColumnWriter.ThrowsOnDictIndicesTooLarge (23736 ms)


My takeaways from this: we can enable a job that runs the large memory tests, but there currently seem to be some bugs in them, both for Gandiva and Parquet. We probably want to run on CI with a release build in order to shorten execution time, but even then we will need something like a 15-minute timeout on individual tests.
@pitrou what are your thoughts?

Should I open individual issues for those tests?
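One way to raise the per-test limit would be a CTest property on the affected suites; a minimal sketch (the test names are taken from the logs above, and the 1800-second value is an assumption, not a measured requirement):

```cmake
# Sketch: raise the per-test timeout for the large-memory suites.
# 1800 seconds is a guess based on the ~20-minute CI timeout seen above.
set_tests_properties(parquet-arrow-reader-writer-test
                     gandiva-projector-test
                     PROPERTIES TIMEOUT 1800)
```

Equivalently, `ctest --timeout 1800` would override the limit for a whole run without touching the build files.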

@rok
Member

rok commented Mar 11, 2026

+1 to increasing timeouts and including them in the extras and/or release jobs.

@raulcd
Member Author

raulcd commented Mar 11, 2026

It seems to require a really long timeout:

[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestHugeFilter.TestSimpleHugeFilter
 1 FAILED TEST
/build/cpp/src/gandiva/tests

        Start  96: parquet-arrow-reader-writer-test
Running parquet-arrow-reader-writer-test, redirecting output into /build/cpp/build/test-logs/parquet-arrow-reader-writer-test.txt (attempt 1/1)
107/107 Test  #96: parquet-arrow-reader-writer-test .............***Timeout 1200.10 sec

@pitrou
Member

pitrou commented Mar 11, 2026

  1. We want to test in debug mode to keep all runtime checks, assertions, etc. activated, but we can enable some optimizations, see [C++][CI] Have a job with ARROW_LARGE_MEMORY_TESTS enabled #46600 (comment)
  2. We can just disable Gandiva, it's not really maintained anyway
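A possible configure sketch following these two points. ARROW_LARGE_MEMORY_TESTS comes from this PR's title and ARROW_GANDIVA is an existing Arrow option, but the -O1 level is an assumption; the linked comment may suggest a different setting:

```shell
# Debug build with light optimization, large-memory tests on, Gandiva off.
# -O1 is an assumption for "debug with some optimizations enabled".
cmake -S cpp -B build \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_CXX_FLAGS_DEBUG="-g -O1" \
  -DARROW_LARGE_MEMORY_TESTS=ON \
  -DARROW_GANDIVA=OFF
```

Keeping CMAKE_BUILD_TYPE=Debug preserves assertions and DCHECKs, while the optimization flag shortens the runtime of the large-memory suites.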
