Reindexer: Mock reindex function to reduce parallel contention #1210
This one's aimed at fixing an intermittently failing test case:

https://github.com/riverqueue/river/actions/runs/24322049420/job/71009998068?pr=1208

    --- FAIL: TestReindexer (0.00s)
        --- FAIL: TestReindexer/ReindexesConfiguredIndexes (10.07s)
            reindexer_test.go:219: Reusing idle postgres schema "maintenance_2026_04_13t01_54_14_schema_04" [user facing: "maintenance_2026_04_13t01_54_14_schema_04"] after cleaning in 26.436456ms [4 generated] [7 reused]
            test_signal.go:95: timed out waiting on test signal after 10s
            logger.go:256: time=2026-04-13T01:54:27.581Z level=INFO msg="maintenance.Reindexer: Signaled to stop during index build; attempting to clean up concurrent artifacts"
            riverdbtest.go:293: Checked in postgres schema "maintenance_2026_04_13t01_54_14_schema_04"; 1 idle schema(s) [4 generated] [10 reused]
        --- FAIL: TestReindexer/ReindexesMinimalSubsetofIndexes (10.14s)
            reindexer_test.go:183: Reusing idle postgres schema "maintenance_2026_04_13t01_54_14_schema_01" [user facing: "maintenance_2026_04_13t01_54_14_schema_01"] after cleaning in 28.042877ms [4 generated] [10 reused]
            test_signal.go:95: timed out waiting on test signal after 10s
            reindexer_test.go:211:
                Error Trace: /home/runner/work/river/river/internal/maintenance/reindexer_test.go:211
                Error:       Should be false
                Test:        TestReindexer/ReindexesMinimalSubsetofIndexes
            logger.go:256: time=2026-04-13T01:54:28.444Z level=INFO msg="maintenance.Reindexer: Signaled to stop during index build; attempting to clean up concurrent artifacts"
            riverdbtest.go:293: Checked in postgres schema "maintenance_2026_04_13t01_54_14_schema_01"; 1 idle schema(s) [5 generated] [24 reused]
    FAIL
    FAIL    github.com/riverqueue/river/internal/maintenance    18.764s

I'm diagnosing with Claude's help here, but what appears to be happening is that although a reindex operation in Postgres is often fast, it's still a heavy operation, and it can slow down even further when there's a lot of concurrent activity hammering the database.
Many reindexer test cases run in parallel, and it appears that what was happening here is that we got a reindex that exceeded our maximum timeout of 10s. We have some evidence of this from the test run: a runtime of 10.07s, and the line:

    Signaled to stop during index build; attempting to clean up concurrent artifacts

Here, I'm putting forward a solution proposed by Claude, which is to mock out the reindex operation, especially where we have a number of reindexes going in parallel. The tests `ReindexOneSuccess` and `ReindexSkippedWithReindexArtifact` still put load on the real reindex operation, so we're not going full mock here, and we should see our intermittency considerably reduced while still being confident that everything works.