
fix: enforce SQS 1 MB batch byte limit #845

Merged
alexluong merged 1 commit into main from fix/sqs-batch-byte-limit
Apr 21, 2026

@alexluong
Collaborator

Summary

gocloud.dev's SQS topic driver batches Send() calls via SendMessageBatch (up to 10 messages per batch) but does not enforce SQS's 1 MB (1,048,576-byte) limit on the total size of a batch; it relies on SQS to reject oversized batches at the API level.

When log entries carry large payloads (e.g., large request/response bodies), the combined batch can exceed 1 MB, causing SQS to reject the entire batch with BatchRequestTooLong. This failure cascades: the log entry never gets persisted, and the retry scheduler can't find a prior attempt in the logstore, triggering an infinite retry loop (see #663).

This PR sets BatcherOptions.MaxBatchByteSize: 1_048_576 on the gocloud TopicOptions so the batcher splits oversized batches before they reach SQS. If a single message exceeds 1 MB, gocloud returns ErrMessageTooLarge instead of a cryptic SQS batch error.

Changes

  • Pass TopicOptions with BatcherOptions.MaxBatchByteSize set to 1 MB when opening SQS topics in queue_awssqs.go

Test plan

  • Verified compilation
  • Deploy to staging with SQS backend and send events with large payloads (>100 KB) at high concurrency to confirm batches are split correctly

🤖 Generated with Claude Code

gocloud.dev's SQS driver batches Send() calls internally via
SendMessageBatch (up to 10 messages) but does not enforce SQS's 1 MB
batch size limit. When log entries carry large payloads the batch can
exceed 1 MB, causing SQS to reject the entire batch. This cascades
into a permanent retry loop because the log entry never gets persisted
and the retry scheduler cannot find a prior attempt.

Set MaxBatchByteSize to 1 MB so gocloud splits oversized batches
before they reach SQS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alexluong force-pushed the fix/sqs-batch-byte-limit branch from 971064b to ebe2e42 April 20, 2026 13:49
alexluong mentioned this pull request Apr 20, 2026

// sqsMaxBatchBytes is the maximum total payload size for an SQS SendMessageBatch
// request. SQS rejects batches exceeding 256 KB per message and 1 MB per batch.

nit: Based on the docs,

The minimum message size is 1 byte (1 character). The maximum is 1,048,576 bytes (1 MiB).

A single message can now also be up to 1 MB, so both a single message and a batch of up to 10 messages are capped at 1 MB.

Collaborator Author


Hi @samadalishah, thanks for checking in and reviewing the PR!

I think 1 MB is an inherent SQS limit, so a single message larger than 1 MB simply cannot be sent to SQS. I'm not sure there's anything we can do about that here.


Yes, all good here! 👍
I think this should be considered for issue #663.

Since we add the Event, Attempt, and Destination to the LogEntry, the Attempt can sometimes be huge (in my error example, the failing webhook destination responded with 13K lines of HTML 😅). So either the Attempt's ResponseData should be truncated, or the message should be cleared from deliverymq-retry. Again, that relates to the issue, not this PR.

Collaborator Author


Noted, and definitely something we can support better. You raise a good point: the Attempt's ResponseData needs to be accounted for against the size limit. cc @alexbouchardd

alexluong merged commit bdaeadc into main Apr 21, 2026
10 checks passed
alexluong deleted the fix/sqs-batch-byte-limit branch April 21, 2026 14:49
3 participants