Stream local file content to File Fields #321

Merged
annehaley merged 3 commits into master from file-streams
Mar 12, 2026

@annehaley
Collaborator

This PR replaces instances of ContentFile(f.read()) with File(f) so that local file content is streamed to file fields rather than being loaded entirely into memory first. I tested this branch on the production EC2 worker by running the conversion task on the large "boston orthoimagery" dataset. On master, that task crashes with a memory error; on this branch, it succeeds.
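
To illustrate why reading the whole file up front is costly, here is a minimal stdlib-only sketch that is independent of Django and of this repository. It compares the peak traced memory of a single f.read() with that of a fixed-size chunked read. The file size, chunk size, and helper names are assumptions chosen for illustration only.

```python
import os
import tempfile
import tracemalloc

CHUNK_SIZE = 64 * 1024        # assumed chunk size for this sketch
FILE_SIZE = 8 * 1024 * 1024   # assumed 8 MiB test file


def peak_memory(func, path):
    """Return the peak traced allocation (in bytes) while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_everything(path):
    # Analogous to f.read(): the whole file becomes one bytes object.
    with open(path, "rb") as f:
        data = f.read()
    return len(data)


def read_in_chunks(path):
    # Analogous to a consumer that calls read(size) repeatedly on a file object.
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            total += len(chunk)
    return total


# Create a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(FILE_SIZE))
    tmp_path = tmp.name

try:
    peak_full = peak_memory(read_everything, tmp_path)
    peak_chunked = peak_memory(read_in_chunks, tmp_path)
finally:
    os.remove(tmp_path)

print(f"full read peak:    {peak_full} bytes")
print(f"chunked read peak: {peak_chunked} bytes")
```

The full read has to hold at least the whole file in memory at once, while the chunked read only needs roughly one chunk at a time. Whether a given storage backend actually reads in chunks is a separate question.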

@annehaley annehaley requested a review from BryonLewis March 11, 2026 14:02

@brianhelba brianhelba left a comment
Collaborator

From a high-level perspective, this is definitely more correct.

At a low level, I'm not 100% certain whether this does true streaming or just makes half as many in-memory copies. Note that Boto3 upload_fileobj is passed a ReadBytesWrapper. The ReadBytesWrapper.read method passes its parameters through to the underlying file object, so if Boto3 calls .read(size=...) to read only a limited number of bytes at a time, then size reaches the underlying Python file object and only a limited amount of memory is used.

However, Boto3 is such a mess internally that I can't verify it does this; we could run it with a debug breakpoint to see, which I haven't yet done.
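
One way to check which read sizes a consumer requests, short of a debugger session, is a recording pass-through wrapper. The sketch below is stdlib-only: RecordingReader and consume_in_parts are hypothetical names, and consume_in_parts is only a stand-in for an uploader that reads a fixed number of bytes at a time. It does not reproduce Boto3 or ReadBytesWrapper internals.

```python
import io


class RecordingReader:
    """Hypothetical pass-through wrapper that records each requested read size."""

    def __init__(self, raw):
        self._raw = raw
        self.requested_sizes = []

    def read(self, size=-1):
        # Record the request, then forward it unchanged to the wrapped object.
        self.requested_sizes.append(size)
        return self._raw.read(size)


def consume_in_parts(fileobj, part_size):
    """Stand-in for an uploader that reads part_size bytes at a time (assumed)."""
    total = 0
    while True:
        part = fileobj.read(part_size)
        if not part:
            break
        total += len(part)
    return total


payload = io.BytesIO(b"x" * 1_000_000)
reader = RecordingReader(payload)
total = consume_in_parts(reader, part_size=256 * 1024)

print("bytes consumed:", total)
print("distinct sizes requested:", sorted(set(reader.requested_sizes)))
print("number of read() calls:", len(reader.requested_sizes))
```

If a wrapper like this were placed around the real file object, any read() call recorded with a size of -1 or None would indicate that the whole remaining content was requested at once. That check has not been run here.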

Also note that the File constructor reads the .name property from its argument, whereas ContentFile has no way to know it (as it receives only a bytes object), so the files will now know their filename. Whether and how the filename is used when determining the storage path is up to FileField.upload_to.

Regardless, this is a clear correctness improvement.

@cloudflare-workers-and-pages
cloudflare-workers-and-pages bot commented Mar 12, 2026

Deploying geodatalytics with Cloudflare Pages

Latest commit: 6d8ed37
Status: ✅  Deploy successful!
Preview URL: https://3c38e8da.geodatalytics.pages.dev
Branch Preview URL: https://file-streams.geodatalytics.pages.dev

@annehaley annehaley removed the request for review from BryonLewis March 12, 2026 14:27
@annehaley annehaley merged commit ab24dd2 into master Mar 12, 2026
3 checks passed
@annehaley annehaley deleted the file-streams branch March 12, 2026 14:29