Skip to content

[SS-103] Fix COPY FROM PARQUET unsorted map keys#36032

Open
def- wants to merge 3 commits intoMaterializeInc:mainfrom
def-:pr-unsorted-parquet
Open

[SS-103] Fix COPY FROM PARQUET unsorted map keys#36032
def- wants to merge 3 commits intoMaterializeInc:mainfrom
def-:pr-unsorted-parquet

Conversation

@def-
Copy link
Copy Markdown
Contributor

@def- def- commented Apr 13, 2026

Fixes https://github.com/MaterializeInc/database-issues/issues/11290
Updates ColReader::Map logic to sort keys when reading in data.

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@patrickwwbutler
Copy link
Copy Markdown
Contributor

Had to update the checks in the test you created - it appears MZ doesn't support the equality operator for maps (at least string maps): ERROR: operator does not exist: map[text=>text] = map[text=>text]

@patrickwwbutler patrickwwbutler changed the title DNM: Test for parquet unsorted map [SS-103] Fix COPY FROM PARQUET unsorted map keys Apr 14, 2026
@patrickwwbutler patrickwwbutler self-assigned this Apr 14, 2026
@patrickwwbutler patrickwwbutler requested a review from a team April 14, 2026 14:38
@patrickwwbutler patrickwwbutler marked this pull request as ready for review April 14, 2026 14:38
@patrickwwbutler patrickwwbutler requested a review from a team as a code owner April 14, 2026 14:38
Copy link
Copy Markdown
Contributor

@patrickwwbutler patrickwwbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels a bit suspicious reviewing my own changes so I won't merge, but I'm fairly confident this is good to go - also tested without the sort changes and the TD fails as expected

Copy link
Copy Markdown
Contributor Author

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks reasonable to me!

Copy link
Copy Markdown
Contributor

@martykulma martykulma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The reasoning for the sort needs to be documented in the source, but lgtm otherwise.

🦈


# Regression test for BUG #41: COPY FROM Parquet Map keys not sorted.
#
# Materialize's Datum::Map requires keys to be in ascending sorted order.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great clarifying comment about the sort - definitely ought to be in the source!

Copy link
Copy Markdown
Contributor

@ublubu ublubu Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 It'd be nice to (also) have it on the doc comment for Datum::Map (and/or push_dict_with?) if that type is never supposed to be constructed from unsorted values.

Copy link
Copy Markdown
Contributor

@ublubu ublubu Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it's already on push_dict_with:

/// The closure **must** push keys in ascending order, otherwise equality
/// checks on the resulting `Row` may be wrong and reading the dict IN DEBUG
/// MODE will cause a panic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be even happier if we can get rid of that foot gun. Seems way too easy to use wrongly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants