Skip to content

AVRO-4230: [Perl] Added a sort of the keys of the data in encode_map#3650

Open
tjmac1200 wants to merge 3 commits intoapache:mainfrom
tjmac1200:main
Open

AVRO-4230: [Perl] Added a sort of the keys of the data in encode_map#3650
tjmac1200 wants to merge 3 commits intoapache:mainfrom
tjmac1200:main

Conversation

@tjmac1200
Copy link

Without this sort before encoding into binary, we get non-deterministic output, which makes having expected results for tests more difficult.

What is the purpose of the change

  • Sort keys of data hash to provide deterministic output for AVRO-4230.

Verifying this change

  • This change is a trivial rework / code cleanup without any test coverage.

Documentation

  • Does this pull request introduce a new feature? no

Without this sort before encoding into binary, we get non-deterministic output, which makes having expected results for tests more difficult.
@github-actions github-actions bot added the Perl label Feb 15, 2026
@tjmac1200
Copy link
Author

Can anyone help me here? I'm new to the GitHub contribution world. I see some of the checks were not successful. The errors seemed to be related to setting up Perl as opposed to my code changes. I'm not sure what went wrong, nor how to proceed from here. The tests passed with prove in my Codespace before I submitted the pull request. Help?!?

@tjmac1200
Copy link
Author

Looks like this may be our issue.  And perhaps it's fixed or nearly fixed?
shogo82148/actions-setup-perl#2481

@martin-g
Copy link
Member

The issue with the Github action is not fixed yet.
It seems the author has an idea what the problem is, so hopefully it will be resolved soon.

Alternatively we could try with https://github.com/perl-actions/perl-versions

@martin-g martin-g closed this Feb 16, 2026
@martin-g martin-g reopened this Feb 16, 2026
Copy link
Member

@martin-g martin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add some new unit tests that verify the change.
It seems the current unit tests do not depend on sorting and everything works with and without the proposed change.

@tjmac1200
Copy link
Author

@martin-g - Your observation is accurate. I expected the tests to pass both with and without the sort. There's nothing inherently wrong with the keys not being sorted other than non-deterministic output from the function. I'll look into writing a new test that verifies the sort worked as expected. Stay tuned... Thanks!

@tjmac1200
Copy link
Author

@martin-g - I've added the requested test. The new test fails non-deterministically fails on the original code. It occasionally passes the new test when Perl's keys function happens to return the keys in sorted order.. With the code change to add the sort, the new test always passes.

Interestingly, there weren't any tests that executed the encode_map sub in 02_bin_encode.t. The encode_map sub was executed in 03_bin_decode.t in order to set up for the decoding tests.

Hopefully this meets your mark for merging and deploying the change.

@martin-g
Copy link
Member

@jjatria Do you want to take a look ?

Copy link
Member

@martin-g martin-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK no other Avro SDK does this - sort the map keys.
But the Avro specification does not say it is against the rules either - https://avro.apache.org/docs/1.12.0/specification/#complex-types-1.

One possible issue this change could lead to is to break someone's tests that assert the byte array produced by binary encoding.

The only safe way I see is to use some kind of configuration (e.g. env vars, system properties, ...). But I don't know whether the Perl SDK has such for other needs already.

@jjatria
Copy link
Contributor

jjatria commented Feb 17, 2026

Hm, this seems a little like an antipattern to me. I sympathise with the desire to keep the output deterministic, but I imagine that in this context, that loses to efficiency: even though we want it to be predictable, we want it more for it to be fast.

Like @martin-g says, the spec does not say anything about key ordering. Which means that while it doesn't disallow it, it also does not guarantee it. But with this change, the Perl SDK would guarantee it. I worry that merging this would make the Perl SDK diverge and from the spec.

If we enforce a given order (with a test, even!) then we are, for better or worse, making it part of the documented, user-facing behaviour. We are telling our users they can rely on it (which means removing it is now a breaking change).

I guess I'd flip this around and ask the following of @tjmac1200: what kind of test are you writing that require the order of the keys to be predictable? And would making the tests easier to write justify making every other use of Avro slower?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants