AVRO-4230: [Perl] Added a sort of the keys of the data in encode_map#3650
AVRO-4230: [Perl] Added a sort of the keys of the data in encode_map#3650tjmac1200 wants to merge 3 commits intoapache:mainfrom
Conversation
Without this sort before encoding into binary, we get non-deterministic output, which makes having expected results for tests more difficult.
|
Can anyone help me here? I'm new to the GitHub contribution world. I see some of the checks were not successful. The errors seemed to be related to setting up Perl as opposed to my code changes. I'm not sure what went wrong, nor how to proceed from here. The tests passed with prove in my Codespace before I submitted the pull request. Help?!? |
|
Looks like this may be our issue. And perhaps it's fixed or nearly fixed? |
|
The issue with the Github action is not fixed yet. Alternatively we could try with https://github.com/perl-actions/perl-versions |
martin-g
left a comment
There was a problem hiding this comment.
It would be good to add some new unit tests that verify the change.
It seems the current unit tests do not depend on sorting and everything works with and without the proposed change.
|
@martin-g - Your observation is accurate. I expected the tests to pass both with and without the sort. There's nothing inherently wrong with the keys not being sorted other than non-deterministic output from the function. I'll look into writing a new test that verifies the sort worked as expected. Stay tuned... Thanks! |
|
@martin-g - I've added the requested test. The new test fails non-deterministically fails on the original code. It occasionally passes the new test when Perl's keys function happens to return the keys in sorted order.. With the code change to add the sort, the new test always passes. Interestingly, there weren't any tests that executed the encode_map sub in 02_bin_encode.t. The encode_map sub was executed in 03_bin_decode.t in order to set up for the decoding tests. Hopefully this meets your mark for merging and deploying the change. |
|
@jjatria Do you want to take a look ? |
martin-g
left a comment
There was a problem hiding this comment.
AFAIK no other Avro SDK does this - sort the map keys.
But the Avro specification does not say it is against the rules either - https://avro.apache.org/docs/1.12.0/specification/#complex-types-1.
One possible issue this change could lead to is to break someone's tests that assert the byte array produced by binary encoding.
The only safe way I see is to use some kind of configuration (e.g. env vars, system properties, ...). But I don't know whether the Perl SDK has such for other needs already.
|
Hm, this seems a little like an antipattern to me. I sympathise with the desire to keep the output deterministic, but I imagine that in this context, that loses to efficiency: even though we want it to be predictable, we want it more for it to be fast. Like @martin-g says, the spec does not say anything about key ordering. Which means that while it doesn't disallow it, it also does not guarantee it. But with this change, the Perl SDK would guarantee it. I worry that merging this would make the Perl SDK diverge and from the spec. If we enforce a given order (with a test, even!) then we are, for better or worse, making it part of the documented, user-facing behaviour. We are telling our users they can rely on it (which means removing it is now a breaking change). I guess I'd flip this around and ask the following of @tjmac1200: what kind of test are you writing that require the order of the keys to be predictable? And would making the tests easier to write justify making every other use of Avro slower? |
Without this sort before encoding into binary, we get non-deterministic output, which makes having expected results for tests more difficult.
What is the purpose of the change
Verifying this change
Documentation