Unmarshalling without type info prefers int64.#52
Conversation
Similar changes were commit 36d31e7; but there, we used 'int', with the reasoning "Plainer is better." I've since received complaints about this: that on 32-bit architectures this will result in different behavior than on 64-bit, and that it may be lossy on 32-bit. Since the only downside of using int64, as far as I know, is that it takes a few more characters on the source side -- 'int' is already uses 64-bits of space on a 64-bit architecture anyway -- I guess I'll concede that it might as well be the default. As before, it's important to address this at all because otherwise we would experience some odd behaviors in that the CBOR unmarshal and JSON unmarshal would behave differently when targeting an empty interface: JSON would yield int64, while CBOR would yield *uint64*, due to CBOR's Interesting (ahem) Opinions about signedness of numbers. And as before, remember: if unmarshalling into a concrete type that uses some other numeric type like `uint64`, etc, those unmarshallings do not pass through this cast and are not affected by this change.
|
Note: unless I'm gravely mistaken, Therefore: it seems to me that the refmt library having a default behavior of producing (signed) If anyone has objections to this proposed behavior or sees flaws with this understanding of numeric precision bits, speak up. Ideally with a PR that handles this better. (Frankly, I don't handle the number 18,446,744,073,709,551,615 very often, so, while I'm trying to do right by whoever does, it's awfully hard to write code that one doesn't oneself have a usecase for or see exercised in the wild: I'm counting on someone with practical notes to chime in if I'm missing a big clue here.) |
This diff changes the
objpackage's use ofinttoint64. Usingintis problematic because if one is building for 32-bit architectures as well as 64-bit, usingintwill result in different behavior when large numbers are handled on 32-bit architectures (they'll overflow when they would've been fine on a 64-bit build).Since the only downside of using int64, as far as I can figure, is that it takes a few more characters on the source side -- 'int' is already uses 64-bits of space on a 64-bit architecture anyway -- I guess I'll concede that it might as well be the default. The additional explicitness ensures we get the same behavior on 64-bit and 32-bit builds.
Users might find it a bit odd that unmarshalling into an
interface{}may result in amap[string]interface{}that's concretely populated with a bunch ofint64values, but it seems there's little choice. Making a choice betweenintandint64configurable adds a lot of complexity (and probably a performance hit, if a minor one) for very little gain.For comparison, it's worth noting that stdlib's
json.Unmarshalbehavior around numbers is much more fun than this: you getfloat64by default (which does not losslessly convey the same range asint64, to my understanding)... unless you engage withDecoder.UseNumber, which is a whole other ball of very interesting wax. (I haven't actually ever seen theUseNumbersystem in use, so I'm not sure what to think of it; and either way, I'm defaulting to doubtful that its semantics would translate cleanly to a system that also handles CBOR.) So all in all, I'm not sure we get much useful guidance from observing that package.As discussed previously in commit 36d31e7 (which started the int consistency work, but used 'int', at the time with the reasoning "plainer is better"): it's important to address both integer size and signedness because otherwise we would experience some odd behaviors in that the CBOR unmarshal and JSON unmarshal would behave differently when targeting an empty interface: JSON would yield int64, while CBOR would yield uint64, due to CBOR's Interesting (ahem) Opinions about signedness of numbers. This would make consistency test suites for the
objpackage having consistency behavior across codecs fail. This is why theobjpackage must ignore whether the token contains an int or a uint in order to have reasonable behavior.This change only impacts situations when the
objpackage is working without any significant type information (as when filling aninterface{}ormap[string]interface{}, etc). If unmarshalling into a concrete type that uses some other numeric type likeuint64, etc, those unmarshallings do not pass through this cast and are not affected by this change.