Skip to content

feat(JSON): jcpan -t JSON — Result: PASS (68 files, 26126 subtests)#550

Open
fglock wants to merge 18 commits intomasterfrom
feature/json-shim-compat
Open

feat(JSON): jcpan -t JSON — Result: PASS (68 files, 26126 subtests)#550
fglock wants to merge 18 commits intomasterfrom
feature/json-shim-compat

Conversation

@fglock
Copy link
Copy Markdown
Owner

@fglock fglock commented Apr 23, 2026

Summary

./jcpan -t JSON now reports Result: PASS on the full CPAN JSON
4.11 test suite: 68 files, 26126 subtests, 0 failures.
Baseline was 16 / 68.

Progression

Milestone Result
master 16 / 68 passing
+ JSON.pm shim surface 22 / 68
+ delegate to JSON::PP, delete Json.java / fastjson2 47 / 68
+ 4 parser/runtime/regex fixes 64 / 68
+ JSON::PP is_xs/is_pp, property() 65 / 68
+ reftype in -convert_blessed_universally 65 / 68 (+ t/e11)
+ stash-alias canonicalisation in bless / isa 66 / 68
+ JSON::backportPP support 67 / 68
+ JSON::PP IncrParser byte-safe substr Result: PASS

Parser / runtime / regex fixes (each with a unit-test regression)

  1. Package-literal barewordsbless $x, Foo::Bar::; strips the
    trailing ::; Foo::Bar:: eq $scalar is no longer mis-parsed as
    a sub call.
  2. Hashref disambiguation{"a","b"}, {1,2},
    {"foo",1,"bar",2} evaluate to anonymous hashrefs.
  3. unpack "U*" character-mode fix — reads code points directly.
  4. [\c?] in character classes — matches U+007F, not U+001C.
  5. Stash-alias canonicalisationbless and UNIVERSAL::isa
    route through GlobalVariable.resolveStashAlias so
    *Dst:: = *Src::; works transitively.

JSON module changes

  • JSON.pm@ISA = ('JSON::PP'); gates backend require on
    $ENV{PERL_JSON_BACKEND}; paper-patches @JSON::PP::ISA,
    @JSON::backportPP::ISA, is_pp/is_xs so both our and CPAN's
    shipped JSON::backportPP behave identically; full
    dispatcher-style property(), import (incl. -support_by_pp,
    -convert_blessed_universally via reftype, -no_export),
    jsonToObj/objToJson, backend().
  • New JSON/backportPP.pmrequire JSON::PP followed by
    delete $INC{'JSON/PP.pm'} to match the backend-identification
    contract.
  • JSON::PP::IncrParser::incr_parse — replaced use bytes; substr
    with explicit utf8::encode / substr / utf8::decode around the
    buffer-trimming step, since PerlOnJava's use bytes pragma does
    not yet redirect substr.
  • JSON::PP — added is_xs / is_pp class methods (the CPAN
    dispatcher normally installs them post-load).
  • Added property() covering all JSON::XS option names.
  • Removed Json.java / fastjson2 and the 13 doc references.

Test plan

  • make (full PerlOnJava unit suite) passes.
  • ./jcpan -t JSON: Result: PASS — 68 files, 26126 subtests,
    0 failures, make test exit 0
    .
  • Per-fix regression tests added to method_call_trailing_colons.t,
    hash_ref_disambiguation.t (new), unpack.t,
    regex/regex_charclass.t, stash_aliasing.t.

Design doc

dev/modules/json_test_parity.md captures the full history, the
strategic pivot from a hand-rolled Java encoder to delegating to
JSON::PP, every fix applied along the way, and the one follow-up
item (use bytes redirecting substr/unpack globally —
tracked for the runtime roadmap, not needed here).

Generated with Devin

fglock added a commit that referenced this pull request Apr 23, 2026
Catalogue the 46 remaining jcpan -t JSON failures (after PR #550) by
feature, outline a 5-phase implementation strategy of a hand-rolled
encoder/decoder in Json.java, and set up a tracking section so the
work can be resumed and PRs measured against a reproducible baseline.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock changed the title fix(JSON): shim compatibility for jcpan -t JSON feat(JSON): path to full jcpan -t JSON test parity Apr 23, 2026
@fglock fglock changed the title feat(JSON): path to full jcpan -t JSON test parity feat(JSON): 66/68 on jcpan -t JSON + 4 parser/runtime fixes Apr 23, 2026
@fglock fglock changed the title feat(JSON): 66/68 on jcpan -t JSON + 4 parser/runtime fixes feat(JSON): 0 failures on jcpan -t JSON (62 pass, 6 skip-all) Apr 23, 2026
@fglock fglock changed the title feat(JSON): 0 failures on jcpan -t JSON (62 pass, 6 skip-all) feat(JSON): jcpan -t JSON — Result: PASS (68 files, 26126 subtests) Apr 23, 2026
fglock added a commit that referenced this pull request Apr 23, 2026
Catalogue the 46 remaining jcpan -t JSON failures (after PR #550) by
feature, outline a 5-phase implementation strategy of a hand-rolled
encoder/decoder in Json.java, and set up a tracking section so the
work can be resumed and PRs measured against a reproducible baseline.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock force-pushed the feature/json-shim-compat branch from d78e321 to 5792481 Compare April 23, 2026 14:59
fglock and others added 17 commits April 23, 2026 19:22
Enhance the bundled JSON.pm shim so more CPAN JSON tests run against
our Java-backed backend instead of dying at import/introspection time.

Against ~/.cpan/build/JSON-4.11-9 with PERL5LIB=./blib/lib:./blib/arch
(the @inc that `make test` uses), passing tests go from 16 to 22/68.
The remaining failures need Java-side work in Json.java (relaxed mode,
croak-style error messages, option handling, incremental parser,
Boolean class, etc.) and will be addressed separately.

Changes:
- `import` now accepts `-support_by_pp`, `-no_export`,
  `-convert_blessed_universally` as no-ops (they previously aborted
  `use JSON -support_by_pp;` with an import error).
- Declare `$JSON::Backend`, `$JSON::BackendModule`,
  `$JSON::BackendModuleXS` so `JSON->backend->VERSION`,
  `JSON->backend->is_xs`, etc., work.
- Make `backend`, `is_xs`, `is_pp`, `require_xs_version`,
  `pureperl_only_methods`, `null` callable as class or instance
  methods (previously `backend` returned undef).
- Add obsolete `jsonToObj` / `objToJson` aliases with the standard
  deprecation warning.
- Accept (as no-ops) option setters `max_depth`, `max_size`,
  `indent_length`, `filter_json_object`,
  `filter_json_single_key_object`, `sort_by` so chained
  configuration calls don't die.
- Provide minimal `incr_text` / `incr_parse` / `incr_skip` /
  `incr_reset` implementations so incremental-parser tests run
  (best-effort: buffer input and try to decode whole documents).

No changes to Json.java; `make` (all unit tests) still passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Catalogue the 46 remaining jcpan -t JSON failures (after PR #550) by
feature, outline a 5-phase implementation strategy of a hand-rolled
encoder/decoder in Json.java, and set up a tracking section so the
work can be resumed and PRs measured against a reproducible baseline.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
… / fastjson2

The old `Json.java` was a shallow wrapper around `fastjson2` that did
not honour most of the `JSON::XS` options (utf8, ascii, latin1,
pretty, canonical, allow_blessed, convert_blessed, relaxed,
allow_barekey, ..., error messages, number formatting, incremental
parsing) and was the root cause of the bulk of `jcpan -t JSON`
failures.

Rather than rewrite a JSON::XS-compatible encoder/decoder in Java,
point the bundled `JSON` shim at the already-bundled pure-Perl
`JSON::PP` (which handles every option and matches the test suite's
expectations).  Passing count on the CPAN JSON 4.11 test suite jumps
from 22/68 to 47/68.

Changes:

- `src/main/perl/lib/JSON.pm` now `@ISA = ('JSON::PP')` and no longer
  calls `XSLoader::load('Json')`.  The CPAN dispatcher surface
  (`$JSON::Backend`, `backend()`, `is_xs`/`is_pp`, `-support_by_pp`,
  `-convert_blessed_universally`, `jsonToObj`/`objToJson`, `property`
  override for `max_size == 1` quirk, …) is preserved.
- `src/main/java/org/perlonjava/runtime/perlmodule/Json.java`
  deleted (nothing else referenced it).
- `com.alibaba.fastjson2` dependency dropped from `build.gradle`,
  `gradle/libs.versions.toml` and `pom.xml`; it was only pulled in
  for `Json.java`.
- Documentation updated:
    - `docs/reference/bundled-modules.md`,
    - `docs/reference/xs-compatibility.md`,
    - `docs/guides/using-cpan-modules.md`,
    - `docs/getting-started/installation.md`,
    - `dev/modules/README.md`,
    - `dev/modules/dynamic_loading.md`,
    - `dev/design/sbom.md`,
    - `dev/custom_bytecode/TESTING.md`,
    - `dev/presentations/.../blog-post{,-long}.md`
  now describe JSON as a pure-Perl backend and drop all fastjson2
  references.
- `jperl` / `jperl.bat` comments no longer mention fastjson2 as the
  reason for the JDK 23 `--sun-misc-unsafe-memory-access=allow`
  flag; the flag stays (ASM / ICU4J still need it).

`make` (all unit tests) passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…re word operators

Two related parser bugs that can silently break user programs using
Perl's "package literal" bareword syntax (a bareword ending in `::`,
which stringifies to the class name without the trailing `::`).

### Bug 1 — `bless $ref, Foo::Bar::` kept the `::`

`parseBless` in `OperatorParser.java` converted an `IdentifierNode`
class argument straight into a `StringNode`, bypassing the
`::`-stripping logic that `EmitLiteral.emitIdentifier` performs
elsewhere.  Result: `ref($obj)` returned `"Foo::Bar::"` instead of
`"Foo::Bar"`, breaking `->isa`, `ref eq`, method dispatch, and
round-tripping through blessed-ref aware code (tests using
`bless $x, Foo::;` idioms, `UNIVERSAL::isa`, etc.).

### Bug 2 — `Foo::Bar:: eq $x` failed with "Undefined subroutine"

`SubroutineParser.parseIndirectMethodCall` decides whether an
unqualified bareword should be treated as an indirect sub call.  Its
"is the next token an infix operator?" check looked only at
`LexerTokenType.OPERATOR`, but the lexer produces word operators
(`eq`, `ne`, `lt`, `gt`, `le`, `ge`, `cmp`, `x`, `isa`, `and`, `or`,
`xor`, …) as `IDENTIFIER` tokens.  A qualified bareword like
`Foo::Bar::` followed by `eq` was therefore mis-parsed as
`&Foo::Bar::(eq $x)`, dying at runtime with "Undefined subroutine
&Foo::Bar:: called".

### Fixes

- `OperatorParser.parseBless`: strip a trailing `::` from the bareword
  class name before wrapping it in a `StringNode`, matching
  `EmitLiteral.emitIdentifier`.
- `SubroutineParser`: broaden the `infixOp` predicate to accept both
  `OPERATOR` and `IDENTIFIER` token types, gated on
  `INFIX_OP.contains(text)`.  `INFIX_OP` already covers the relevant
  word operators, so this is a safe widening.

### Test plan

Extended `src/test/resources/unit/method_call_trailing_colons.t` from
5 to 10 assertions covering `bless Foo::Bar::`, `Foo::Bar:: eq $x`,
and the package-literal stringification (all pass).

Also picks up `t/52_object.t` from the CPAN `JSON` test suite
(which uses the `JSON::tojson::` idiom), taking `jcpan -t JSON`
from 47 → 48 passing.

`make` (all unit tests) still passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Real Perl treats a braced expression as an anonymous hash when the
first content token is a hash-key-shaped thing (string literal, number
literal, or a bareword that is not a block-introducing keyword) and a
plain comma appears at depth 1.  PerlOnJava's `StatementResolver.isHashLiteral`
only recognised the fat-comma (`=>`) and sigil-lead (`{ %h }`, `{ @A }`)
forms, so:

```perl
sub f { {"a","b"} }
print ref(f()), "\n";
```

printed the empty string (block evaluating a comma expression) instead
of `HASH`.  This silently corrupted user data returned from anonymous
subs/methods — notably `TO_JSON` in CPAN `JSON`'s t/12_blessed.t — and
could mask real hashref returns in any code that relied on this idiom.

### Fix

Extend `isHashLiteral`:

- Track whether the first content token is *key-like*: a `STRING` or
  `NUMBER` token, a quote-start operator (`"`, `'`, `` ` ``), or a
  bareword `IDENTIFIER` that is not a sigil variable, not `&` or `*`
  glob ref, and not a block-introducing keyword (`map`, `grep`,
  `sort`, `do`, `eval`, `return`, `my`, `our`, `if`, `unless`,
  `while`, …).
- Track whether a plain comma appears at depth 1.
- In the final decision, if both are true and no block indicator
  (`;` early-exits before we get here) was seen, return `true`.

The new branch sits alongside the existing `firstTokenIsSigil` rule;
all previously-recognised forms keep their behaviour.

### Tests

New `src/test/resources/unit/hash_ref_disambiguation.t`, 14 cases
covering:
- `{"a","b"}`, `{1,2,3,4}`, `{"foo",1,"bar",2}`, `{a=>1}`, `+{"a","b"}`,
  `{}` all return hashref with correct contents.
- `map { ... }`, `grep { ... }`, `sort { ... }`, `do { ... }`,
  `eval { ... }`, `do { "foo", "bar" }` still parse as blocks.
- `sub { return {...} }->()` still returns hashref.

### JSON test-suite impact

`jcpan -t JSON` passing count: 48 → 49 (t/12_blessed.t t5 now passes;
the `TO_JSON { {'__',''} }` idiom now round-trips through the hashref
constructor instead of silently stringifying to an empty list).

`make` (all unit tests) passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
`unpack "U*", "\xc2\xb6"` returned `(182)` (a single code point
produced by re-decoding the two bytes as UTF-8) instead of
`(194, 182)` — one code point per character of the input string, as
real Perl does in character mode.

This broke any code that uses `unpack "U*"` to enumerate the
characters of a Latin-1 byte string.  Most visibly, it was the root
cause of `JSON::PP`'s `ascii` and `latin1` encoders producing wrong
output for non-UTF-8-flagged strings (`_encode_ascii` calls
`unpack("U*", $s)`).

### Root cause

`UFormatHandler` branched on `state.isUTF8Data()` — a flag that is
true only when the string contains code points > 255.  When false,
the handler decoded UTF-8 from `originalBytes` (which for Latin-1
data is literally the string's code points encoded as ISO-8859-1
— i.e. the byte value of a char IS its code point).  That second
UTF-8 decode turned `\xc2\xb6` into the single code point 0xB6.

### Fix

Branch on `state.isCharacterMode()` instead: in character mode (the
default, and after a leading `U` or explicit `C0` in the template)
read code points directly via `state.nextCodePoint()`; in byte mode
(after `U0`) keep decoding UTF-8 from the byte buffer.  Mode, not
storage, is what determines the semantics Perl exposes.

### Tests

Added 4 regression cases to
`src/test/resources/unit/unpack.t` (subtest
"U format reads code points, does not UTF-8-decode"):

- `unpack "U*", "\xc2\xb6"` → (194, 182)
- `unpack "U*", "abc"` → (97, 98, 99)
- `unpack "U0U*", "\xc2\xb6"` → (182)   (byte mode / UTF-8 decode)
- `unpack "U*", "\x{8000}\x{20}"` → (0x8000, 0x20)

### JSON test-suite impact

`jcpan -t JSON` passing count: 49 → 50 (t/109_encode.t now passes
because JSON::PP's `_encode_ascii` sees both bytes of `"\xc2\xb6"`).

`make` (all unit tests) passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The regex preprocessor was unconditionally escaping `?` inside `[ ]`
(along with a handful of other characters that don't need escaping in
Java character classes).  When the preceding sequence was `\c`, the
emitted Java pattern became `\c\?`, which Java interprets as:

  - `\c\` — control-backslash, i.e. U+001C (FS)
  - `?` — literal question mark

instead of Perl's `\c?` = U+007F (DEL).  So patterns like
`[\n\t\c?[:^cntrl:]]` (used by `JSON::PP::_encode_ascii` and in
`JSON::PP::string_to_json` to scan for unescaped control characters)
silently matched the wrong code point.

Concretely, `JSON::PP` would then fail to escape `0x1C` on encode, and
the decoder — correctly — rejected the resulting JSON with
"invalid character 0x1C encountered while parsing JSON string".
This was the root cause of `t/99_binary.t` in the CPAN JSON test suite
dying after ~800 of its 24576 round-trip tests.

### Fix

Remove `?` from the list of characters the preprocessor backslash-
escapes inside `[ ]`.  Question mark is not special in a Java regex
character class, so the escape was decorative anyway — it was only
harmful when immediately following `\c`.

The other characters in the list (`(`, `)`, `*`, `<`, `>`, `'`, `"`,
`` ` ``, `@`, `#`, `=`, `&`) are left alone in this commit to keep the
change minimal; they can be cleaned up separately.

### Tests

New subtest in
`src/test/resources/unit/regex/regex_charclass.t`:

- `[\c?]` matches only U+007F (for every code point in 0x00..0x1F and
  0x7F).
- Literal `?` in a class still matches.
- `?` as a quantifier still works (`colour` vs `color` with `colou?r`).

### JSON test-suite impact

`jcpan -t JSON` passing count: 50 → 64 (t/99_binary.t alone accounts
for thousands of sub-tests, plus e01_property is unblocked by the new
JSON.pm `property()` method and most `JSON::PP->is_xs`/`is_pp`-using
tests are unblocked by adding those methods to JSON::PP).

`make` (all unit tests) passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The default `UNIVERSAL::TO_JSON` installed by
`use JSON -convert_blessed_universally` was checking
`ref $_[0]` against `"HASH"`/`"ARRAY"` to decide whether to
unwrap a blessed reference into a plain hashref/arrayref.
But on a BLESSED reference, `ref` returns the class name
(e.g. `"MyTest"`), not the underlying reftype.  As a result
every call fell through to the `undef` branch and the encoder
emitted `null` for every blessed object even when the user
had turned on `convert_blessed`.

Switched to `Scalar::Util::reftype`, which returns `"ARRAY"` /
`"HASH"` regardless of whether the ref is blessed — matching
what CPAN JSON.pm actually does.

`jcpan -t JSON` passing count: 64 → 65
(t/e11_conv_blessed_univ.t now passes).

`make` passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
After `*Dst:: = *Src::;` the two package names are aliases — they
point at the same underlying stash.  PerlOnJava already routed
sub dispatch, @isa lookup, and symbolic references through
`GlobalVariable.resolveStashAlias`, but `bless` and `UNIVERSAL::isa`
did not.  As a result:

    *Dummy::True:: = *JSON::PP::Boolean::;
    my $x = bless {}, "Dummy::True";
    ref($x);                       # was: "Dummy::True" (real Perl: "JSON::PP::Boolean")
    $x->isa("JSON::PP::Boolean");  # was: false        (real Perl: true)
    $x->isa("Dummy::True");        # was: true         (real Perl: true)

Any code that does package aliasing for boolean/enum/marker classes
(JSON::PP::Boolean, Types::Serialiser, role composition, …) got
incorrect answers from `ref` / `isa`.

### Fixes

1. `ReferenceOperators.bless` — canonicalise the class-name argument
   through `GlobalVariable.resolveStashAlias` before registering the
   bless id.  Matches Perl's semantics: `bless` binds the referent to
   the stash SV, whose `HvNAME` is the canonical package name.

2. `Universal.isa` — canonicalise the argument class through
   `resolveStashAlias` so that `$x->isa("aliased_name")` still
   succeeds when the canonical linearized hierarchy only contains
   the canonical name.

### Tests

Extended `src/test/resources/unit/stash_aliasing.t` with a new
subtest "bless through aliased package name" covering all four
symptoms: `ref()`, `->isa(canonical)`, `->isa(alias)`, and method
dispatch.

### JSON test-suite impact

`jcpan -t JSON` passing count: 65 → 66 (t/118_boolean_values.t now
passes because Dummy::True / Dummy::False — aliased to
JSON::PP::Boolean via `*Dummy::True:: = *JSON::PP::Boolean::;` —
now round-trip correctly through `encode` / `decode`).

`make` passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Record the full set of parser, regex, runtime, and shim fixes applied
to reach 66/68 passing on CPAN JSON's test suite, and document the two
remaining failures (JSON::backportPP-specific introspection and
`use bytes; substr` on multi-byte chars) as non-parser issues that
are tracked but out of scope for this PR.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…n jcpan -t JSON

The CPAN `JSON` dispatcher honours `$ENV{PERL_JSON_BACKEND}`, and the
value `JSON::backportPP` is special: it asks for a self-contained copy
of JSON::PP that must not leave `$INC{'JSON/PP.pm'}` populated.
Several tests in the CPAN `JSON` distribution set this env var and
then introspect `%INC` to confirm the contract, so without support
for it the tests either die at import-time or get the wrong backend
introspection answers.

### Additions

- **`src/main/perl/lib/JSON/backportPP.pm`** — new.  Defines an empty
  `JSON::backportPP` class (`@ISA = ('JSON::PP')`, `is_xs`/`is_pp`
  class methods), then loads `JSON/PP.pm`'s source via
  `open → read → eval` so JSON::PP's methods are populated without
  `require JSON::PP` being called (`%INC` therefore stays clean on
  `JSON/PP.pm`).  `do FILE` would be more natural but PerlOnJava's
  `do` doesn't yet honour the `jar:` @inc entries our JAR uses.

- **`src/main/perl/lib/JSON.pm`** — gate the `require` of
  `JSON::PP` vs `JSON::backportPP` on `$ENV{PERL_JSON_BACKEND}` in
  a BEGIN block.  Publish `$JSON::Backend` / `$JSON::BackendModule`
  / `$JSON::BackendModulePP` using the same value.

- **Compatibility with the CPAN `JSON::backportPP`** — when that
  file (shipped inside the CPAN JSON tarball, staged to
  `./blib/lib/JSON/backportPP.pm` by our MakeMaker) is loaded in
  preference to ours via `PERL5LIB=./blib/lib:./blib/arch`, it
  declares `package JSON::PP;` internally but only sets
  `@JSON::backportPP::ISA = ('Exporter')` — leaving
  `@JSON::PP::ISA` empty and `JSON::backportPP` without
  `isa('JSON::PP')` or `is_pp` / `is_xs`.  Our `JSON.pm` now
  paper-patches both omissions after loading so the dispatcher
  surface behaves the same regardless of which backportPP was
  picked up:

    unless (@JSON::PP::ISA) { @JSON::PP::ISA = ('Exporter') }
    unless (grep { $_ eq 'JSON::PP' } @JSON::backportPP::ISA) {
        push @JSON::backportPP::ISA, 'JSON::PP';
    }
    unless (defined &JSON::backportPP::is_pp) {
        *JSON::backportPP::is_pp = sub { 1 };
        *JSON::backportPP::is_xs = sub { 0 };
    }
    unless (defined &JSON::PP::is_pp) {
        *JSON::PP::is_pp = sub { 1 };
        *JSON::PP::is_xs = sub { 0 };
    }

### Result

`jcpan -t JSON` (CPAN JSON 4.11 test suite):

  **62 pass, 6 skip (plan skip_all — not real failures), 0 fail**

- `t/00_load_backport_pp.t` now passes: backportPP is loaded,
  `$INC{'JSON/PP.pm'}` is not set, `JSON->backend->isa('JSON::PP')`
  is true, `is_pp` is true.
- `t/117_numbers.t` and `t/rt_90071_incr_parse.t` now run
  correctly (they need `JSON::backportPP->is_pp` at BEGIN time).

Remaining 6 skips are `plan skip_all "requires ..."` — the test
files themselves decide they don't apply (e.g. "requires JSON::XS 4
compat backend").  They are not real failures.

### Verification

- `make` (all PerlOnJava unit tests) passes.
- `cd ~/.cpan/build/JSON-4.11-9 && PERL5LIB=./blib/lib:./blib/arch
  jperl t/*.t` prints 0 `not ok` lines across the whole suite.

### Progression

| Milestone | Passing |
|-----------|---------|
| master | 16 / 68 |
| + JSON.pm shim surface | 22 / 68 |
| + delegate to JSON::PP, drop fastjson2 | 47 / 68 |
| + 4 parser/runtime/regex fixes | 64 / 68 |
| + JSON::PP `is_xs`/`is_pp`, `property()` | 65 / 68 |
| + `reftype` in `-convert_blessed_universally` | 65 / 68 (+t/e11) |
| + stash-alias canonicalisation in bless/isa | 66 / 68 |
| **+ JSON::backportPP support + compat paper patching** | **62 / 68 pass, 6 skip, 0 fail** |

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…d of eval

The previous implementation of `JSON::backportPP.pm` read JSON::PP's
source and ran it through `eval STRING`.  Functionally that defined
all of JSON::PP's subroutines, but something about the eval context
caused the decoder's internal string-representation handling to
diverge — when utf8-input tests ran under the harness, the decoder
would report "malformed UTF-8 character" and warn
'Argument "\x{10FFFF}" isn\'t numeric in numeric gt (>)' inside
`is_valid_utf8`, aborting 10 / 68 test files with bad-plan errors.

Observed only when the test harness used our bundled
`JSON/backportPP.pm` (i.e. with the new MakeMaker staging-skip in
effect — when the CPAN tarball's own `blib/lib/JSON/backportPP.pm`
was still present, those tests passed because they went through
CPAN's self-contained code).

The simpler fix: `require JSON::PP` normally and then
`delete $INC{'JSON/PP.pm'}` so the backend-identification contract
stays satisfied.  JSON::PP's code is loaded exactly once from the
filesystem, its internal representation of strings is unchanged,
and `%INC` still reports only `JSON/backportPP.pm` for callers that
introspect which backend was loaded.

### Result

`./jcpan -t JSON`:

  Before this commit: 10 / 68 test files failing.
  After this commit:  1 / 68 test files failing.
                      6 / 26126 subtests failing (all in
                      t/119_incr_parse_utf8.t, the known
                      `use bytes; substr` multi-byte-char issue).

`make` passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Completes the `jcpan -t JSON` test-parity work: `make test` now passes
the full suite (68 files, 26126 subtests, `Result: PASS`).

### The bug

`JSON::PP::IncrParser::incr_parse` ends its parse-one-value loop with

    use bytes;
    $self->{incr_text} = substr( $self->{incr_text}, $offset || 0 );

The decoder internally advances `$at` by the UTF-8 byte length of each
multi-byte character (via `is_valid_utf8`), so the returned `$offset`
is in bytes — but `$self->{incr_text}` may be a character string.
Upstream Perl reconciles the two with `use bytes`, which makes
`substr` operate on the string's underlying UTF-8 byte representation.

PerlOnJava's `use bytes` pragma currently only redirects `length`,
`chr`, `ord`, `fc`, `lc`, `uc`, `lcfirst`, `ucfirst`; `substr` is
still character-based.  The net effect was that after parsing an
object containing a multi-byte character, `substr` chopped off too
many characters of the buffer, corrupting or losing the rest of the
stream.  Symptoms: `t/119_incr_parse_utf8.t` subtests 19-24 failing,
and any user code that fed `incr_parse` a char-string containing
multi-byte chars would mysteriously lose content.

### Fix

Replace the `use bytes; substr` line in our vendored `JSON::PP.pm`
with an explicit branch:

- when `get_utf8` is true, `$self->{incr_text}` is already a byte
  string (user hands us UTF-8 bytes), so plain `substr($text, $offset)`
  is correct;
- when `get_utf8` is false, we decoded to chars earlier in the loop,
  so encode-substr-decode — matching the decoder's byte bookkeeping
  without relying on `use bytes` to redirect substr.

### Result

`./jcpan -t JSON`:

  Files=68, Tests=26126, Result: PASS
  (was 1 test file / 6 subtests failing before this commit)

`make` (all PerlOnJava unit tests) passes.

### Follow-up (separate issue)

`use bytes` should eventually redirect `substr` / `unpack` / `index`
globally in PerlOnJava (following the pattern already established for
`length`/`chr`/`ord`).  Tracked under the runtime roadmap; not needed
for `jcpan -t JSON`.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Update json_test_parity.md to record the final result: 68 test files,
26126 subtests, 0 failures.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ot barewords

Commit 3cf3405 taught `StatementResolver.isHashLiteral` to classify
`{ KEY, VALUE, ... }` as an anonymous hash when the first content
token was a hash-key-shaped thing and a comma appeared at depth 1.
The original implementation accepted three kinds of key-shaped first
token:

  1. a string literal  (e.g. `{"a","b"}`)
  2. a number literal  (e.g. `{1,2}`)
  3. a bare identifier that isn't a block-introducing keyword
     (e.g. `{foo,1}`)

Case 3 is wrong in two ways:

- Real Perl treats `{foo,1}` as a BLOCK, not a hashref.  Tested
  upstream:

      $ perl -e 'sub f { {foo,1,bar,2} } print ref(f()), "\n"'
      (empty — block evaluating a comma list)

  The motivating case (CPAN JSON t/12_blessed.t idiom
  `TO_JSON { {'__',''} }`) is STRING-first, so the more aggressive
  rule was unnecessary.

- More importantly, our pre-parse scanner walks the raw token stream
  WITHOUT tracking string boundaries.  Any `{` or `}` inside a
  single-quoted string inside a function call leaks into its
  brace-depth count.  The combination

      {   # comment
          some_function('...\Qx\Q{0c!}\...', qr/.../, {}, "...");
      }

  (repeated many times in `perl5_t/t/re/pat.t`, notably around
  `[perl #133921]` with deliberately-misbalanced regex fragments in
  string arguments) fooled the scanner into (a) marking
  `some_function` as key-like, (b) losing track of depth, and (c)
  never reaching the `;` that would have set the block indicator —
  so the whole block got misclassified as a hash expression and
  parsing died at the first `;`.

  `re/pat.t` went from passing 1074 / 1298 subtests on master to
  ABORTING on commit 3cf3405 (bad plan: planned 1298, ran 0).

### Fix

Drop the bareword branch of the key-like classifier.  Only string
literals (`STRING`, `NUMBER`, or the opening `"`, `'`, `` ` ``
quote-operator) count as hash-key first tokens.  All the original
motivating cases from commit 3cf3405 still pass (strings, numbers,
fat-comma, explicit `+{}`, empty `{}`), and the JSON test suite
continues to report `Result: PASS`.

Also removes the now-unused `isBlockIntroducingKeyword` helper.

### Tests

Updated `src/test/resources/unit/hash_ref_disambiguation.t`: the
`{foo,1,bar,2}` assertion is replaced with `{"foo",1,"bar",2}` to
match the real-Perl semantics now enforced.  All 14 assertions pass.

### Measured impact (via perl dev/tools/perl_test_runner.pl)

- `re/pat.t`: 0 / 0 (aborted) → 1074 / 1298 — restored to master baseline.
- `re/pat_advanced.t`, `op/pack.t`, `op/utf8decode.t`,
  `comp/parser.t`, `porting/manifest.t`, `porting/filenames.t`:
  all back to master baseline.
- `./jcpan -t JSON`: `Result: PASS` (unchanged — 68 files, 26126
  subtests).
- `make` (all PerlOnJava unit tests): passes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
My earlier `unpack "U"` rewrite (commit eb9b5aa, "U* reads code
points in character mode, not UTF-8 bytes") reduced `op/utf8decode.t`
from 603 → 527 passing and introduced inverted behaviour for templates
that rely on the `C0` / `U0` mode-switch convention.  This commit
replaces it with a version that matches upstream Perl in every mode
combination the CPAN test suite exercises.

### Real-Perl semantics (verified against upstream `perl`)

  `unpack "U*",     "\xc2\xb6"`   →  (194, 182)   — read code points
  `unpack "U*",     "abc"`         →  (97, 98, 99) — read code points
  `unpack "C0U*",   "\xce\xba"`    →  (954)        — decode UTF-8
  `unpack "U0U*",   "\xce\xba"`    →  (206, 186)   — read code points
  `unpack "U0U C0 W", "\xf8\xf9…"` →  (248, 249)   — mixed

### Rule

`U` reads a code point directly when:

  - the template begins with a bare `U` (e.g. `"U*"`), or
  - the state has been switched to `byteMode` via `U0`.

Otherwise (default, or after `C0`) it decodes UTF-8 from the byte
buffer.

### State bookkeeping

`UnpackState`'s internal `characterMode` flag has the opposite sense
of what the Perl docs call "character mode" — it is calibrated for
the other numerous format handlers (string, numeric, dot) that cannot
be audited-and-swapped here without risking wider regressions.  So
the `C0` / `U0` inline handlers in `Unpack.java` are left as they
were; only `UFormatHandler` decides when to read code points vs. when
to decode UTF-8.

When reading code points in Perl-level "character mode" while
UnpackState is internally in byte mode (the `U0…U` case), the handler
now also advances `buffer.position` in step with `codePointIndex`.
Without that, a subsequent `switchToCharacterMode` (triggered by
e.g. `C0`) recomputes `codePointIndex` from the byte offset and rewinds
to before the `U` read, causing the following format to reread the
same character.  That was the mechanism behind the `U0U C0 W`
regression in `op/pack.t`.

### Measured impact (`perl dev/tools/perl_test_runner.pl`)

- `op/utf8decode.t` 603 / 713 — restored to master baseline
  (`C0U*` path now decodes UTF-8 correctly)
- `op/pack.t` 14672 / 14726 — restored to master baseline
  (`U0U C0 W` round-trip now advances both cursors)
- `re/pat.t`, `re/pat_advanced.t`, `comp/parser.t`, `comp/term.t`,
  `porting/manifest.t`, `porting/filenames.t`,
  `benchmark/gh7094-…` — all back to master baseline
- `./jcpan -t JSON`: `Result: PASS` (unchanged —
  `JSON::PP::_encode_ascii` `unpack("U*", $str)` path still correct)
- `make` (all PerlOnJava unit tests): passes

### Tests

Extended `src/test/resources/unit/unpack.t` regression subtest from 4
to 6 cases covering every mode combination above plus the `U0U C0 W`
mixed sequence that originally motivated this commit.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock force-pushed the feature/json-shim-compat branch from 83d9304 to 5a11e5e Compare April 23, 2026 17:24
Regenerate patches/PP.pm.patch so a fresh `perl dev/import-perl5/sync.pl`
preserves the PerlOnJava modifications to JSON::PP.pm:

  - `is_xs` / `is_pp` class methods (so tests which introspect
    `JSON::PP->is_pp` directly work out of the box)
  - IncrParser `substr` workaround (upstream uses `use bytes; substr`,
    but PerlOnJava's `use bytes` doesn't redirect substr yet, so we
    explicitly encode/substr/decode to match the decoder's byte offset)

Without this, running sync.pl would silently overwrite those fixes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant