[fluss-client] Support Complex Data Types on the Java API (NestedRow/ROW)#2900

Merged
polyzos merged 4 commits into apache:main from XuQianJin-Stars:feature/support-nestedrow-typed-api
Apr 11, 2026

@XuQianJin-Stars
Contributor

Purpose

Linked issue: close #2805

This pull request extends the Java Typed API to support the ROW data type (nested POJO) in POJO ↔ InternalRow conversion, completing the complex type coverage alongside the already supported ARRAY and MAP types.

Brief change log

  • ConverterCommons.java: Added ROW type validation in validateCompatibility() — ensures the POJO field for a ROW type is not a primitive, array, Collection, or Map. Detailed field-level validation is deferred to the nested converter.
  • PojoToRowConverter.java: Added case ROW: in createFieldConverter() — recursively creates a nested PojoToRowConverter to convert a nested POJO field into a GenericRow.
  • RowToPojoConverter.java: Added case ROW: in createRowReader() — recursively creates a nested RowToPojoConverter to convert an InternalRow back into a nested POJO.
  • PojoTypeToFlussTypeConverter.java: Added case ROW: in convertElementValue() — supports serializing nested ROW elements within ARRAY and MAP types.
  • FlussArrayToPojoArray.java: Added case ROW: in buildElementConverter() — supports deserializing ROW elements within arrays. Falls back to returning raw InternalRow when the target POJO type is Object.class (due to Java type erasure in MAP values).
  • java-client.md: Added ROW → Nested POJO entry to the type mapping table and a new "Nested POJOs (ROW Type)" documentation section with usage examples and a note about the Java type erasure limitation for MAP<K, ROW>.
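
The recursive pattern described in the change log can be illustrated with a minimal, self-contained sketch. It is not the Fluss implementation: `RowSchema`, `toRow`, and `fromRow` are hypothetical names, a plain `Object[]` stands in for `GenericRow`/`InternalRow`, and the POJOs use public fields only to keep the example short.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

final class NestedRowSketch {

    /** Hypothetical schema stand-in: field names, plus a nested schema for ROW fields (null for leaves). */
    static final class RowSchema {
        final String[] names;
        final RowSchema[] nested;

        RowSchema(String[] names, RowSchema[] nested) {
            this.names = names;
            this.nested = nested;
        }
    }

    static class Address {
        public String city;
        public Integer zip;
    }

    static class User {
        public Integer id;
        public Address address; // corresponds to a ROW column
    }

    static final RowSchema ADDRESS_SCHEMA =
            new RowSchema(new String[] {"city", "zip"}, new RowSchema[] {null, null});
    static final RowSchema USER_SCHEMA =
            new RowSchema(new String[] {"id", "address"}, new RowSchema[] {null, ADDRESS_SCHEMA});

    /** POJO -> Object[]; a ROW field is converted by recursing with its nested schema. */
    static Object[] toRow(Object pojo, RowSchema schema) {
        if (pojo == null) {
            return null; // a null nested POJO stays null
        }
        try {
            Object[] row = new Object[schema.names.length];
            for (int i = 0; i < schema.names.length; i++) {
                Field field = pojo.getClass().getDeclaredField(schema.names[i]);
                field.setAccessible(true);
                Object value = field.get(pojo);
                row[i] = schema.nested[i] == null ? value : toRow(value, schema.nested[i]);
            }
            return row;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Object[] -> POJO; the inverse of toRow, recursing for nested rows. */
    static <T> T fromRow(Object[] row, RowSchema schema, Class<T> type) {
        if (row == null) {
            return null;
        }
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T pojo = ctor.newInstance();
            for (int i = 0; i < schema.names.length; i++) {
                Field field = type.getDeclaredField(schema.names[i]);
                field.setAccessible(true);
                Object value = row[i];
                if (schema.nested[i] != null) {
                    value = fromRow((Object[]) value, schema.nested[i], field.getType());
                }
                field.set(pojo, value);
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        User user = new User();
        user.id = 1;
        user.address = new Address();
        user.address.city = "Berlin";
        user.address.zip = 10115;

        User back = fromRow(toRow(user, USER_SCHEMA), USER_SCHEMA, User.class);
        System.out.println(back.address.city + " " + back.address.zip);
    }
}
```

Unlike this sketch, which looks fields up by reflection on every call, the converters described above create their nested converters once when the outer converter is built.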

Tests

  • NestedRowConverterTest#testSimpleNestedRowRoundTrip — basic POJO with a nested ROW field, roundtrip serialization/deserialization
  • NestedRowConverterTest#testNullNestedRow — null nested POJO is correctly handled
  • NestedRowConverterTest#testDeeplyNestedRowRoundTrip — two-level deep nesting (POJO → ROW → ROW)
  • NestedRowConverterTest#testNestedRowWithArrayFieldRoundTrip — nested POJO containing an ARRAY<INT> field
  • NestedRowConverterTest#testNestedRowWithMapFieldRoundTrip — nested POJO containing a MAP<STRING, INT> field
  • NestedRowConverterTest#testArrayOfNestedRowRoundTrip — top-level ARRAY<ROW<...>> mapped to a POJO array
  • NestedRowConverterTest#testMapWithRowValuesRoundTrip — MAP<STRING, ROW<...>> with type erasure fallback (values deserialized as InternalRow)
  • NestedRowConverterTest#testRowFieldWithIncompatibleType — validates that an array type field is rejected when mapped to ROW
  • NestedRowConverterTest#testRowFieldWithMapType — validates that a Map type field is rejected when mapped to ROW
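
The rule exercised by the last two tests (a ROW column must not map to a primitive, array, Collection, or Map field) could look roughly like the sketch below. The method name `validateRowField` and the choice of `IllegalArgumentException` are assumptions made for illustration; the actual check lives in `ConverterCommons.validateCompatibility()` and may differ in both respects.

```java
import java.util.Collection;
import java.util.Map;

final class RowFieldValidationSketch {

    static class AddressPojo {
        public String city;
    }

    /** Rejects Java field types that cannot represent a nested ROW (hypothetical helper). */
    static void validateRowField(String fieldName, Class<?> fieldType) {
        boolean rejected =
                fieldType.isPrimitive()
                        || fieldType.isArray()
                        || Collection.class.isAssignableFrom(fieldType)
                        || Map.class.isAssignableFrom(fieldType);
        if (rejected) {
            throw new IllegalArgumentException(
                    "Field '" + fieldName + "' of type " + fieldType.getName()
                            + " cannot be mapped to a ROW column; expected a nested POJO class");
        }
        // Field-level checks of the nested POJO are left to the nested converter.
    }

    /** Convenience wrapper so callers can probe the rule without a try/catch. */
    static boolean isAccepted(Class<?> fieldType) {
        try {
            validateRowField("probe", fieldType);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(AddressPojo.class));
        System.out.println(isAccepted(int[].class));
        System.out.println(isAccepted(Map.class));
    }
}
```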

API and Format

No. This change only adds internal converter logic for the existing ROW data type. No public API or storage format changes.

Documentation

Updated website/docs/apis/java-client.md:

  • Added ROW | Nested POJO to the type mapping table
  • Added a new "Nested POJOs (ROW Type)" section with code examples showing direct ROW fields, ARRAY<ROW>, and MAP<String, ROW> usage
  • Added a note about Java type erasure limitation for MAP values
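
For orientation, the three shapes mentioned in the documentation update might be declared as follows. This is only a sketch of POJO shapes: `UserPojo` and its field names are hypothetical, `AddressPojo` is borrowed from the test discussion, and public fields are an assumption made for brevity rather than a statement of what the Fluss client requires.

```java
import java.util.HashMap;
import java.util.Map;

final class NestedPojoShapes {

    /** Nested POJO that would correspond to a ROW<city STRING, zip INT> column (illustrative). */
    static class AddressPojo {
        public String city;
        public Integer zip;

        public AddressPojo() {}

        AddressPojo(String city, Integer zip) {
            this.city = city;
            this.zip = zip;
        }
    }

    /** Top-level POJO showing the three shapes: direct ROW, ARRAY<ROW>, MAP<STRING, ROW>. */
    static class UserPojo {
        public Integer id;
        public AddressPojo address;                       // ROW<...>
        public AddressPojo[] previousAddresses;           // ARRAY<ROW<...>>
        public Map<String, AddressPojo> addressesByLabel; // MAP<STRING, ROW<...>>

        public UserPojo() {}
    }

    /** Builds a sample instance; nothing here touches the Fluss client. */
    static UserPojo sample() {
        UserPojo user = new UserPojo();
        user.id = 1;
        user.address = new AddressPojo("Berlin", 10115);
        user.previousAddresses = new AddressPojo[] {new AddressPojo("Hamburg", 20095)};
        user.addressesByLabel = new HashMap<>();
        user.addressesByLabel.put("work", new AddressPojo("Munich", 80331));
        return user;
    }

    public static void main(String[] args) {
        UserPojo user = sample();
        System.out.println(user.address.city + ", " + user.previousAddresses.length
                + " previous, work=" + user.addressesByLabel.get("work").city);
    }
}
```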

@XuQianJin-Stars
Contributor Author

XuQianJin-Stars commented Mar 18, 2026

Hi @polyzos @wuchong, I have updated the PR. Please help review when you have some time.

XuQianJin-Stars force-pushed the feature/support-nestedrow-typed-api branch from 757d361 to 86551fa on March 18, 2026, 12:15
…edRow/ROW)

- Add ROW type support in PojoToRowConverter (POJO -> GenericRow)
- Add ROW type support in RowToPojoConverter (InternalRow -> POJO)
- Add ROW type validation in ConverterCommons
- Add ROW element support in PojoTypeToFlussTypeConverter (for Array/Map)
- Add ROW element support in FlussArrayToPojoArray (for Array<ROW>)
- Add comprehensive tests for nested ROW, deeply nested ROW, ROW with
  Array/Map fields, Array<ROW>, Map<K, ROW>, null handling, and validation
- Update java-client.md documentation with ROW type mapping and usage examples
XuQianJin-Stars force-pushed the feature/support-nestedrow-typed-api branch from 86551fa to 12ead45 on March 18, 2026, 12:17
@polyzos
Contributor

polyzos commented Mar 31, 2026

@XuQianJin-Stars Thanks for the contribution! The feature direction is correct, and the read path implementation is clean. A few things to address before this is ready to merge.

PojoToRowConverter re-created per element on the write path

In PojoTypeToFlussTypeConverter.java, the convertElementValue() ROW case creates a new PojoToRowConverter on every call. Since this method is called per element by both PojoArrayToFlussArray and PojoMapToFlussMap, that means a full converter — including the PojoType.of() reflection scan and field-converter setup — is allocated for every single array element or map entry.

The symmetric read path in FlussArrayToPojoArray.buildElementConverter() gets this right: it constructs the RowToPojoConverter once and captures it in the ElementConverter lambda. The write path should follow the same pattern — handle ARRAY<ROW> directly in PojoArrayToFlussArray at construction time and pre-compile the element converter there, rather than delegating to convertElementValue() for ROW elements.
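
The difference between the two approaches can be made concrete with a small sketch. Everything in it is a stand-in: `ElementConverter` plays the role of an expensive-to-build converter, and the `SETUPS` counter exists only to make the construction cost visible. None of it is Fluss code.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PrecompiledConverterSketch {

    /** Counts how many times the (pretend) expensive setup ran. */
    static final AtomicInteger SETUPS = new AtomicInteger();

    /** Stand-in for a converter whose constructor does the reflection scan and field-converter setup. */
    static final class ElementConverter {
        ElementConverter() {
            SETUPS.incrementAndGet();
        }

        String convert(Object element) {
            return "row(" + element + ")";
        }
    }

    /** Pattern criticised in the review: a fresh converter for every element. */
    static String[] convertPerElement(Object[] elements) {
        String[] out = new String[elements.length];
        for (int i = 0; i < elements.length; i++) {
            out[i] = new ElementConverter().convert(elements[i]);
        }
        return out;
    }

    /** Suggested pattern: build the element converter once, at construction time, and reuse it. */
    static final class ArrayWriter {
        private final ElementConverter elementConverter = new ElementConverter();

        String[] convert(Object[] elements) {
            String[] out = new String[elements.length];
            for (int i = 0; i < elements.length; i++) {
                out[i] = elementConverter.convert(elements[i]);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Object[] elements = {"a", "b", "c"};

        SETUPS.set(0);
        convertPerElement(elements);
        System.out.println("per-element setups: " + SETUPS.get());

        SETUPS.set(0);
        ArrayWriter writer = new ArrayWriter();
        writer.convert(elements);
        writer.convert(elements);
        System.out.println("pre-compiled setups: " + SETUPS.get());
    }
}
```

In the sketch the per-element variant pays the setup cost once per element, while the pre-compiled variant pays it once per writer no matter how many arrays it converts.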

Silent type-unsafe round-trip for Map<K, POJO> where the value is a ROW type

The testMapWithRowValuesRoundTrip test acknowledges that a user who writes AddressPojo values into a Map<String, AddressPojo> will read back InternalRow objects instead. This is a silent, type-unsafe round-trip failure that would be very surprising to users.

The field's declared generic type (Map<String, AddressPojo>) carries the value class at compile time and is recoverable via Field.getGenericType() at framework setup time. It would be worth exploring whether PojoType.Property can carry that generic type parameter and pass a Class<V> hint down to FlussMapToPojoMap. If that turns out to be genuinely unresolvable, the framework should throw an UnsupportedOperationException at converter construction time with a clear message, rather than silently returning an incompatible type at runtime.
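
As a rough illustration of the suggestion, the value class of a `Map<String, AddressPojo>` field can be read with standard Java reflection (`Field.getGenericType()` and `ParameterizedType.getActualTypeArguments()`). The helper name `mapValueClass` and the `Holder` class are hypothetical, and how such a hint would be threaded through `PojoType.Property` is not shown.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Map;

final class MapValueTypeSketch {

    static class AddressPojo {
        public String city;
    }

    static class Holder {
        public Map<String, AddressPojo> addresses; // value class is recoverable
        @SuppressWarnings("rawtypes")
        public Map raw;                            // raw type: nothing to recover
    }

    /** Returns the declared value class of a Map field, or fails fast if it cannot be resolved. */
    static Class<?> mapValueClass(Field field) {
        Type generic = field.getGenericType();
        if (generic instanceof ParameterizedType) {
            Type[] typeArgs = ((ParameterizedType) generic).getActualTypeArguments();
            if (typeArgs.length == 2 && typeArgs[1] instanceof Class) {
                return (Class<?>) typeArgs[1];
            }
        }
        // Fail at setup time instead of silently handing back an unexpected type later.
        throw new UnsupportedOperationException(
                "Cannot resolve the map value type of field '" + field.getName() + "'");
    }

    /** Convenience overload that looks the field up by name and wraps the checked exception. */
    static Class<?> mapValueClass(Class<?> owner, String fieldName) {
        try {
            return mapValueClass(owner.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mapValueClass(Holder.class, "addresses").getSimpleName());
        try {
            mapValueClass(Holder.class, "raw");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Value types that are themselves parameterized, wildcards, or type variables are not `Class` instances, so this sketch treats them as unresolvable and takes the fail-fast branch.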

End-to-end tests belong in FlussTypedClientITCase

Rather than introducing a new NestedRowConverterTest class, it would be great to validate ROW type support end-to-end in FlussTypedClientITCase, which already covers ARRAY and MAP complex types with full round-trip assertions against a live cluster. The unit-level converter tests can go into the existing PojoToRowConverterTest / RowToPojoConverterTest structure, with any new POJOs added to ConvertersTestFixtures.

Let me know your thoughts

@XuQianJin-Stars
Contributor Author

Thanks for the detailed review! All three points make sense to me:

  1. PojoToRowConverter re-created per element — Agreed, I'll refactor PojoArrayToFlussArray (and PojoMapToFlussMap) to pre-compile the element converter at construction time for ARRAY<ROW> / MAP<K, ROW>, following the same pattern as FlussArrayToPojoArray.buildElementConverter().
  2. Silent type-unsafe round-trip for Map<K, POJO> — Good catch. I'll look into extracting the value class from Field.getGenericType() and passing it through PojoType.Property down to FlussMapToPojoMap. If the generic type turns out to be unresolvable, I'll make it fail fast with an UnsupportedOperationException at converter construction time instead of silently returning InternalRow.
  3. Test placement — Will do. I'll move the end-to-end round-trip tests into FlussTypedClientITCase, keep the unit-level converter tests in PojoToRowConverterTest / RowToPojoConverterTest, and add the new POJOs to ConvertersTestFixtures. The standalone NestedRowConverterTest will be removed.

1. Pre-compile PojoToRowConverter in write path for ROW elements
   - PojoArrayToFlussArray: cache converter for ARRAY<ROW> elements
   - PojoMapToFlussMap: cache converter for ROW-typed map keys/values
   - Avoids re-creating converter per element (reflection + field scan)

2. Fix type-unsafe round-trip for Map<K, POJO> with ROW value type
   - Add genericType field to PojoType.Property to preserve generic info
   - FlussMapToPojoMap now accepts keyClass/valueClass parameters
   - RowToPojoConverter extracts ParameterizedType args for MAP fields
   - Map<String, AddressPojo> now correctly round-trips as AddressPojo

3. Migrate tests to proper locations
   - Move nested ROW POJOs to ConvertersTestFixtures
   - Move write tests to PojoToRowConverterTest
   - Move round-trip tests to RowToPojoConverterTest
   - Delete standalone NestedRowConverterTest
@XuQianJin-Stars
Contributor Author

Hi @polyzos @wuchong, I have updated the PR. Please help review when you have some time.

@polyzos polyzos merged commit 5603486 into apache:main Apr 11, 2026
7 checks passed
Development

Successfully merging this pull request may close these issues.

Support Complex Data Types on the Java Typed API

2 participants