
Allow writing StringDType variables to netCDF #11218

Open
kkollsga wants to merge 2 commits into pydata:main from kkollsga:fix-stringdtype-netcdf-11199

kkollsga commented Mar 8, 2026

Summary

  • Recognizes numpy.dtypes.StringDType (kind "T") as a unicode string type in is_unicode_dtype, so the encoding pipeline and backend dtype selection handle it correctly.
  • Converts StringDType arrays to object arrays in netCDF4 and h5netcdf backend prepare_variable methods, since neither C library supports StringDType natively.
  • Replaces null values from StringDType(na_object=None) with empty strings on write, matching existing behavior for object-dtype string arrays with missing values.
  • The scipy backend already works because EncodedStringCoder(allows_unicode=False) encodes strings to bytes via encode_string_array, which handles StringDType.
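
The dtype check and the object-array conversion described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names `is_unicode_like` and `stringdtype_to_object` are hypothetical, and the sketch assumes NumPy 2.0 or later, where `numpy.dtypes.StringDType` reports `kind == "T"` and can be cast to `object`.

```python
import numpy as np


def is_unicode_like(dtype: np.dtype) -> bool:
    """Hypothetical helper: True for fixed-width unicode ("U") and StringDType ("T")."""
    return dtype.kind in ("U", "T")


def stringdtype_to_object(arr: np.ndarray, fill: str = "") -> np.ndarray:
    """Hypothetical helper: cast a StringDType array to an object array.

    Entries that are not Python strings after the cast (for example the
    ``None`` sentinel from ``StringDType(na_object=None)``) are replaced
    with ``fill``. Arrays of any other dtype are returned unchanged.
    """
    if arr.dtype.kind != "T":
        return arr
    obj = arr.astype(object)  # new array, so the input is not modified
    for idx in np.ndindex(obj.shape):
        if not isinstance(obj[idx], str):
            obj[idx] = fill
    return obj
```

Under those assumptions, passing `np.array(["a", None, "c"], dtype=np.dtypes.StringDType(na_object=None))` to `stringdtype_to_object` should yield an object array in which the missing entry has become an empty string, which is the on-write behavior the summary describes.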

Test plan

  • test_is_unicode_dtype_stringdtype — unit test for is_unicode_dtype with StringDType
  • test_roundtrip_stringdtype_data — roundtrip test in DatasetIOBase, runs across all backends (netCDF4, h5netcdf, scipy, zarr)
  • Manual verification of null handling with StringDType(na_object=None)
  • Pre-commit (ruff, formatting) passes
  • mypy passes (no new errors)

🤖 Generated with Claude Code

Recognize numpy.dtypes.StringDType (kind "T") as a unicode string type
in is_unicode_dtype, and convert StringDType arrays to object arrays
before passing to netCDF4/h5netcdf backends which don't support
StringDType natively. Null values from StringDType(na_object=None) are
replaced with empty strings on write.

Co-authored-by: Claude <noreply@anthropic.com>

kkollsga commented Mar 9, 2026

The mypy and test failures look related to numpy 2.4.2's stricter type stubs (#11183, #11204).

@kkollsga kkollsga mentioned this pull request Mar 9, 2026
2 tasks

Development

Successfully merging this pull request may close these issues.

Datasets concatenated along string dimension cannot write to netCDF
