fix: add explicit UTF-8 encoding to all file read/write operations by veeceey · Pull Request #1854 · commitizen-tools/commitizen

veeceey · 2026-02-08T15:02:05Z

Summary

On Windows, Path.read_text() and Path.write_text() default to the system encoding (e.g. CP1251) rather than UTF-8. This causes a UnicodeDecodeError when configuration files like pyproject.toml contain non-ASCII characters -- for example, Cyrillic text in commitizen customize options.

This PR adds encoding="utf-8" to every Path.read_text() and Path.write_text() call across all version providers and related modules:

commitizen/providers/base_provider.py (JsonProvider and TomlProvider)
commitizen/providers/npm_provider.py (NpmProvider)
commitizen/providers/uv_provider.py (UvProvider)
commitizen/providers/cargo_provider.py (CargoProvider)
commitizen/commands/changelog.py (changelog template export)
commitizen/project_info.py (pyproject.toml detection)

Test plan

Ran all provider tests (37 passed)
Verified no remaining bare read_text() or write_text() calls in the commitizen/ source tree

Fixes #1636

On Windows, Path.read_text() and Path.write_text() use the system default encoding (e.g. CP1251) instead of UTF-8. This causes UnicodeDecodeError when config files contain non-ASCII characters such as Cyrillic text in commitizen customization options. Fixes commitizen-tools#1636

codecov · 2026-02-08T15:02:52Z

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

veeceey · 2026-02-08T15:12:48Z

Manual Testing Results

Performed manual testing to verify the UTF-8 encoding fix prevents UnicodeDecodeError on Windows with non-ASCII characters.

Test Setup

Created test files with multiple non-ASCII character sets:

Cyrillic (Russian): Тестовый комментарий
Chinese: 测试注释
Korean: 테스트 주석
Emoji: 🚀 ✅ 🎉

Test Results

✅ Test 1: File Write with UTF-8

Successfully wrote pyproject.toml with all non-ASCII characters
File size: 396 bytes

✅ Test 2: File Read with UTF-8

Successfully read file with encoding="utf-8" parameter
All character sets preserved correctly:
- ✓ Cyrillic characters preserved
- ✓ Chinese characters preserved
- ✓ Korean characters preserved
- ✓ Emoji characters preserved

✅ Test 3: Round-trip Test (read → modify → write → read)

Modified version from 1.0.0 → 2.0.0
All non-ASCII characters survived the round-trip
No data corruption or encoding errors

✅ Test 4: Provider Tests

Ran pytest tests/providers/
19 provider tests passed (18 SCM provider tests have unrelated fixture issues)

Code Verification

Confirmed encoding="utf-8" parameter is present in all file operations:

✓ commitizen/providers/base_provider.py (JsonProvider and TomlProvider)
✓ commitizen/providers/npm_provider.py
✓ commitizen/providers/uv_provider.py
✓ commitizen/providers/cargo_provider.py
✓ commitizen/commands/changelog.py
✓ commitizen/project_info.py

Impact

This fix ensures that on Windows systems with non-UTF-8 default encoding (e.g., CP1251 for Russian locale):

Before: UnicodeDecodeError: 'charmap' codec can't decode byte 0x98
After: Files with Cyrillic/Chinese/Korean/emoji characters work correctly

Test Environment

Platform: macOS (Darwin 25.2.0)
Python: 3.14.2
Note: While testing on macOS (which defaults to UTF-8), the explicit encoding="utf-8" parameter ensures consistent behavior across all platforms, including Windows with CP1251/CP1252 encodings.

The fix is working as intended. Ready for merge.

woile · 2026-02-08T16:59:52Z

Please first add a test reproducing the issue

Add test_utf8_encoding.py that simulates Windows behavior where Path.read_text() / Path.write_text() default to system encoding (e.g. CP1251) instead of UTF-8, causing UnicodeDecodeError with non-ASCII characters (Cyrillic, Chinese, accented). The tests monkeypatch Path methods to raise when encoding is not explicitly specified, verifying all providers (Pep621, Npm, Cargo, Uv) pass encoding="utf-8". Also fix ruff formatting in 3 provider files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bearomorphism · 2026-02-09T02:06:03Z

Closing this PR as not following the AI assisted PR guideline. You can reopen after answering required questions.

veeceey requested review from Lee-W, noirbizarre and woile as code owners February 8, 2026 15:02

github-actions bot added pr-status: wait-for-review type: bug labels Feb 8, 2026

bearomorphism closed this Feb 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: add explicit UTF-8 encoding to all file read/write operations#1854

fix: add explicit UTF-8 encoding to all file read/write operations#1854
veeceey wants to merge 2 commits intocommitizen-tools:masterfrom
veeceey:fix/issue-1636-utf8-encoding

veeceey commented Feb 8, 2026

Uh oh!

codecov bot commented Feb 8, 2026

Uh oh!

veeceey commented Feb 8, 2026

Uh oh!

woile commented Feb 8, 2026

Uh oh!

bearomorphism commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

veeceey commented Feb 8, 2026

Summary

Test plan

Uh oh!

codecov bot commented Feb 8, 2026

⚠️ JUnit XML file not found

Uh oh!

veeceey commented Feb 8, 2026

Manual Testing Results

Test Setup

Test Results

Code Verification

Impact

Test Environment

Uh oh!

woile commented Feb 8, 2026

Uh oh!

bearomorphism commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants