fix: add explicit UTF-8 encoding to all file read/write operations #1854
veeceey wants to merge 2 commits into commitizen-tools:master
On Windows, Path.read_text() and Path.write_text() use the system default encoding (e.g. CP1251) instead of UTF-8. This causes UnicodeDecodeError when config files contain non-ASCII characters such as Cyrillic text in commitizen customization options. Fixes commitizen-tools#1636
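The failure can be reproduced on any platform by passing the legacy code page explicitly. The snippet below is a minimal sketch, not code from this PR: the sample text and the `try_read` helper are hypothetical, and `encoding="cp1251"` stands in for what a bare `read_text()` would pick up on a Windows machine configured for CP1251.

```python
import tempfile
from pathlib import Path

# Hypothetical config snippet. The capital "И" encodes to the UTF-8 bytes D0 98,
# and byte 0x98 is unassigned in CP1251, so decoding it as CP1251 fails.
SAMPLE = 'message = "Исправление ошибки"\n'


def try_read(path: Path, encoding: str) -> str:
    """Return the decoded text, or the exception class name if decoding fails."""
    try:
        return path.read_text(encoding=encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "pyproject.toml"
    config.write_text(SAMPLE, encoding="utf-8")

    # encoding="cp1251" imitates a bare read_text() on a CP1251 system.
    as_cp1251 = try_read(config, "cp1251")
    # An explicit encoding="utf-8" decodes the same bytes correctly.
    as_utf8 = try_read(config, "utf-8")

print(as_cp1251)
print(as_utf8 == SAMPLE)
```

Not every UTF-8 file fails this loudly: text whose bytes all happen to be assigned in the legacy code page decodes without an exception but comes out as mojibake, which is arguably worse.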
Manual Testing Results
Performed manual testing to verify the UTF-8 encoding fix prevents UnicodeDecodeError on Windows with non-ASCII characters.
Test Setup
Created test files with multiple non-ASCII character sets:
Test Results
✅ Test 1: File Write with UTF-8
✅ Test 2: File Read with UTF-8
✅ Test 3: Round-trip Test (read → modify → write → read)
✅ Test 4: Provider Tests
Code Verification
Confirmed
Impact
This fix ensures that on Windows systems with non-UTF-8 default encoding (e.g., CP1251 for Russian locale):
Before:
Test Environment
The fix is working as intended. Ready for merge.
Please first add a test reproducing the issue.
Add test_utf8_encoding.py that simulates Windows behavior where Path.read_text() / Path.write_text() default to system encoding (e.g. CP1251) instead of UTF-8, causing UnicodeDecodeError with non-ASCII characters (Cyrillic, Chinese, accented).
The tests monkeypatch Path methods to raise when encoding is not explicitly specified, verifying all providers (Pep621, Npm, Cargo, Uv) pass encoding="utf-8".
Also fix ruff formatting in 3 provider files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
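The guard described in this commit message might look roughly like the sketch below. It is not the contents of test_utf8_encoding.py: it uses `unittest.mock.patch.object` from the standard library rather than pytest's `monkeypatch` fixture, and `strict_read_text`, `strict_write_text`, and `bump_version_file` are hypothetical names, with `bump_version_file` standing in for a provider.

```python
import tempfile
from pathlib import Path
from unittest import mock

# Keep references to the real methods so the strict wrappers can delegate.
_real_read_text = Path.read_text
_real_write_text = Path.write_text


def strict_read_text(self, *args, **kwargs):
    """Fail if read_text() is called without an explicit encoding."""
    encoding = args[0] if args else kwargs.get("encoding")
    if encoding is None:
        raise AssertionError("read_text() called without explicit encoding")
    return _real_read_text(self, *args, **kwargs)


def strict_write_text(self, data, *args, **kwargs):
    """Fail if write_text() is called without an explicit encoding."""
    encoding = args[0] if args else kwargs.get("encoding")
    if encoding is None:
        raise AssertionError("write_text() called without explicit encoding")
    return _real_write_text(self, data, *args, **kwargs)


def bump_version_file(path: Path, old: str, new: str) -> None:
    """Hypothetical stand-in for a provider that rewrites a version string."""
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace(old, new), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    with mock.patch.object(Path, "read_text", strict_read_text):
        with mock.patch.object(Path, "write_text", strict_write_text):
            config = Path(tmp) / "package.json"
            config.write_text('{"version": "0.1.0", "author": "Иван"}', encoding="utf-8")

            # Code that passes encoding="utf-8" runs normally under the guard.
            bump_version_file(config, "0.1.0", "0.2.0")
            result = config.read_text(encoding="utf-8")

            # A bare call trips the guard regardless of the host platform.
            try:
                config.read_text()
                guard_tripped = False
            except AssertionError:
                guard_tripped = True
```

Because the guard raises on a missing argument rather than on a decoding failure, a test built this way fails on Linux and macOS CI as well, where the default encoding is usually already UTF-8 and the original bug would otherwise stay hidden.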
Closing this PR because it does not follow the AI-assisted PR guideline. You can reopen it after answering the required questions.
Summary
On Windows, Path.read_text() and Path.write_text() default to the system encoding (e.g. CP1251) rather than UTF-8. This causes a UnicodeDecodeError when configuration files like pyproject.toml contain non-ASCII characters, for example Cyrillic text in commitizen customize options.
This PR adds encoding="utf-8" to every Path.read_text() and Path.write_text() call across all version providers and related modules:
- commitizen/providers/base_provider.py (JsonProvider and TomlProvider)
- commitizen/providers/npm_provider.py (NpmProvider)
- commitizen/providers/uv_provider.py (UvProvider)
- commitizen/providers/cargo_provider.py (CargoProvider)
- commitizen/commands/changelog.py (changelog template export)
- commitizen/project_info.py (pyproject.toml detection)
Test plan
read_text() or write_text() calls in the commitizen/ source tree
Fixes #1636
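One way to look for read_text() or write_text() calls that lack an explicit encoding is a small AST scan. This is a sketch under stated assumptions, not a check that ships with this PR: `find_bare_calls` and the sample source are hypothetical, and the scan only recognizes `encoding=` passed as a keyword, so a positional encoding would be reported as a false positive.

```python
import ast

METHODS = {"read_text", "write_text"}


def find_bare_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, method) for read_text/write_text calls without encoding=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in METHODS
            and not any(kw.arg == "encoding" for kw in node.keywords)
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)


# Hypothetical module source with two bare calls and two explicit ones.
SAMPLE_SOURCE = '''\
from pathlib import Path
a = Path("a.toml").read_text()
b = Path("b.toml").read_text(encoding="utf-8")
Path("c.json").write_text("{}")
Path("d.json").write_text("{}", encoding="utf-8")
'''

for line, method in find_bare_calls(SAMPLE_SOURCE):
    print(f"line {line}: {method}() without encoding=")
```

To cover a whole package, the same function could be applied to each file returned by `Path("commitizen").rglob("*.py")`, reading every file with `encoding="utf-8"`.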