Skip to content

Conversation

@byroot
Copy link
Member

@byroot byroot commented Jan 19, 2026

While trying to speedup various File.* methods, I realized they were way slower and complicated than they should for no apparent reason. However after asking Nobu he explained that Shift JIS encoded text can contain 0x5C (ASCII backslash) as the second byte of a two byte character sequence.

Since on Windows 0x5C is File::ALT_SEPARATOR, this can easily break naive path related algorithms searching for directory separators.

cc @eregon what do you think of that spec? I think it would have helped me figure things out way sooner, so would have been valuable. If you agree I can generalize it to more "path handling" methods.

@byroot byroot force-pushed the shiftjs-windows-path-operations branch 2 times, most recently from e8c8ec8 to 7cfe29b Compare January 19, 2026 05:53
@byroot byroot force-pushed the shiftjs-windows-path-operations branch from 7cfe29b to 38a2dbf Compare January 19, 2026 12:48
@byroot byroot changed the title File.dirname: add a spec for Shift-JS handling File.dirname: add a spec for Shift JIS handling Jan 19, 2026
@byroot byroot requested a review from trinistr January 19, 2026 12:49
While trying to speedup various `File.*` methods, I realized they
were way slower and complicated than they should for no apparent
reason. However after asking Nobu he explained that Shift JIS encoded
text can contain `0x5C` (ASCII backslash) as the second byte of a
two byte character sequence.

Since on Windows `0x5C` is `File::ALT_SEPARATOR`, this can easily
break naive path related algorithms searching for directory separators.
@byroot byroot force-pushed the shiftjs-windows-path-operations branch from 38a2dbf to 5c63521 Compare January 19, 2026 16:14
@byroot byroot requested a review from eregon January 19, 2026 16:15
@eregon eregon merged commit 03ef04f into ruby:master Jan 20, 2026
16 checks passed
@eregon
Copy link
Member

eregon commented Jan 20, 2026

Thanks! Nasty edge case indeed 😅

@byroot
Copy link
Member Author

byroot commented Jan 20, 2026

Yep. I can add it to more File.* methods if you'd like.

@byroot byroot deleted the shiftjs-windows-path-operations branch January 20, 2026 12:33
@eregon
Copy link
Member

eregon commented Jan 20, 2026

Yep. I can add it to more File.* methods if you'd like.

Yeah that'd be great for File methods which "split" paths.
Potentially could reduce to just one example per method on Windows, since that's the tricky one, on non-Windows I think ShiftJIS isn't any special since it's split on / and not on \.

The extra ASCII-compatible/or-not examples are good to have too.

@byroot
Copy link
Member Author

byroot commented Jan 20, 2026

Yeah that'd be great for File methods which "split" paths.

Ok, I'll add some as I go when I optimize these methods (like: ruby/ruby#15898, ruby/ruby#15902, ruby/ruby#15912)

Potentially could reduce to just one example per method on Windows, since that's the tricky one, on non-Windows I think ShiftJIS isn't any special since it's split on / and not on \.

Indeed. But since the same spec should pass on both I figured it was simpler not to restrict it.

@eregon
Copy link
Member

eregon commented Jan 20, 2026

Ok, I'll add some as I go when I optimize these methods (like: ruby/ruby#15898, ruby/ruby#15902, ruby/ruby#15912)

Thanks. Feel free to just add them directly as part of the ruby/ruby PRs for convenience. I think they don't need extra reviews given it would be similar to this PR.

Comment on lines +81 to +88
it "rejects strings encoded with non ASCII-compatible encodings" do
Encoding.list.reject(&:ascii_compatible?).reject(&:dummy?).each do |enc|
path = "/foo/bar".encode(enc)
-> {
File.dirname(path)
}.should raise_error(Encoding::CompatibilityError)
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had this test fail in CI on macOS with Ruby 3.2.9:

File.dirname rejects strings encoded with non ASCII-compatible encodings ERROR
Encoding::ConverterNotFoundError: code converter not found (UTF-8 to RS3UTF-16-BE)
/Users/runner/work/spec/spec/core/file/dirname_spec.rb:83:in `encode'

What even is this encoding? It's not listed on my system and clearly wasn't a problem when this PR was made 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this makes no sense, that branch does not include this test at all!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to handle that too with File.dirname: https://github.com/ruby/ruby/pull/15919/changes#diff-f9288a658b2b8283fa9576185f8e17b99ade402e97f2ca60b368188234f917f1R167

I suspect we should do the same here, but maybe @eregon has a different opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed #1334 as a simple workaround by not running these specs on Ruby 3.2. The error is not related to the spec, and Ruby 3.2 is nearing EOL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now on master.
From #1334 (comment)

I had a similar issue when adding more encoding-related specs in TruffleRuby (not sync'd yet) [...]

Maybe we should drop 3.2 now?
I'm happy I managed to convince CRuby to drop replicate encodings, these replicates clearly caused extra edge cases like this one for little value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll drop Ruby 3.2 on the next sync from ruby/spec. It's also annoying to have 4 versions supported due to slower CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand how this test failed on a branch that doesn't have the test (here). This implies that wrong code is running, somehow.

Copy link
Member

@eregon eregon Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's just because CI runs on a merge commit between your branch and master:
This run ran on 1c21abb

And this issue is transient because of spec ordering and we're running specs in parallel in CI so the ordering is not always the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, didn't realise it could do that. Thank you for the explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants