Define most of Pathname in Ruby (redo) #57

eregon · 2025-07-16T07:43:11Z

ruby-lang tracker issue: https://bugs.ruby-lang.org/issues/21532

Same as #53, but that was reverted in 593f030.
This should be reviewed commit by commit; that makes it much clearer which parts of the code are new and which come from the original pathname.rb before the translation to C began.

I cherry-picked the commits to make it easier to review.

Description from the original PR, reordered to have the most important first:

Once upon a time, Pathname was pure-Ruby: https://github.com/ruby/ruby/blob/95bc02237635d3fe42532bfe53038257575cee75/lib/pathname.rb

This PR goes back to that, and reuses that original Ruby code, but keeps the C extension implementation of <=> and sub as those are significantly faster.
The other Pathname methods are actually faster in Ruby than in C, because all these methods just do rb_funcall() and rb_ivar_get() and those in C code have no inline cache, but the corresponding method calls and @path have inline caches in Ruby code.
https://railsatscale.com/2023-08-29-ruby-outperforms-c/ is an explanation of that.
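
A minimal sketch of the kind of method this paragraph is about, using a hypothetical `DemoPathname` class (not the gem's actual code). Each method only reads `@path` and makes one method call, which is where the paragraph above says Ruby's inline caches apply:

```ruby
# Hypothetical stand-in for Pathname, for illustration only.
class DemoPathname
  def initialize(path)
    @path = path
  end

  # One ivar read plus one method call. In Ruby code both go through
  # inline caches; the description says the C version did the same work
  # via rb_ivar_get() and rb_funcall(), which have no inline cache.
  def directory?
    File.directory?(@path)
  end

  # Returns a copy so callers cannot mutate the internal string.
  def to_s
    @path.dup
  end
end

pn = DemoPathname.new(".")
pn.directory? # => true
pn.to_s       # => "."
```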

I have discussed this with @akr several times (notably in https://bugs.ruby-lang.org/issues/17473) and the last time he said it was OK to do this change.
The main goals are:

* Simplify the implementation, e.g. the Ruby version is 3 times smaller in terms of lines and is much easier to read and maintain.
* Share more of the Pathname implementation between Ruby implementations. With that other Ruby implementations can then easily be added in CI. Currently the pathname gem does not work on JRuby (no C ext support) and on TruffleRuby (some Ruby C API functions that this gem uses are not supported), this will be a huge help towards supporting both.

I worked hard to make the diff really clean: it only adds lines in lib/pathname.rb and only removes lines in ext/pathname/pathname.c. That way it should be easy to review.
I restored the Ruby implementation of the methods from ed9270a, the commit just before methods started being migrated to the C extension.
I then fixed things to make the test suite pass and implemented the few missing methods based on their C definition.
The individual commits and their messages make it clear what exactly happened, so I recommend reviewing commit by commit.

From my discussions with @akr, IIRC, the original motivation to rewrite pathname.rb in C, besides the optimization for <=>, was apparently to use *at functions like openat (see man openat, "Rationale for openat() and other directory file descriptor APIs"). However, these are not portable, that never happened, and it would only be useful in very rare edge cases.
The Ruby Dir class could potentially support some of that, but it seems it has never been important enough for someone to implement it.
The API of Pathname would anyway also need to change to take advantage of a working directory different from the process CWD, e.g. Pathname methods would need to take an extra "Pathname to use as working directory" argument (because if one just uses Pathname("relative/path").open(...) there is no point in using *at() functions).

It's significantly faster with this PR:

| Speedup (this branch / master) | ruby 3.4.2 | ruby 3.4.2 + YJIT |
|---|---|---|
| `Pathname.new(".")` | 1.02x | 1.19x |
| `Pathname#directory?` | 1.03x | 1.06x |
| `Pathname#to_s` | 1.85x | 2.38x |

Structure:
benchmark name
command line
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
this branch
master
command line
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
this branch
master
command line
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
this branch
master

Pathname.new(".")
$ ruby -Ilib -rpathname -rbenchmark/ips -e 'Benchmark.ips { it.report { Pathname.new(".") } }'
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
1.718M (± 1.0%) i/s  (582.24 ns/i) -      8.629M in   5.024793s
1.680M (± 1.4%) i/s  (595.12 ns/i) -      8.457M in   5.033713s
$ ruby --yjit -Ilib -rpathname -rbenchmark/ips -e 'Benchmark.ips { it.report { Pathname.new(".") } }'
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
2.093M (± 0.9%) i/s  (477.76 ns/i) -     10.622M in   5.075014s
1.762M (± 0.6%) i/s  (567.54 ns/i) -      8.858M in   5.027444s
$ ruby -Ilib -rpathname -rbenchmark/ips -e 'Benchmark.ips { it.report { Pathname.new(".") } }'
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
 32.078Q (±15.6%) i/s    (0.00 ns/i) -     39.570Q (optimizes away)
720.391k (±17.3%) i/s    (1.39 μs/i) -      3.522M in   5.050059s

Pathname#directory?
$ ruby -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.directory? } }' 
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
382.448k (± 0.3%) i/s    (2.61 μs/i) -      1.915M in   5.006863s
371.236k (± 0.4%) i/s    (2.69 μs/i) -      1.874M in   5.046993s
$ ruby --yjit -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.directory? } }'
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
388.278k (± 0.2%) i/s    (2.58 μs/i) -      1.945M in   5.009322s
366.325k (± 0.2%) i/s    (2.73 μs/i) -      1.843M in   5.030526s
$ ruby -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.directory? } }' 
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
448.926k (± 1.1%) i/s    (2.23 μs/i) -      2.244M in   4.998573s
314.099k (± 2.9%) i/s    (3.18 μs/i) -      1.574M in   5.015517s

Pathname#to_s
$ ruby -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.to_s } }'       
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
6.821M (± 0.7%) i/s  (146.60 ns/i) -     34.758M in   5.095632s
3.683M (± 1.2%) i/s  (271.50 ns/i) -     18.572M in   5.043102s
$ ruby --yjit -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.to_s } }'
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
9.480M (± 0.4%) i/s  (105.49 ns/i) -     48.196M in   5.084075s
3.977M (± 1.4%) i/s  (251.46 ns/i) -     20.029M in   5.037328s
$ ruby -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.to_s } }'       
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
31.854Q (±15.5%) i/s    (0.00 ns/i) -     39.901Q (optimizes away)
 1.184M (±13.7%) i/s  (844.61 ns/i) -      5.805M in   5.006740s
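
The speedups in the summary table can be re-derived from the raw i/s figures above. The sketch below copies the CRuby numbers from the benchmark output (this branch first, master second) and divides them. The `RESULTS` constant and `speedup` helper are names made up for this example. The TruffleRuby runs are left out because two of the "this branch" results are reported as "optimizes away".

```ruby
# i/s figures copied from the benchmark output above: [this branch, master].
RESULTS = {
  'Pathname.new(".")' => {
    "ruby 3.4.2"        => [1.718e6, 1.680e6],
    "ruby 3.4.2 + YJIT" => [2.093e6, 1.762e6],
  },
  "Pathname#directory?" => {
    "ruby 3.4.2"        => [382.448e3, 371.236e3],
    "ruby 3.4.2 + YJIT" => [388.278e3, 366.325e3],
  },
  "Pathname#to_s" => {
    "ruby 3.4.2"        => [6.821e6, 3.683e6],
    "ruby 3.4.2 + YJIT" => [9.480e6, 3.977e6],
  },
}

# Speedup is simply "this branch" throughput divided by "master" throughput.
def speedup(branch_ips, master_ips)
  branch_ips / master_ips
end

RESULTS.each do |benchmark, runs|
  runs.each do |config, (branch_ips, master_ips)|
    puts format("%-20s %-18s %.2fx", benchmark, config, speedup(branch_ips, master_ips))
  end
end
```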

* The <=> implementation in the extension is much faster, so is kept.
* The other methods are actually faster in Ruby than in C, because rb_funcall() and rb_ivar_get() in C code have no inline cache, but method calls and `@path` have inline caches in Ruby code. https://railsatscale.com/2023-08-29-ruby-outperforms-c/ is an explanation of that (though it was known well before that).

(cherry picked from commit c8c2210)

byroot

Looks mostly fine to me besides a few nitpicks, and I'm very much in favor of migrating things to pure Ruby when it makes sense.

Not sure how this works with Pathname having been made a core class though.


eregon · 2025-07-16T20:27:50Z

@byroot Thank you for the review, I think I addressed all of it.

@hsbt and/or @nobu Could you review this PR as well?

eregon · 2025-07-18T08:46:49Z

@hsbt Let's discuss your concerns and suggestions here.
You said in https://bugs.ruby-lang.org/issues/17473#note-27

> Please separate the small PRs. I want to reduce the side effect like ruby/ruby#13906.

Can you make a concrete suggestion of what you mean by small PRs for this change?

I could make a PR with fewer commits, but every commit up to and including "Handle Windows NTFS edge case in Pathname#sub_ext" is strictly necessary, otherwise the CI doesn't pass.
That leaves only "Optimize Pathname#initialize to avoid extra __send__" and "Optimize Pathname#initialize to avoid extra ivar accesses", which are trivial, and then the commits that address @byroot's review.

If you are asking for a smaller diff in general, I think that is not feasible; e.g. making a PR per method would take months of work and still give the exact same end result. The approach here, as detailed in the first commit message ("Restore lib/pathname.rb from ext/pathname/lib/pathname.rb at ed9270a"), is to use the Ruby code of pathname.rb from before the translation to C, that is, the code from @akr and other contributors to pathname.rb. There is no meaningful way to break that into smaller changes. That code has already been reviewed: it was exactly the code in Pathname before the translation to C started.

Please take the time to read the commit messages, they should make it very clear what I did and what needs deeper review (e.g. imported code from the gem as-is doesn't).
Just browsing through the commit messages should also make it clear I took great care to have a very clear git history of the changes here with not a single extra line of diff.

tenderlove

I've read through the C implementation and the Ruby implementation. I only found one slight difference, but I think it's such a rare edge case I don't know if we need to support it.

tenderlove · 2025-08-04T21:06:40Z

lib/pathname.rb

+  # If +path+ contains a NUL character (<tt>\0</tt>), an ArgumentError is raised.
+  #
+  def initialize(path)
+    path = path.to_path if path.respond_to? :to_path


I think this is slightly different behavior than the original.

    require "pathname"

    path = "/"
    def path.to_path
      "/tmp"
    end

    pn = Pathname.new path
    p pn

On master, the output is #<Pathname:/>, but on this branch, the output is #<Pathname:/tmp>

eregon · 2025-08-05

That looks to me like an optimization in the C extension to avoid the rb_check_funcall() because that's slow.
But in Ruby respond_to? is properly cached and so there is no need for this manual optimization.
Semantics-wise I think it's more consistent the way the original Ruby code does it.
It's of course trivial to change if we need those semantics for some reason.

tenderlove · 2025-08-05

> Semantics-wise I think it's more consistent the way the original Ruby code does it.

Yes, I agree this behavior makes more sense. I just wanted to point out the difference in case it matters (I don't think it should matter).

eregon · 2025-08-23

Done in 96000f6 to be fully compatible with what the C extension initialize did.
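
The two behaviours discussed in this thread can be sketched side by side. Both helpers below are hypothetical and written only for illustration. How 96000f6 actually restores compatibility is not shown in this conversation, so the String check in the second helper is an assumption based on the output reported for master.

```ruby
# Hypothetical helpers, for illustration only; not the gem's actual code.

# Variant A: always honour #to_path (the behaviour reported for this branch
# at review time).
def coerce_always(path)
  path = path.to_path if path.respond_to?(:to_path)
  path
end

# Variant B: leave Strings untouched and only call #to_path on other objects
# (assumed to match the output reported for master).
def coerce_unless_string(path)
  return path if String === path
  path = path.to_path if path.respond_to?(:to_path)
  path
end

path = +"/" # unary plus gives an unfrozen String so a singleton method can be defined
def path.to_path
  "/tmp"
end

coerce_always(path)        # => "/tmp"
coerce_unless_string(path) # => "/"
```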


eregon · 2025-08-05T20:31:06Z

I added the last 4 commits to add TruffleRuby and JRuby in CI, given the diff for it is pretty small, and that ensures all Ruby methods are tested, notably the ones also defined in the C extension. The C extension is then only used on CRuby.

If reviewers prefer that to be a separate PR I can do that, it's easy to move these 4 commits.
I just think it makes sense together.

eregon · 2025-08-05T20:58:14Z

I also updated the description to add a table summarizing the performance gains, and filed https://bugs.ruby-lang.org/issues/21532 to make a ticket specifically about this.

headius · 2025-08-21T15:59:43Z

Huge +1 from the JRuby side. Pathname moving to C was unnecessary then and a hassle for all of us now (including CRuby due to JIT+C interaction). Ship it!

I'll direct any additional comments to the ruby-lang issue.

headius · 2025-08-21T16:12:14Z

Ok, I went to the ruby-lang issue but the last couple of comments directed me back here.

Some specific points:

Pathname becoming core, what happens to the gem?

Seems clear that it would not be gem-upgradeable anymore unless something has changed about how we define "core". If "core" means "loaded always at startup with or without gems enabled", then by definition it can't be upgraded by RubyGems. If "core" means "available without an explicit require" then we're moving toward a future where core features might trigger requires, potentially through RubyGems; that seems very problematic to me and very un-"core".

In my mind, nothing "core" should break if you disable gems. In fact, I believe nothing "core" should break if there's no stdlib available (Ruby should be able to run "hello world" without loading any stdlib files).

Pathname becoming core, how can it be pure-Ruby?

JRuby and TruffleRuby have large parts of core written in Ruby. That code is simply loaded at startup, either by using load in JRuby (to avoid adding to LOADED_FEATURES, accessing the stdlib, and exposing those internal files) or by TruffleRuby directly executing them during startup (details left to @eregon). The pathname.rb would move into the "kernel" of each implementation and be loaded at boot just like other Ruby sources currently loaded by CRuby (usually filled with "primitive" C calls that do the actual work).

Why move Pathname to Ruby?

Why was it moved to C? Minor performance improvements? That's moot now that all implementations have JIT and none of those JIT implementations can optimize across C calls. The C move is now a millstone around our necks, both preventing the gem from being usable on non-CRuby and making calls to the library potentially slower than if they were pure Ruby.

...

It sounds like we're almost all in agreement that this should move back to Ruby. If there's anything I can do to assist please let me know. I'd love to see this PR ultimately close #17 and let us align our Pathname functionality with CRuby once more.

Big kudos to @eregon for forging ahead and righting this wrong.

eregon · 2025-08-21T18:34:21Z

Approved by @akr in https://bugs.ruby-lang.org/issues/21532#note-5, merging 🎉

eregon · 2025-08-21T22:25:46Z

@headius Thanks for the support, should have pinged you earlier :)

> Ok, I went to the ruby-lang issue but the last couple of comments directed me back here.

I'm not sure which issue your points come from, is it https://bugs.ruby-lang.org/issues/17473 maybe?

> Pathname becoming core, what happens to the gem?

I believe we need to keep it; we need the gem for older Ruby versions anyway.
Also, when loaded, this gem replaces the ::Pathname constant if there is one, which seems good if e.g. there is a bug fix in the gem without a CRuby release.

> Pathname becoming core, how can it be pure-Ruby?

Some parts of Pathname were already Ruby, see e.g. https://github.com/ruby/ruby/blob/master/pathname_builtin.rb and lib/pathname.rb before this PR. So not a problem in any case.

eregon added 4 commits July 16, 2025 09:35

Restore lib/pathname.rb from ext/pathname/lib/pathname.rb at ed9270a

b26ed28

* This is just before methods started to be moved from Ruby code to the C extension.
* BTW, in the ruby/pathname repository there was no pathname.rb before that commit.

(cherry picked from commit 16e97a5)

Restore newer changes in lib/pathname.rb

f3dcde2

* This means it's only additions in lib/pathname.rb and zero removals. (cherry picked from commit 3736eab)

Fixes to pass the test suite

cd5e492

(cherry picked from commit 955186c)

eregon requested review from akr, hsbt, nobu and byroot July 16, 2025 07:43

byroot approved these changes Jul 16, 2025


eregon force-pushed the pure-ruby-pathname2 branch from 37ebd64 to 834cc54 Compare July 16, 2025 20:14

tenderlove approved these changes Aug 4, 2025


eregon and others added 14 commits August 5, 2025 21:59

Define Pathname#<=> only if the C extension is not loaded

75ecae6

(cherry picked from commit a15c1f5)
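
One way to define a Ruby fallback only when no other definition already exists is to guard it with `Module#method_defined?`. This is a sketch under assumptions: `DemoPath` is a hypothetical class, the comparison is simplified, and the guard actually used in lib/pathname.rb may differ.

```ruby
# Hypothetical class, for illustration only; not the gem's actual code.
class DemoPath
  attr_reader :path
  protected :path

  def initialize(path)
    @path = path
  end

  # Passing false ignores inherited definitions (every object inherits a
  # default <=>), so this is only true if DemoPath itself already has <=>,
  # e.g. because an extension defined it before this file was loaded.
  unless method_defined?(:<=>, false)
    def <=>(other)
      return nil unless DemoPath === other
      @path <=> other.path
    end
  end
end

DemoPath.new("a") <=> DemoPath.new("b") # => -1
DemoPath.new("a") <=> "a"               # => nil
```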

Add methods from the C extension which did not exist in pathname.rb

ef267cf

(cherry picked from commit fe027ae)

Use Regexp#match? instead of =~ for better performance

c6abfb7

* Avoids a MatchData allocation. (cherry picked from commit 643585a)
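
A small sketch of the difference this commit message refers to: `=~` sets the last-match variable `$~` to a MatchData object, while `Regexp#match?` only returns a boolean and leaves `$~` untouched. The two helper methods are hypothetical names used for this demonstration.

```ruby
# Each method has its own $~, so it starts out as nil in both helpers.

def last_match_after_eq_tilde(str)
  /\.rb\z/ =~ str
  $~ # MatchData created by =~
end

def last_match_after_match_p(str)
  /\.rb\z/.match?(str)
  $~ # still nil: match? does not create a MatchData
end

last_match_after_eq_tilde("pathname.rb").class # => MatchData
last_match_after_match_p("pathname.rb")        # => nil
```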

Update the Pathname class documentation with the one in the C extension

e7d8145

(cherry picked from commit 177a86d)

Handle Windows NTFS edge case in Pathname#sub_ext

f8ddafa

(cherry picked from commit f8e0cae)

Optimize Pathname#initialize to avoid extra __send__

da3d8f9

(cherry picked from commit aa4d4c6)

Optimize Pathname#initialize to avoid extra ivar accesses

c5269dc

(cherry picked from commit c96b559)

Avoid ; and multiple statements per line for readability

2480502

Define protected #path to avoid extra copies from #to_s

28add9b
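
A sketch of the idea in this commit title, using a hypothetical `DemoPathString` class: `#to_s` hands out a copy of the internal string, while a protected `#path` reader lets other instances of the same class read it without copying.

```ruby
# Hypothetical class, for illustration only; not the gem's actual code.
class DemoPathString
  def initialize(path)
    @path = path.dup
  end

  # Public API: returns a fresh copy on every call.
  def to_s
    @path.dup
  end

  # Uses the protected reader on the other instance, so no copy is made.
  def ==(other)
    DemoPathString === other && @path == other.path
  end

  protected

  # Callable only from instances of this class (or subclasses).
  def path
    @path
  end
end

a = DemoPathString.new("/tmp")
b = DemoPathString.new("/tmp")
a == b                  # => true
a.to_s.equal?(a.to_s)   # => false
```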

Use (...) for delegating instead of (*args) so kwargs and block are p…

8cca659

…assed too * Core methods regularly gain new keyword arguments so this is more future-proof.
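
A sketch of `(...)` delegation with a hypothetical `DemoFilePath` class (not the gem's code). With `(...)`, positional arguments, keyword arguments and a block all reach the underlying `File` method without being listed in the wrapper. Leading arguments before `...` need Ruby 3.0 or newer.

```ruby
require "tempfile"

# Hypothetical wrapper, for illustration only.
class DemoFilePath
  def initialize(path)
    @path = path
  end

  # Forwards everything after @path, including keyword arguments.
  def read(...)
    File.read(@path, ...)
  end

  # Forwards keyword arguments and the caller's block.
  def each_line(...)
    File.foreach(@path, ...)
  end
end

Tempfile.create("demo") do |file|
  file.write("a\nb\n")
  file.flush

  wrapper = DemoFilePath.new(file.path)
  wrapper.read(encoding: "UTF-8")                      # keyword argument forwarded
  wrapper.each_line(chomp: true) { |line| puts line }  # keyword argument and block forwarded
end
```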

Simplify #unlink

c58bd4f

Co-authored-by: Jean Boussier <[email protected]>

Switch to path.include?("\0") as it is faster than /\0/.match?(path)

054a43e

Define Pathname#sub in C on CRuby for efficiency

20f3653

* The eval to set $~ is inefficient, so only do it when necessary (when running without the C extension).

Do not use the C extension on non-CRuby

9586da6

eregon force-pushed the pure-ruby-pathname2 branch from 249c24d to 20f3653 Compare August 5, 2025 20:16

eregon added 2 commits August 5, 2025 22:26

Small fixes to make all tests pass on TruffleRuby

819607f

Small fixes to make all tests pass on JRuby

dc34c8c

Add TruffleRuby and JRuby in CI

b83d344

eregon mentioned this pull request Aug 5, 2025

Support default & bundled extension gems oracle/truffleruby#2644

Open

26 tasks

eregon merged commit 658648c into ruby:master Aug 21, 2025
20 checks passed

eregon mentioned this pull request Aug 21, 2025

JRuby support #17

Closed

hsbt added a commit to hsbt/ruby that referenced this pull request Aug 22, 2025

Import ruby/pathname#57

67f3477

hsbt mentioned this pull request Aug 22, 2025

Migrate pathname.c to pathname_builtin.rb ruby/ruby#14303

Merged

eregon mentioned this pull request Aug 23, 2025

Run pathname specs from ruby/spec in CI and fix them #60

Merged

hsbt added a commit to ruby/ruby that referenced this pull request Aug 25, 2025

Import ruby/pathname#57

1b4a380
