eregon
Member

@eregon eregon commented Jun 18, 2025

Once upon a time, Pathname was pure-Ruby: https://github.com/ruby/ruby/blob/95bc02237635d3fe42532bfe53038257575cee75/lib/pathname.rb

This PR goes back to that, but keeps the C extension implementation of <=> as that one is significantly faster.
The other Pathname methods are actually faster in Ruby than in C: in the C extension they just call rb_funcall() and rb_ivar_get(), which have no inline cache in C code, whereas the corresponding method calls and @path accesses do have inline caches in Ruby code.
https://railsatscale.com/2023-08-29-ruby-outperforms-c/ is an explanation of that (though it was known well before that).
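To illustrate the shape of such methods, here is a minimal sketch. MiniPathname is a hypothetical stand-in written for this example, not the code in lib/pathname.rb. Each method is only an @path read plus an ordinary method call, which are the operations that get inline caches when written in Ruby:

```ruby
# Hypothetical stand-in for Pathname, for illustration only.
class MiniPathname
  def initialize(path)
    @path = String(path).dup
  end

  # Reading @path is a plain instance-variable access in Ruby code.
  def to_s
    @path.dup
  end

  # An ordinary Ruby call site delegating to File.
  def directory?
    File.directory?(@path)
  end

  # Wraps the result in a new object of the same class.
  def basename
    self.class.new(File.basename(@path))
  end
end

path = MiniPathname.new(Dir.pwd)
puts path.directory?
puts path.basename.to_s
```

The C versions of such methods would perform the same two steps through rb_ivar_get() and rb_funcall().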

I have discussed this with @akr several times (notably in https://bugs.ruby-lang.org/issues/17473) and the last time he said it was OK to do this change.
The main goals are:

  • Simplify the implementation: the Ruby version is 3 times smaller in terms of lines and is much easier to read and maintain.
  • Share more of the Pathname implementation between Ruby implementations. Other Ruby implementations can then easily be added in CI later. Currently the pathname gem does not work on JRuby (no C ext support) or on TruffleRuby (some Ruby C API functions that this gem uses are not supported), so this will be a huge help towards supporting both.

From my discussions with @akr, IIRC, the original motivation to rewrite pathname.rb in C, besides the optimization for <=>, was apparently to use *at functions like openat (see man openat, Rationale for openat() and other directory file descriptor APIs). But these are not portable, that never happened, and it would only be useful in very rare edge cases.
The Ruby Dir class could potentially support some of that, but it seems it has never been important enough for someone to implement it.
The API of Pathname would also need to change anyway to take advantage of a working directory different from the process CWD, e.g. Pathname methods would need to take an extra "Pathname to use as working directory" argument
(because if one just uses Pathname("relative/path").open(...) there is no point in using *at() functions).
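As a rough sketch of what "relative to a directory other than the process CWD" means in practice: read_relative_to below is a hypothetical helper, not a Pathname method and not a proposed API. It only joins paths as strings via Pathname#expand_path, so unlike openat() it holds no directory file descriptor:

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper, not part of Pathname: reads `relative` as if `base`
# were the working directory. Paths are joined as strings, so this is only
# an approximation of what an openat()-based API would do.
def read_relative_to(base, relative)
  Pathname(relative).expand_path(base).read
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "note.txt"), "hello")
  # Without a base, Pathname("note.txt") would be resolved against the process CWD.
  puts read_relative_to(dir, "note.txt")
end
```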


I worked to make the diff really clean: it only adds lines in lib/pathname.rb and only removes lines in ext/pathname/pathname.c. That way it should be easy to review.
I restored the Ruby implementation of the methods from ed9270a, the commit just before methods started being migrated to the C extension.
I then fixed things to make the test suite pass and implemented the few missing methods based on their C definition.
The individual commits and their messages make it clear what exactly happened, so I would recommend reviewing commit-by-commit.

@eregon eregon requested a review from akr June 18, 2025 21:49
@eregon eregon force-pushed the pure-ruby-pathname branch 3 times, most recently from ef3c4ab to 9e3c777 Compare June 18, 2025 22:33
@MatheusRich

@eregon out of curiosity, do you have benchmarks for this change? Or does it stay mostly the same performance?

@eregon
Member Author

eregon commented Jun 19, 2025

It's significantly faster (for each Ruby, the first result line is this branch and the second is master):

Structure:
benchmark name
command line
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
this branch
master
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
this branch
master

Pathname.new(".")
$ ruby --yjit -Ilib -rpathname -rbenchmark/ips -e 'Benchmark.ips { it.report { Pathname.new(".") } }'
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
2.093M (± 0.9%) i/s  (477.76 ns/i) -     10.622M in   5.075014s
1.762M (± 0.6%) i/s  (567.54 ns/i) -      8.858M in   5.027444s
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
 32.078Q (±15.6%) i/s    (0.00 ns/i) -     39.570Q (optimizes away)
720.391k (±17.3%) i/s    (1.39 μs/i) -      3.522M in   5.050059s

Pathname#directory?
$ ruby --yjit -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.directory? } }'
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
388.278k (± 0.2%) i/s    (2.58 μs/i) -      1.945M in   5.009322s
366.325k (± 0.2%) i/s    (2.73 μs/i) -      1.843M in   5.030526s
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
448.926k (± 1.1%) i/s    (2.23 μs/i) -      2.244M in   4.998573s
314.099k (± 2.9%) i/s    (3.18 μs/i) -      1.574M in   5.015517s

Pathname#to_s
$ ruby --yjit -Ilib -rpathname -rbenchmark/ips -e 'P = Pathname.pwd; Benchmark.ips { it.report { P.to_s } }'
ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +YJIT +PRISM [x86_64-linux]
9.480M (± 0.4%) i/s  (105.49 ns/i) -     48.196M in   5.084075s
3.977M (± 1.4%) i/s  (251.46 ns/i) -     20.029M in   5.037328s
truffleruby 24.2.1, like ruby 3.3.7, Oracle GraalVM JVM [x86_64-linux]
31.854Q (±15.5%) i/s    (0.00 ns/i) -     39.901Q (optimizes away)
 1.184M (±13.7%) i/s  (844.61 ns/i) -      5.805M in   5.006740s

@byroot
Member

byroot commented Jun 24, 2025

A bit of a nitpick, but I think static VALUE rb_cPathname; could just be a local variable in Init_pathname now.

Even better, if you require pathname.so at the end of the file, you can use rb_const_get to look up Pathname so that it's not forever pinned by rb_define_class.

@byroot
Member

byroot commented Jun 24, 2025

Never mind, I missed the rb_obj_is_kind_of(other, rb_cPathname) in path_cmp.

@eregon
Member Author

eregon commented Jun 24, 2025

eregon#1 is a nice follow-up by @byroot that reduces the logic in C even further.
I would like to merge this first though to keep it as only deletions in C and only additions in Ruby.

eregon added 3 commits July 15, 2025 21:08
* This is just before methods started to be moved from Ruby code to the C extension.
* BTW, in the ruby/pathname repository there was no pathname.rb before that commit.
* This means it's only additions in lib/pathname.rb and zero removals.
@eregon eregon force-pushed the pure-ruby-pathname branch from 9e3c777 to b868d69 Compare July 15, 2025 19:10
@eregon eregon force-pushed the pure-ruby-pathname branch from b868d69 to c96b559 Compare July 15, 2025 19:14
@eregon
Member Author

eregon commented Jul 15, 2025

I am merging this: I don't want this work to go to waste, the code is clearly more maintainable and readable as Ruby instead of C, and it's even faster.
@akr had a month to review it and didn't, and other committers are merging PRs or pushing directly to master, so I think it's fair to merge this myself.

@eregon eregon merged commit 9f6ad02 into ruby:master Jul 15, 2025
14 checks passed
hsbt added a commit that referenced this pull request Jul 15, 2025
This reverts commit 9f6ad02, reversing
changes made to af973b0.
@byroot
Member

byroot commented Jul 16, 2025

At the last meeting it was decided to make Pathname a builtin, hence this repo/gem will become a no-op.

IMO this change is good, but I think it has to happen in core?
