
Test suite stuck on GC.verify_compaction_references on ppc64le #1261


Closed
jackorp opened this issue May 6, 2022 · 9 comments

@jackorp
Contributor

jackorp commented May 6, 2022

When running the mysql2 gem's test suite on Fedora's ppc64le builders with Ruby 3.1, it gets stuck in spec_helper.rb.

Commenting out the following line fixes it:

GC.verify_compaction_references(double_heap: true, toward: :empty)

This seems related to ged/ruby-pg#423 and https://bugs.ruby-lang.org/issues/18560 .

@jackorp
Contributor Author

jackorp commented May 6, 2022

@junaruga
Contributor

junaruga commented May 9, 2022

When running the mysql2 gem's test suite on Fedora's ppc64le builders with Ruby 3.1, it gets stuck in spec_helper.rb.

It's weird. According to [1], GC.verify_compaction_references in Ruby 3.1 should raise NotImplementedError on platforms that cannot support GC compaction, such as ppc64le, rather than getting stuck. The patch [2] is applied to Ruby 3.1.

[1] https://bugs.ruby-lang.org/issues/18560#note-1
[2] ruby/ruby@fc832ff
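To illustrate the behaviour expected from a patched Ruby 3.1, here is a minimal sketch that tells the possible outcomes apart instead of hanging. The helper name compaction_verification_status and its return symbols are hypothetical, not part of mysql2 or Ruby; the call itself mirrors the one in spec_helper.rb.

```ruby
# Hypothetical helper, not part of mysql2: classify what happens when we ask
# the GC to verify compaction references, as spec/spec_helper.rb does.
def compaction_verification_status
  # GC.verify_compaction_references was added in Ruby 3.0.0.
  return :unavailable unless GC.respond_to?(:verify_compaction_references)

  GC.verify_compaction_references(double_heap: true, toward: :empty)
  :verified
rescue NotImplementedError
  # Expected outcome on platforms without GC compaction support once the
  # upstream patch is applied, instead of the process hanging.
  :not_supported
end
```

Calling compaction_verification_status should return :verified on a platform that supports compaction and :not_supported on one that refuses it; the hang reported here is exactly the case where neither value comes back.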

@tenderlove
Collaborator

Also, it should definitely crash: MySQL2 isn't compaction-friendly yet. We need to introduce PR #1192 and possibly make other changes.

@tenderlove
Collaborator

Actually I'm totally wrong. We just pin references in mysql2, so everything should work correctly. 🤔

@jackorp
Contributor Author

jackorp commented May 10, 2022

I was able to request a ppc64le machine, and I think I reproduced the issue. TL;DR: it seems to be a Ruby 3.1 issue in how this method is implemented.

My only guess is that calling GC.verify_compaction_references(double_heap: true, toward: :empty), as this test suite does, sends the GC down code paths it should not take. The process then gets stuck in an infinite loop around the while loop [0] in newobj_slowpath when it tries to allocate new space (so far, the only allocations I have seen in the C backtraces I investigated are for String and Array).

As for the issue itself: commenting out the GC call "fixes" it, but if I rescue the exception instead, Ruby goes into an infinite loop in GC code involving ractors, AFAICT.

Rescuing the exception (as in the following patch) and just going on with the code makes the issue surface in this test suite:

diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb
index 2e86e11..a9ccacf 100644
--- a/spec/spec_helper.rb
+++ b/spec/spec_helper.rb
@@ -7,7 +7,11 @@ DatabaseCredentials = YAML.load_file('spec/configuration.yml')
 if GC.respond_to?(:verify_compaction_references)
   # This method was added in Ruby 3.0.0. Calling it this way asks the GC to
   # move objects around, helping to find object movement bugs.
-  GC.verify_compaction_references(double_heap: true, toward: :empty)
+  begin
+    GC.verify_compaction_references(double_heap: true, toward: :empty)
+  rescue NotImplementedError
+    puts 'compaction not supported (caught exception)'
+  end
 end
 
 RSpec.configure do |config|

GDB Backtrace: https://gist.github.com/jackorp/cc5c4bae9b8f492f5b936cdbd86febdc

[0] The infinite loop occurs in this while loop: https://github.com/ruby/ruby/blob/v3_1_2/gc.c#L2483

@junaruga
Contributor

Thanks for your great investigation! Does the hang also happen with the following command on the ppc64le machine you tested?

$ ruby -e 'GC.verify_compaction_references(double_heap: true, toward: :empty)'

@jackorp
Contributor Author

jackorp commented May 11, 2022

Does the hang also happen with the following command on the ppc64le machine you tested?

Unfortunately, I had great difficulty reproducing the infinite loop outside of the test suite (maybe the suite loads something that makes it observable).

As a note, the code:

$ ruby -e 'GC.verify_compaction_references(double_heap: true, toward: :empty)'

fails with a proper exception. A begin/rescue block around that call is needed to even attempt a reproducer.

I was sometimes able to get it stuck in IRB, but that was very unreliable and mostly random (running the GC... method there was still required).
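Based on the observations in this thread (the call must be wrapped in begin/rescue, and the hung C backtraces showed String and Array allocations in newobj_slowpath), a standalone reproducer attempt might look like the sketch below. It is hypothetical: the helper name and allocation counts are arbitrary, and as noted above, the hang was not reliably reproducible outside the test suite.

```ruby
# Hypothetical standalone reproducer attempt for the ppc64le hang.
# Step 1: ask for compaction verification, tolerating platforms that refuse it.
if GC.respond_to?(:verify_compaction_references)
  begin
    GC.verify_compaction_references(double_heap: true, toward: :empty)
  rescue NotImplementedError => e
    warn "compaction not supported (caught #{e.class})"
  end
end

# Step 2: keep allocating Strings and Arrays, the two allocation paths seen in
# the hung backtraces. On an affected Ruby the process would be expected to
# spin somewhere in here; on a fixed Ruby the loop simply finishes.
def allocate_strings_and_arrays(rounds)
  kept = []
  rounds.times do |i|
    kept << "string #{i}"
    kept << Array.new(8, i)
    kept.clear if kept.size > 10_000 # let older objects become garbage
  end
  rounds
end

allocated = allocate_strings_and_arrays(200_000)
puts "finished #{allocated} allocation rounds"
```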

@junaruga
Contributor

Okay. Thanks for the info. If we can find a minimal reproducer, it would be helpful to report it to the Ruby project so that someone can fix the issue and add a unit test there.

@jackorp
Contributor Author

jackorp commented Oct 26, 2023

Let's close this; the problem was addressed in Ruby itself a while back in https://bugs.ruby-lang.org/issues/18829

AFAICT, this was fixed upstream in newer Rubies (at least 3.2 onward).

JFTR, in Fedora we have backported the patches to 3.1 via the following commits:
https://src.fedoraproject.org/rpms/ruby/c/b7b547379654b3a337010d15914139e158e59acb
https://src.fedoraproject.org/rpms/ruby/c/ca94aff023c5779dec1e03094784bdf736beca83
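For projects that still run this check on older Rubies, one option is to skip the call where the hang is known to occur. The sketch below is a hypothetical guard, not something mysql2 adopted. It assumes that RbConfig::CONFIG["host_cpu"] reports powerpc64le (or ppc64le) on these builders, and it treats Ruby 3.2 as the first fixed version. That is conservative, because Fedora's 3.1 packages carry the backported fix.

```ruby
require "rbconfig"
require "rubygems" # provides Gem::Version (normally already loaded)

# Hypothetical guard, not part of mysql2: decide whether calling
# GC.verify_compaction_references is safe on this interpreter.
# Assumption: affected builders report host_cpu as one of these values.
AFFECTED_CPUS = %w[powerpc64le ppc64le].freeze
# Assumption: per this thread, the upstream fix is in Ruby 3.2 onward.
FIRST_FIXED_RUBY = Gem::Version.new("3.2.0")

def compaction_check_safe?(ruby_version: RUBY_VERSION, cpu: RbConfig::CONFIG["host_cpu"])
  return true unless AFFECTED_CPUS.include?(cpu)

  Gem::Version.new(ruby_version) >= FIRST_FIXED_RUBY
end

if GC.respond_to?(:verify_compaction_references) && compaction_check_safe?
  begin
    GC.verify_compaction_references(double_heap: true, toward: :empty)
  rescue NotImplementedError
    warn "compaction not supported (caught exception)"
  end
end
```

A version gate like this is blunt: it also skips the check on patched distribution builds of 3.1, trading some test coverage for a suite that cannot hang.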

@jackorp jackorp closed this as completed Oct 26, 2023

3 participants