Description
TL;DR: Some VMs (CRuby, ART, etc.) support forking, but fork() doesn't duplicate any threads other than the one that calls fork(). Currently, if a VM calls fork(), MMTk GC threads will not exist in the child process. We need to have the necessary mechanisms to support fork().
Requirement
CRuby
Ruby has the method Kernel#fork. It does what the fork() system call does for Ruby, i.e. duplicates the current process, but only the current Ruby thread, not other threads.
Shopify's use case involves forking the VM to handle different requests. The Ruby process performs a compacting GC before forking so that the heap is less fragmented for the children. This is not a problem with CRuby's own GC because it runs in the mutator thread that triggers it. In other words, it doesn't have dedicated GC threads.
When using MMTk, after forking, the child process will not have any GC threads. If a mutator thread in the child process triggers a GC, it will block forever waiting for the GC to finish. But the GC will never happen because there is no GC thread to perform it.
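The hang can be illustrated with a minimal sketch. It uses a timeout in place of a real fork() so that it terminates. GcRequest, GcState, spawn_gc_thread and trigger_gc_and_wait are hypothetical names made up for this example and are not MMTk APIs. The mutator requests a GC and waits on a condition variable, and the wait only succeeds if a dedicated GC thread exists to serve the request.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the state shared between mutators and GC threads.
#[derive(Default)]
struct GcState {
    requested: bool,
    completed: bool,
}

/// Hypothetical request channel; not MMTk's real type.
#[derive(Default)]
struct GcRequest {
    state: Mutex<GcState>,
    changed: Condvar,
}

/// Spawns a dedicated GC thread that serves exactly one GC request.
fn spawn_gc_thread(req: Arc<GcRequest>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut st = req.state.lock().unwrap();
        while !st.requested {
            st = req.changed.wait(st).unwrap();
        }
        // Stand-in for the actual collection work.
        st.requested = false;
        st.completed = true;
        req.changed.notify_all();
    })
}

/// Mutator side: request a GC and wait for it to finish.
/// Returns false if nothing completed the GC within `timeout`.
/// A real mutator would wait without a timeout, i.e. forever.
fn trigger_gc_and_wait(req: &GcRequest, timeout: Duration) -> bool {
    let mut st = req.state.lock().unwrap();
    st.requested = true;
    st.completed = false;
    req.changed.notify_all();
    let (st, _timed_out) = req
        .changed
        .wait_timeout_while(st, timeout, |s| !s.completed)
        .unwrap();
    st.completed
}
```

With a GC thread spawned, trigger_gc_and_wait returns true. Without one, which is the situation of the child process after fork(), it can only time out.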
Android ART
The "Zygote" process runs an ART VM, and forks into different application processes. This is intended for accelerating class loading.
We will face the same problem if the Zygote process forks.
What should happen when forking?
We first need to let GC threads come to a graceful stop. We can only fork() when no GC thread is running.
We also need to make sure all mutators are at a safepoint, and all contexts are flushed. After fork(), only one thread will remain, and that's likely a mutator thread. This means:
- Other mutator threads must not be in a critical section w.r.t. GC. For example, they must not be in the middle of allocating and initializing an object, and must not be in the middle of executing a write barrier.
- Other mutators will need to flush their thread-local states. Their mod buffers need to be flushed. For the MiMalloc allocator, mutators need to give back blocks cached locally. Bump-pointer allocators can be discarded as long as they are not in the middle of allocation.
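The two requirements above can be sketched as a per-mutator "prepare for fork" step. This is only an illustration under assumed data structures: MutatorLocal, GlobalState and prepare_for_fork are hypothetical names, and MMTk's real mutator and allocator types look different.

```rust
/// Hypothetical per-mutator state; the field names are illustrative, not MMTk's.
#[derive(Default)]
struct MutatorLocal {
    /// True while the mutator is mid-allocation or mid-write-barrier.
    in_critical_section: bool,
    /// Objects recorded by the write barrier but not yet published.
    mod_buffer: Vec<usize>,
    /// Blocks cached locally by a MiMalloc-style allocator.
    cached_blocks: Vec<usize>,
    /// Current bump-pointer region as (cursor, limit), if any.
    bump_region: Option<(usize, usize)>,
}

/// Hypothetical global state that survives in the forked child.
#[derive(Default)]
struct GlobalState {
    modified_objects: Vec<usize>,
    free_blocks: Vec<usize>,
}

impl MutatorLocal {
    /// Flushes thread-local state so that it is safe for this thread to vanish.
    /// Refuses to proceed if the mutator is inside a GC-critical section.
    fn prepare_for_fork(&mut self, global: &mut GlobalState) -> Result<(), &'static str> {
        if self.in_critical_section {
            return Err("mutator is not at a safepoint");
        }
        // Publish the mod buffer and give back locally cached blocks.
        global.modified_objects.append(&mut self.mod_buffer);
        global.free_blocks.append(&mut self.cached_blocks);
        // The bump-pointer region can simply be discarded.
        self.bump_region = None;
        Ok(())
    }
}
```

Returning an error instead of flushing models the first bullet: a mutator caught in a critical section has to reach a safepoint before the fork can go ahead.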
Right before fork(), all GC threads must stop. After fork(), we should restart GC threads. We can ignore the coordinator thread for now because we plan to remove it (we'll discuss that in #1053). The state of a GC worker is encapsulated in the GCWorker struct, so it should be easy to restart GC threads by reusing the GCWorker structs.
What needs to be done?
Everything will be easier if we remove the coordinator first. See #1053.
We need to add an API to stop all GC threads for forking. It is basically the reverse of initialize_collection.
We need another API to restart GC threads. It should be similar to initialize_collection, but it should reuse the existing GCWorker structs rather than creating new instances.
We need to further make sure that GC worker threads save all states in the GCWorker struct before exiting.
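One possible shape for the stop/restart pair is sketched below using only the Rust standard library. WorkerPool, start and stop_for_fork are hypothetical names, and the GCWorker here is a toy stand-in rather than MMTk's actual struct. Each worker thread owns its GCWorker while running and hands it back through its JoinHandle when asked to stop, so a later restart can reuse the same structs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Toy stand-in for MMTk's GCWorker; the fields are illustrative only.
struct GCWorker {
    ordinal: usize,
    packets_done: u64,
}

/// Hypothetical pool that owns the worker structs whenever no thread is running.
struct WorkerPool {
    stop: Arc<AtomicBool>,
    handles: Vec<JoinHandle<GCWorker>>,
    parked: Vec<GCWorker>,
}

impl WorkerPool {
    fn new(n: usize) -> Self {
        WorkerPool {
            stop: Arc::new(AtomicBool::new(false)),
            handles: Vec::new(),
            parked: (0..n)
                .map(|ordinal| GCWorker { ordinal, packets_done: 0 })
                .collect(),
        }
    }

    /// Spawns one thread per parked GCWorker, moving the struct into the thread.
    fn start(&mut self) {
        self.stop.store(false, Ordering::SeqCst);
        for mut worker in self.parked.drain(..) {
            let stop = Arc::clone(&self.stop);
            self.handles.push(thread::spawn(move || {
                loop {
                    worker.packets_done += 1; // stand-in for executing a work packet
                    if stop.load(Ordering::SeqCst) {
                        break;
                    }
                    thread::sleep(Duration::from_millis(1));
                }
                worker // all state goes back to the pool on exit
            }));
        }
    }

    /// Asks every worker thread to exit and collects the GCWorker structs.
    /// When this returns, no GC thread is running.
    fn stop_for_fork(&mut self) {
        self.stop.store(true, Ordering::SeqCst);
        for handle in self.handles.drain(..) {
            self.parked.push(handle.join().expect("GC worker panicked"));
        }
    }
}
```

In this sketch the VM would call stop_for_fork() before fork() and start() afterwards, in both the parent and the child. Because the thread returns its GCWorker by value, any state the worker wants to keep must live in the struct, which mirrors the last requirement above.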