VM Crash while running 'pub-get' on MacOS #29539

Closed
mkustermann opened this issue May 3, 2017 · 5 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@mkustermann
Member

Flutter noticed sporadic crashes of pub get; see flutter/flutter#9727. The pub get process seems to exit with exit code -4, which translates to signal 4 (SIGILL, illegal instruction).
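As a rough illustration of how a negative exit code maps to a signal, here is a minimal Python sketch. The helper name `describe_exit_code` is made up for this example, and it assumes the common convention (also used by Python's `subprocess`) that a negative code `-N` means "terminated by signal N". It is not how the Dart VM or Flutter reports the crash.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code, treating -N as "killed by signal N".

    Hypothetical helper for illustration only.
    """
    if code < 0:
        signum = -code
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


print(describe_exit_code(-4))
```

On macOS and Linux signal 4 is SIGILL, which matches the "illegal instruction" reading above.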

@jakobr-google has reproduced it on his machine by running it in a loop, and we have the dart-sdk (release mode) and a corresponding core dump at:

gs://dart-temp-crash-archive/jakob-pub-get-crash-core
gs://dart-temp-crash-archive/jakob-pub-get-crash-dart-sdk.tar.bz2

The Dart SDK version is 1.23.0-dev.11.11.
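A sporadic crash like this is usually chased by rerunning the command until it fails. The thread does not show the loop that was actually used, so the following is only a sketch of the idea in Python. The function name `run_until_failure` and its parameters are assumptions for illustration.

```python
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def run_until_failure(cmd: Sequence[str], max_runs: int) -> Optional[Tuple[int, int]]:
    """Run cmd up to max_runs times; return (attempt, returncode) on the
    first non-zero exit, or None if every run succeeded.

    Hypothetical helper: a negative returncode means the process was
    killed by a signal (for example -4 for SIGILL).
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            return attempt, result.returncode
    return None


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # command under investigation.
    print(run_until_failure([sys.executable, "-c", "pass"], 3))
```

To chase the crash in this issue, the command would be the `pub get` invocation; the exact command line and iteration count are not given in the thread.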

It seems to crash during a fork() inside the macOS libc/platform library.

@jakobr-google will try to reproduce it again with bleeding edge, to see if the underlying issue might have already been fixed.

/cc @zanderso @a-siva @mraleph

mkustermann added the area-vm and type-bug labels May 3, 2017
@mraleph
Member

mraleph commented May 3, 2017

I think the problem is that the macOS library libnotify is not fork-safe.

This library has lazily initialized global state, which is created the first time the _notify_globals function is called. Lazy initialization is guarded by a lock: both the initialization and the guarding are done using os_alloc_once with the key OS_ALLOC_ONCE_KEY_LIBSYSTEM_NOTIFY.

There is no special handling of this state in the prefork routines.

After the fork, in the child, libSystem_atfork_child invokes _notify_fork_child, which starts by getting the global library state via _notify_globals.

The absence of special handling in the prefork routines makes the following race possible:

  • The first thread starts initializing the libnotify state and takes the lock.
  • The second thread forks while libnotify state creation is still in progress and the OS_ALLOC_ONCE_KEY_LIBSYSTEM_NOTIFY slot is locked.

Because prefork does nothing to ensure that the forking thread owns the OS_ALLOC_ONCE_KEY_LIBSYSTEM_NOTIFY slot, the child starts with that slot locked by a thread that does not exist in the child. When the _os_alloc_once implementation then attempts to wait on this lock, it gets an EOWNERDEAD error back and crashes with the following stack trace:

    frame #0: 0x00007fffd18bdc87 libsystem_platform.dylib`_os_once_gate_corruption_abort + 23
    frame #1: 0x00007fffd18bd99b libsystem_platform.dylib`_os_once_gate_wait_slow + 134
    frame #2: 0x00007fffd18b9a92 libsystem_platform.dylib`_os_alloc_once + 40
    frame #3: 0x00007fffd18b30ae libsystem_notify.dylib`_notify_fork_child + 215
    frame #4: 0x00007fffd0104b21 libSystem.B.dylib`libSystem_atfork_child + 49
    frame #5: 0x00007fffd16f1437 libsystem_c.dylib`fork + 47
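The general shape of this race can be shown outside libnotify. The Python sketch below is only an analogy under stated assumptions: a plain `threading.Lock` stands in for the os_alloc_once slot, and the function name `lock_stuck_in_child` is invented. One thread holds the lock while another thread forks. The child inherits the locked lock but not the thread that holds it, so the lock can never be released there. Unlike the real crash, the child here simply times out instead of aborting with EOWNERDEAD.

```python
import os
import threading
import warnings


def lock_stuck_in_child(timeout: float = 0.2) -> bool:
    """Return True if a lock held by another thread at fork time is still
    locked in the child process. Illustrative analogy only.
    """
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def holder() -> None:
        # Plays the "first thread": takes the lock and keeps it.
        with lock:
            holding.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    holding.wait()  # make sure the lock is held before forking

    with warnings.catch_warnings():
        # Newer Python versions warn about fork() in a multi-threaded
        # process, which is exactly the hazard being demonstrated.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread exists here, so nobody can ever
        # release the inherited lock.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    thread.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("lock stuck in child:", lock_stuck_in_child())
```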

As a workaround we could just force-initialize the libnotify state before forking.

I did an experiment to check when we might initialize it, and it turns out that asking for the local time initializes it:

  [0x00007fffbea81a92] _os_alloc_once
  [0x00007fffbea7c57a] notify_register_check
  [0x00007fffbe8fe3e5] notify_register_tz
  [0x00007fffbe8fde43] tzsetwall_basic
  [0x00007fffbe8ff844] localtime_r
  [0x0000000100337149] dart::OS::GetTimeZoneOffsetInSeconds(long long)
  [0x000000010007137d] dart::BootstrapNatives::DN_DateTime_timeZoneOffsetInSeconds(_Dart_NativeArguments*)
  [0x0000000100294563] dart::NativeEntry::LinkNativeCall(_Dart_NativeArguments*)
  [0x0000000100cc0719] [Stub] CallBootstrapCFunction
  [0x000000010294fa47] DateTime._timeZoneOffsetInSecondsForClampedSeconds
  [0x00000001029502e8] DateTime._timeZoneOffsetInSeconds
  [0x000000010294de6a] DateTime._localDateInUtcMicros
  [0x00000001029500c1] DateTime._parts
  [0x000000010294f8a4] DateTime.month
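The workaround idea, touching the local time before forking, can be sketched as follows. This is Python rather than the VM's C++, and it rests on an assumption: that `time.localtime()` reaches the platform's `localtime_r`, which according to the trace above is what initializes libnotify on macOS. The helper name `fork_with_warmup` is hypothetical, and this is not the change that was landed in the VM.

```python
import os
import time
from typing import Callable


def fork_with_warmup(child_fn: Callable[[], int]) -> int:
    """Query the local time, then fork and run child_fn in the child.

    Returns the child's exit code; a negative value -N means the child
    was killed by signal N. Illustrative sketch only.
    """
    # Warm-up: ask for the local time so any lazily created time zone
    # state is fully initialized before the fork.
    time.localtime()

    pid = os.fork()
    if pid == 0:
        code = 1  # reported if child_fn raises
        try:
            code = int(child_fn())
        finally:
            # Leave the child immediately without running parent cleanup.
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    # Requires Python 3.9 or newer.
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(fork_with_warmup(lambda: 0))
```

Doing the warm-up in the same thread immediately before `fork()` means the initialization has completed by the time the fork happens. A separate thread that is in the middle of first-time initialization at that moment would still reopen the race.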

@mraleph
Member

mraleph commented May 3, 2017

I filed a bug with Apple: 31962059

@a-siva
Contributor

a-siva commented May 3, 2017

So your suggested workaround is to call OS::LocalTime in the Mac version of OS::InitOnce?

@mraleph
Member

mraleph commented May 3, 2017

@a-siva either that or call it right before forking.

We tried this sort of workaround with @jakobr-google and it seems to have fixed the problem.
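The other placement discussed here, doing the warm-up once during startup rather than before every fork, might look like the sketch below. It is Python, not the VM's OS::InitOnce, and the names `init_once`, `_init_lock` and `_initialized` are invented for illustration. As before, it assumes that `time.localtime()` is what triggers the platform's time zone setup.

```python
import threading
import time

_init_lock = threading.Lock()
_initialized = False


def init_once() -> bool:
    """Perform the local-time warm-up at most once per process.

    Returns True if this call did the initialization and False if it had
    already been done. Hypothetical helper for illustration.
    """
    global _initialized
    with _init_lock:
        if _initialized:
            return False
        # Force lazy time zone state to be created now, early in startup,
        # before any other thread might fork.
        time.localtime()
        _initialized = True
        return True


if __name__ == "__main__":
    print(init_once(), init_once())
```

The trade-off between the two placements is mostly about where the guarantee lives. A one-time call at startup costs nothing later but relies on running before any fork, while a call right before each fork is local to the fork site.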

a-siva closed this as completed in 9b14120 May 4, 2017
@mraleph
Member

mraleph commented May 5, 2017

I wrote and submitted the following reproduction to Apple:

https://gist.github.com/mraleph/e45db4d7a56bf65c3fe33e666bc31928

jakobr-google added a commit to jakobr-google/flutter that referenced this issue May 5, 2017
Eagerly initialize libnotify by accessing the current date. See dart-lang/sdk#29539 for details.

Fixes flutter#9727.
jakobr-google added a commit to flutter/flutter that referenced this issue May 5, 2017
Eagerly initialize libnotify by accessing the current date. See dart-lang/sdk#29539 for details.

Fixes #9727.