Skip to content

Crash in GC on arm64 #39090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mkustermann opened this issue Oct 24, 2019 · 5 comments
Closed

Crash in GC on arm64 #39090

mkustermann opened this issue Oct 24, 2019 · 5 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. gardening

Comments

@mkustermann
Copy link
Member

During gardening I noticed there are flaky crashes on arm64.

From this build with this log:

/=============================================================================================\
| corelib_2/regexp/unicode-regexp-restricted-syntax_test broke (Pass -> Crash, expected Pass) |
\=============================================================================================/

--- Command "vm" (took 01.000983s):
DART_CONFIGURATION=ReleaseXARM64 out/ReleaseXARM64/dart --ignore-unrecognized-flags --packages=/b/s/w/ir/.packages /b/s/w/ir/tests/corelib_2/regexp/unicode-regexp-restricted-syntax_test.dart

exit code:
-6

stderr:
===== CRASH =====
si_signo=Segmentation fault(11), si_code=2, si_addr=0xffff89900000
version=2.6.0-edge.63d524dec0ccab8d608c08d3ef210a461d5f8b0d (Thu Oct 24 09:23:06 2019 +0000) on "linux_arm64"
thread=16355, isolate=kernel-service(0xaaaae33b6400)
  pc 0x0000aaaac48f99a4 fp 0x0000ffff8a93e710 out/ReleaseXARM64/dart+0x1a4b9a4
  pc 0x0000aaaac472befc fp 0x0000ffff8a93e760 dart::RawObject::VisitPointersPredefined(dart::ObjectPointerVisitor*, long)
  pc 0x0000aaaac48f9cc0 fp 0x0000ffff8a93e790 out/ReleaseXARM64/dart+0x1a4bcc0
  pc 0x0000aaaac48f9b1c fp 0x0000ffff8a93e800 out/ReleaseXARM64/dart+0x1a4bb1c
  pc 0x0000aaaac479a0fc fp 0x0000ffff8a93e840 dart::ThreadPool::Worker::Loop()
  pc 0x0000aaaac4799ef8 fp 0x0000ffff8a93e880 dart::ThreadPool::Worker::Main(unsigned long)
  pc 0x0000aaaac471adfc fp 0x0000ffff8a93e930 out/ReleaseXARM64/dart+0x186cdfc
  pc 0x0000ffff8bb30fc4 fp 0x0000ffff8a93e940 /lib/aarch64-linux-gnu/libpthread.so.0+0x6fc4
-- End of DumpStackTrace

--- Re-run this test:
python tools/test.py -n dartk-linux-release-arm64 corelib_2/regexp/unicode-regexp-restricted-syntax_test

core dumps are available - see the "isolated out" in the build mentioned above.

(There are more, e.g. this build with log, or this )

/cc @a-siva @rmacnak-google @mraleph

@mkustermann mkustermann added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. gardening labels Oct 24, 2019
@mraleph
Copy link
Member

mraleph commented Oct 24, 2019

Is there a way for us to check if this is something new - or whether it has been going on for a while?

I wonder if this might be related to some of the ARM64 specific changes that we have laded (contributed by ARM)

@rmacnak-google
Copy link
Contributor

I looked at two, and in both the concurrent marker is visiting small variable length objects (Array or TypeArguments), whose header and length fields agree that the object is small, but the marker is visiting slots as if the object was large and falling off the end of a page. In fact, it is visiting the object as if the object's length in elements was equal to the address of null, so this might be a data race on the length field. This is strange because the atomics to enter a safepoint should be preventing such a data race.

https://github.com/dart-lang/sdk/blob/master/runtime/docs/gc.md#data-races

@mraleph
Copy link
Member

mraleph commented Oct 25, 2019

If this started recently this might also be related to the recent atomic changes.

@mraleph
Copy link
Member

mraleph commented Oct 25, 2019

@rmacnak-google I have a question about this code. GC docs say:

For old-space objects created after marking started, the marker may see uninitialized values because operations on slots are not synchronized. To prevent this, during marking we allocate old-space objects black (marked) so the marker will not visit them.

which is implemented by this code

  NoSafepointScope no_safepoint;
  InitializeObject(address, cls_id, size);
  RawObject* raw_obj = reinterpret_cast<RawObject*>(address + kHeapObjectTag);
  ASSERT(cls_id == RawObject::ClassIdTag::decode(raw_obj->ptr()->tags_));
  if (raw_obj->IsOldObject() && thread->is_marking()) {
    // Black allocation. Prevents a data race between the mutator and concurrent
    // marker on ARM and ARM64 (the marker may observe a publishing store of
    // this object before the stores that initialize its slots), and helps the
    // collection to finish sooner.
    raw_obj->SetMarkBitUnsynchronized();
    heap->old_space()->AllocateBlack(size);
  }
  return raw_obj;

I find a bit strange that we use SetMarkBitUnsynchronized to mark the object black and it is not followed by any sort of barrier. This means that marking object black might be reordered with publishing this object.

When I look into AllocateBlack I see that since f4f0831 it uses relaxed memory ordering - previously it used __sync_fetch_and_add which emits the full barrier.

I think this might the root cause of the issue.

dart-bot pushed a commit that referenced this issue Oct 25, 2019
… old object from being ordered after publishing stores of that object.

Compare f4f0831, which implicitly removed a barrier from PageSpace::AllocateBlack.

Bug: #39090
Change-Id: I3a9274b90875a6860cb102a238b6f95a6a7b2ff2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/123004
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Ryan Macnak <[email protected]>
@mraleph
Copy link
Member

mraleph commented Oct 28, 2019

I think @rmacnak-google change has fixed it - I don't see any more instances of it in the flakiness dashboard.

dart-bot pushed a commit that referenced this issue Nov 11, 2019
… old object from being ordered after publishing stores of that object.

Compare f4f0831, which implicitly removed a barrier from PageSpace::AllocateBlack.

Bug: #39090
Change-Id: I3a9274b90875a6860cb102a238b6f95a6a7b2ff2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/123004
Reviewed-by: Alexander Markov <[email protected]>
Commit-Queue: Ryan Macnak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. gardening
Projects
None yet
Development

No branches or pull requests

3 participants