Skip to content

Firestore Crash - Mutation batchIDs must be acknowledged in order #772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SL62 opened this issue Feb 8, 2018 · 13 comments · Fixed by #790 or #909
Closed

Firestore Crash - Mutation batchIDs must be acknowledged in order #772

SL62 opened this issue Feb 8, 2018 · 13 comments · Fixed by #790 or #909
Assignees

Comments

@SL62
Copy link

SL62 commented Feb 8, 2018

[REQUIRED] Step 2: Describe your environment

  • Xcode version: 9.2
  • Firebase SDK version: 4.8.1
  • Firebase Component: Firestore
  • Component version: 0.10.0

[REQUIRED] Step 3: Describe the problem

Assertion failure in -[FSTLevelDBMutationQueue acknowledgeBatch:streamToken:group:], third_party/firebase/ios/Source/Firestore/Source/Local/FSTLevelDBMutationQueue.mm:235

Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'FIRESTORE INTERNAL ASSERTION FAILED: Mutation batchIDs must be acknowledged in order'

Steps to reproduce:

Using the new disableNetwork() and enableNetwork() methods under certain scenarios causes the app to crash.

Yet to determine the exact scenarios, but it seems to occur when calling disableNetwork() when the app is in the middle of reading data from the web or writing data.
Moving the disableNetwork() method to an earlier part of the code appears to eliminate / reduce the frequency of crashes.

Logs:

* thread #29, queue = 'com.google.firebase.firestore', stop reason = signal SIGABRT
  * frame #0: 0x0000000182ee82e8 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000182ffd2f8 libsystem_pthread.dylib`pthread_kill$VARIANT$mp + 396
    frame #2: 0x0000000182e56fbc libsystem_c.dylib`abort + 140
    frame #3: 0x00000001825f4068 libc++abi.dylib`abort_message + 132
    frame #4: 0x00000001825f4210 libc++abi.dylib`default_terminate_handler() + 304
    frame #5: 0x000000018261c810 libobjc.A.dylib`_objc_terminate() + 124
    frame #6: 0x000000018260c54c libc++abi.dylib`std::__terminate(void (*)()) + 16
    frame #7: 0x000000018260bea8 libc++abi.dylib`__cxa_throw + 136
    frame #8: 0x000000018261c65c libobjc.A.dylib`objc_exception_throw + 364
    frame #9: 0x00000001833d3038 CoreFoundation`+[NSException raise:format:arguments:] + 104
    frame #10: 0x0000000183d6d7f4 Foundation`-[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 112
    frame #11: 0x0000000102a48040 <AppName>`-[FSTLevelDBMutationQueue acknowledgeBatch:streamToken:group:] + 480
    frame #12: 0x0000000102a52d2c <AppName>`-[FSTLocalStore acknowledgeBatchWithResult:] + 252
    frame #13: 0x0000000102a3bbb0 <AppName>`-[FSTSyncEngine applySuccessfulWriteWithResult:] + 184
    frame #14: 0x0000000102a73374 <AppName>`-[FSTRemoteStore writeStreamDidReceiveResponseWithVersion:mutationResults:] + 320
    frame #15: 0x0000000102a7ff58 <AppName>`-[FSTWriteStream handleStreamMessage:] + 692
    frame #16: 0x0000000102a7e6c0 <AppName>`__24-[FSTStream writeValue:]_block_invoke + 468
    frame #17: 0x0000000105d712cc libdispatch.dylib`_dispatch_call_block_and_release + 24
    frame #18: 0x0000000105d7128c libdispatch.dylib`_dispatch_client_callout + 16
    frame #19: 0x0000000105d7ff80 libdispatch.dylib`_dispatch_queue_serial_drain + 696
    frame #20: 0x0000000105d747ec libdispatch.dylib`_dispatch_queue_invoke + 332
    frame #21: 0x0000000105d80f6c libdispatch.dylib`_dispatch_root_queue_drain_deferred_wlh + 428
    frame #22: 0x0000000105d88020 libdispatch.dylib`_dispatch_workloop_worker_thread + 652
    frame #23: 0x0000000182ffaf1c libsystem_pthread.dylib`_pthread_wqthread + 932
    frame #24: 0x0000000182ffab6c libsystem_pthread.dylib`start_wqthread + 4
@wilhuff
Copy link
Contributor

wilhuff commented Feb 13, 2018

We're tracking this as b/73293239 internally and are looking into this.

@ryanwilson
Copy link
Member

This fix is now published in Firebase 4.10.0.

@SL62
Copy link
Author

SL62 commented Mar 7, 2018

I just updated to the latest Firebase SDK 4.10.0 and unfortunately I'm still getting the crash.
The stack trace (below) is a bit different now.

Could this issue be reopened please?

* thread #32, queue = 'com.google.firebase.firestore', stop reason = signal SIGABRT
    frame #0: 0x0000000184ad02e8 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000184be52f8 libsystem_pthread.dylib`pthread_kill$VARIANT$mp + 396
    frame #2: 0x0000000184a3efbc libsystem_c.dylib`abort + 140
    frame #3: 0x00000001841dc068 libc++abi.dylib`abort_message + 132
    frame #4: 0x00000001841dc210 libc++abi.dylib`default_terminate_handler() + 304
    frame #5: 0x0000000184204810 libobjc.A.dylib`_objc_terminate() + 124
    frame #6: 0x00000001841f454c libc++abi.dylib`std::__terminate(void (*)()) + 16
    frame #7: 0x00000001841f3ea8 libc++abi.dylib`__cxa_throw + 136
    frame #8: 0x000000018420465c libobjc.A.dylib`objc_exception_throw + 364
    frame #9: 0x0000000184fbb038 CoreFoundation`+[NSException raise:format:arguments:] + 104
    frame #10: 0x00000001859557f4 Foundation`-[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 112
    frame #11: 0x0000000104b0be14 AppName`-[FSTLevelDBMutationQueue acknowledgeBatch:streamToken:group:] + 480
    frame #12: 0x0000000104b17d30 AppName`-[FSTLocalStore acknowledgeBatchWithResult:] + 252
    frame #13: 0x0000000104afdc44 AppName`-[FSTSyncEngine applySuccessfulWriteWithResult:] + 172
    frame #14: 0x0000000104b3eccc AppName`-[FSTRemoteStore writeStreamDidReceiveResponseWithVersion:mutationResults:] + 296
    frame #15: 0x0000000104b4e95c AppName`-[FSTWriteStream handleStreamMessage:] + 696
    frame #16: 0x0000000104b4cbf8 AppName`-[FSTStream writeValue:] + 452
    frame #17: 0x0000000104b4a774 AppName`-[FSTCallbackFilter writeValue:] + 88
  * frame #18: 0x0000000106cbcc08 RxLibrary`__57-[GRXConcurrentWriteable enqueueValue:completionHandler:]_block_invoke((null)=<unavailable>) at GRXConcurrentWriteable.m:60
    frame #19: 0x0000000107fd52cc libdispatch.dylib`_dispatch_call_block_and_release + 24
    frame #20: 0x0000000107fd528c libdispatch.dylib`_dispatch_client_callout + 16
    frame #21: 0x0000000107fe3f80 libdispatch.dylib`_dispatch_queue_serial_drain + 696
    frame #22: 0x0000000107fd87ec libdispatch.dylib`_dispatch_queue_invoke + 332
    frame #23: 0x0000000107fe4f6c libdispatch.dylib`_dispatch_root_queue_drain_deferred_wlh + 428
    frame #24: 0x0000000107fec020 libdispatch.dylib`_dispatch_workloop_worker_thread + 652
    frame #25: 0x0000000184be2f1c libsystem_pthread.dylib`_pthread_wqthread + 932
    frame #26: 0x0000000184be2b6c libsystem_pthread.dylib`start_wqthread + 4

@SL62
Copy link
Author

SL62 commented Mar 7, 2018

Just to add a bit more info, the crash happens here:

@interface GRXConcurrentWriteable ()
// This is atomic so that cancellation can nillify it from any thread.
@property(atomic, strong) id<GRXWriteable> writeable;
@end

@implementation GRXConcurrentWriteable {
  dispatch_queue_t _writeableQueue;
  // This ensures that writesFinishedWithError: is only sent once to the writeable.
  BOOL _alreadyFinished;
}

- (instancetype)init {
  return [self initWithWriteable:nil];
}

// Designated initializer
- (instancetype)initWithWriteable:(id<GRXWriteable>)writeable
                    dispatchQueue:(dispatch_queue_t)queue {
  if (self = [super init]) {
    _writeableQueue = queue;
    _writeable = writeable;
  }
  return self;
}

- (instancetype)initWithWriteable:(id<GRXWriteable>)writeable {
  return [self initWithWriteable:writeable
                   dispatchQueue:dispatch_get_main_queue()];
}

- (void)enqueueValue:(id)value completionHandler:(void (^)(void))handler {
  dispatch_async(_writeableQueue, ^{
    // We're racing a possible cancellation performed by another thread. To turn all already-
    // enqueued messages into noops, cancellation nillifies the writeable property. If we get it
    // before it's nil, we won the race.
    id<GRXWriteable> writeable = self.writeable;
    if (writeable) {
      [writeable writeValue:value]; //<----------- Crashes at this line
      handler();
    }
  });
}

@mikelehen
Copy link
Contributor

Reopening issue. Would it be possible to capture logs of the issue by adding FirebaseConfiguration.shared.setLoggerLevel(.debug) to your app?

@mikelehen mikelehen reopened this Mar 7, 2018
@SL62
Copy link
Author

SL62 commented Mar 7, 2018

Thanks for reopening.
I've added the logger as requested and I'm getting the following in the debug console:

2018-03-07 16:50:56.860098+0000 AppName[8052:2612896] *** Assertion failure in -[FSTLevelDBMutationQueue acknowledgeBatch:streamToken:group:], third_party/firebase/ios/Source/Firestore/Source/Local/FSTLevelDBMutationQueue.mm:235
2018-03-07 16:50:56.860098+0000 AppName[8052:2612857] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWriteStream 0x1c0314130 response: <GCFSWriteResponse 0x1c04c0620>: {
    stream_token: "\020\001\031\020hB\002\201\364\265\265"
    write_results {
      update_time {
        seconds: 1518214069
        nanos: 122079000
      }
    }
    commit_time {
      seconds: 1520441456
      nanos: 42195000
    }
}
2018-03-07 16:50:56.864805+0000 AppName[8052:2612896] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'FIRESTORE INTERNAL ASSERTION FAILED: Mutation batchIDs must be acknowledged in order'
*** First throw call stack:
(0x184fbb164 0x184204528 0x184fbb038 0x1859557f4 0x104cf3e14 0x104cffd30 0x104ce5c44 0x104d26ccc 0x104d3695c 0x104d34bf8 0x104d32774 0x106e98c08 0x1081b12cc 0x1081b128c 0x1081bff80 0x1081b47ec 0x1081c0f6c 0x1081c8020 0x184be2f1c 0x184be2b6c)
libc++abi.dylib: terminating with uncaught exception of type NSException

Does this help or do you need more of the logs?

@gsoltis
Copy link

gsoltis commented Mar 7, 2018

More of the logs would be helpful, especially around where disableNetwork is called, if you have them.

@SL62
Copy link
Author

SL62 commented Mar 7, 2018

No problem, I've added all logs between calling disableNetwork up until just after the completion handler returns for enableNetwork:

2018-03-07 17:30:16.195321+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWatchStream 0x1c04eca00 stop
2018-03-07 17:30:16.203204+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWatchStream 0x1c04eca00 Performing stream teardown
2018-03-07 17:30:16.212876+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWatchStream 0x1c04eca00 Closing stream client-side
2018-03-07 17:30:16.213454+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWriteStream 0x1c0310e00 stop
2018-03-07 17:30:16.213946+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWriteStream 0x1c0310e00 Performing stream teardown
2018-03-07 17:30:16.214401+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWriteStream 0x1c0310e00 mutation request: <GCFSWriteRequest 0x1c0afaf80>: {
    stream_token: "\020\007\031\020hB\002\201\364\265\265"
}
2018-03-07 17:30:16.215678+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWriteStream 0x1c0310e00 Closing stream client-side
2018-03-07 17:30:16.216123+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] Stopping write stream with 0 pending writes
2018-03-07 17:30:16.235158+0000 AppName[8116:2633389] 4.10.0 - [Firebase/Analytics][I-ACS031006] View controller already tracked. Class, ID: InboxVC, -4887308447020283543
2018-03-07 17:30:16.273351+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] Committing <FSTWriteGroup for Release query: 0 changes (0 bytes):>
2018-03-07 17:30:16.278813+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] Committing <FSTWriteGroup for Release query: 0 changes (0 bytes):>
2018-03-07 17:30:16.279138+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] Committing <FSTWriteGroup for Release query: 0 changes (0 bytes):>
2018-03-07 17:30:16.301166+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] Committing <FSTWriteGroup for Release query: 0 changes (0 bytes):>
2018-03-07 17:30:16.301731+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] Committing <FSTWriteGroup for Release query: 0 changes (0 bytes):>
2018-03-07 17:30:16.301981+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] Committing <FSTWriteGroup for Release query: 0 changes (0 bytes):>
2018-03-07 17:30:16.302154+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWatchStream 0x1c08ff900 start
2018-03-07 17:30:16.302321+0000 AppName[8116:2633289] 4.10.0 - [Firebase/Firestore][I-FST000001] FSTWriteStream 0x1c03085e0 start

Between the above and the actual crash, there's about 2000 lines of logs (mostly due to all the active listeners going back online and a few writes being executed).

@mikelehen
Copy link
Contributor

Thanks @SL62. I think we still need more of the logs. Everything in that last snippet looks as-expected. To ensure we understand what's going on, I'd really like /all/ of the Firestore logs leading up to the crash (both before and after the snippet you pasted). I suspect that we're somehow re-sending a write we shouldn't be or else skipping a write. To verify this I'll need to see the preceding writes and their results.

If the logs are too long you could put them in a gist. Or if they contain sensitive data but you're willing to share privately, you can email them to me at [email protected]. Without logs, we'll probably need a concrete way for us to reproduce the error so that we can investigate what's going on.

Thanks and sorry for the inconvenience!

@SL62
Copy link
Author

SL62 commented Mar 7, 2018

No problem, I've sent you an email with the full log as it does contain sensitive data.
I've also discovered another bug with sdk 4.10.0 as mentioned in the email.
Please let me know if you need any more info.

mikelehen pushed a commit to firebase/firebase-js-sdk that referenced this issue Mar 12, 2018
getNextMutationBatchAfterBatchId() was not respecting
highestAcknowledgedBatchId and therefore we were resending writes after the
network was disabled / re-enabled as reported here:
firebase/firebase-ios-sdk#772
@wilhuff
Copy link
Contributor

wilhuff commented Mar 12, 2018

Hi @SL62 we released Firebase 4.10.1 which addresses the memory corruption issue which was the root cause of the bug you were seeing. Let us know how it goes. Thanks for your patience!

@wilhuff wilhuff closed this as completed Mar 12, 2018
@wilhuff wilhuff reopened this Mar 12, 2018
@wilhuff
Copy link
Contributor

wilhuff commented Mar 12, 2018

Whoops. 4.10.1 fixes the secondary issue you emailed about, but we need an additional change to actually address the root cause of this issue.

mikelehen pushed a commit that referenced this issue Mar 12, 2018
Port of: firebase/firebase-js-sdk#559
Fixes #772

getNextMutationBatchAfterBatchId() was not respecting
highestAcknowledgedBatchId and therefore we were resending writes after the
network was disabled / re-enabled.
mikelehen pushed a commit that referenced this issue Mar 12, 2018
Port of: firebase/firebase-js-sdk#559
Fixes #772

getNextMutationBatchAfterBatchId() was not respecting
highestAcknowledgedBatchId and therefore we were resending writes after the
network was disabled / re-enabled.
mikelehen pushed a commit that referenced this issue Mar 12, 2018
Port of: firebase/firebase-js-sdk#559
Fixes #772

getNextMutationBatchAfterBatchId() was not respecting
highestAcknowledgedBatchId and therefore we were resending writes after the
network was disabled / re-enabled.
mikelehen added a commit to firebase/firebase-js-sdk that referenced this issue Mar 12, 2018
…559)

getNextMutationBatchAfterBatchId() was not respecting highestAcknowledgedBatchId and therefore we were resending writes if they had been acknowledged but not removed (aka the held write acks case). This showed up when a user disabled / enabled the network as reported here and I've included a spec test to emulate that case: firebase/firebase-ios-sdk#772
mikelehen added a commit that referenced this issue Mar 12, 2018
…909)

Port of: firebase/firebase-js-sdk#559
Should address #772 once released.

getNextMutationBatchAfterBatchId() was not respecting
highestAcknowledgedBatchId and therefore we were resending writes after the
network was disabled / re-enabled.
@morganchen12 morganchen12 reopened this Mar 12, 2018
@paulb777
Copy link
Member

Fixed in Firebase 4.11.0 FirebaseFirestore 0.10.4

minafarid pushed a commit to minafarid/firebase-ios-sdk that referenced this issue Jun 6, 2018
…irebase#909)

Port of: firebase/firebase-js-sdk#559
Should address firebase#772 once released.

getNextMutationBatchAfterBatchId() was not respecting
highestAcknowledgedBatchId and therefore we were resending writes after the
network was disabled / re-enabled.
@firebase firebase locked and limited conversation to collaborators Nov 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
7 participants