Port performance optimizations to speed up reading large collections from Android #2140
@@ -264,14 +264,24 @@ - (MaybeDocumentMap)applyRemoteEvent:(FSTRemoteEvent *)remoteEvent {
       }
     }

-    // TODO(klimt): This could probably be an NSMutableDictionary.
-    DocumentKeySet changedDocKeys;
+    MaybeDocumentMap changedDocs;
     const DocumentKeySet &limboDocuments = remoteEvent.limboDocumentChanges;
+    DocumentKeySet updatedKeys;
+    for (const auto &kv : remoteEvent.documentUpdates) {
+      updatedKeys = updatedKeys.insert(kv.first);
+    }
+    // Each loop iteration only affects its "own" doc, so it's safe to get all the remote
+    // documents in advance in a single call.
+    MaybeDocumentMap existingDocs = [self.remoteDocumentCache entriesForKeys:updatedKeys];
+
     for (const auto &kv : remoteEvent.documentUpdates) {
       const DocumentKey &key = kv.first;
       FSTMaybeDocument *doc = kv.second;
-      changedDocKeys = changedDocKeys.insert(key);
-      FSTMaybeDocument *existingDoc = [self.remoteDocumentCache entryForKey:key];
+      FSTMaybeDocument *existingDoc = nil;
+      auto foundExisting = existingDocs.find(key);
+      if (foundExisting != existingDocs.end()) {
+        existingDoc = foundExisting->second;
+      }

       // If a document update isn't authoritative, make sure we don't apply an old document version
       // to the remote cache. We make an exception for SnapshotVersion.MIN which can happen for
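The hunk above replaces one cache read per document with a single batched read issued before the loop. A minimal C++ sketch of the same pattern, using `std::map` as a hypothetical stand-in for the LevelDB-backed remote document cache (the name `EntriesForKeys` mirrors the new `entriesForKeys:` selector, but everything else here is illustrative):

```cpp
#include <map>
#include <set>
#include <string>

using DocumentMap = std::map<std::string, std::string>;
using KeySet = std::set<std::string>;

// One batched call replaces N calls to a per-key lookup. Against a real
// LevelDB-backed cache this amortizes per-query overhead and reads the
// keys in sorted order in a single pass.
DocumentMap EntriesForKeys(const DocumentMap& cache, const KeySet& keys) {
  DocumentMap result;
  for (const auto& key : keys) {
    auto found = cache.find(key);
    if (found != cache.end()) {
      result.emplace(key, found->second);
    }
  }
  return result;
}
```

Inside the loop the prefetched map is then consulted with `find`, exactly as the diff does with `existingDocs.find(key)`.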
@@ -280,6 +290,7 @@ - (MaybeDocumentMap)applyRemoteEvent:(FSTRemoteEvent *)remoteEvent {
           (authoritativeUpdates.contains(doc.key) && !existingDoc.hasPendingWrites) ||
           doc.version >= existingDoc.version) {
         [self.remoteDocumentCache addEntry:doc];
+        changedDocs = changedDocs.insert(key, doc);
       } else {
         LOG_DEBUG(
             "FSTLocalStore Ignoring outdated watch update for %s. "

Review comment on the addEntry: line: Profiler shows that currently ~30% of time is spent here, or more specifically, here:

    void Put(absl::string_view key, GPBMessage* message) {
      NSData* data = [message data];  // THIS LINE; apparently, `data` is pretty time-consuming
      std::string key_string(key);
      mutations_[key_string] = std::string((const char*)data.bytes, data.length);
      version_++;
    }

I can't see a way to improve this immediately, but I won't be surprised if the nanopb-based serializer brings a further performance improvement.
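The profiling note above points at `[message data]`, i.e. re-serializing the proto on every write. The general remedy is to serialize at most once and remember the bytes. A plain C++ sketch with a hypothetical `Message` type (the instrumentation counter exists only to make the memoization observable):

```cpp
#include <string>

// Hypothetical message that remembers its encoded form after the first
// serialization, so repeated writes don't pay the encoding cost again.
struct Message {
  std::string payload;           // stand-in for the proto's field data
  mutable std::string encoded;   // memoized wire bytes, empty until first use
  mutable int encode_calls = 0;  // instrumentation for this sketch only

  const std::string& SerializedBytes() const {
    if (encoded.empty()) {
      ++encode_calls;
      encoded = "wire:" + payload;  // stand-in for real wire encoding
    }
    return encoded;
  }
};
```

This PR takes a related route: rather than caching inside the message, it keeps the server's original proto alongside the decoded document (the `proto:` argument below), so storage can skip encoding entirely.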
@@ -306,7 +317,7 @@ - (MaybeDocumentMap)applyRemoteEvent:(FSTRemoteEvent *)remoteEvent {
       [self.queryCache setLastRemoteSnapshotVersion:remoteVersion];
     }

-    return [self.localDocuments documentsForKeys:changedDocKeys];
+    return [self.localDocuments localViewsForDocuments:changedDocs];
  });
}
@@ -438,7 +438,11 @@ - (FSTDocument *)decodedFoundDocument:(GCFSBatchGetDocumentsResponse *)response
   HARD_ASSERT(version != SnapshotVersion::None(),
               "Got a document response with no snapshot version");

-  return [FSTDocument documentWithData:value key:key version:version state:FSTDocumentStateSynced];
+  return [FSTDocument documentWithData:value
+                                   key:key
+                               version:version
+                                 state:FSTDocumentStateSynced
+                                 proto:response.found];
}

- (FSTDeletedDocument *)decodedDeletedDocument:(GCFSBatchGetDocumentsResponse *)response {
@@ -1144,8 +1148,13 @@ - (FSTDocumentWatchChange *)decodedDocumentChange:(GCFSDocumentChange *)change {
   const DocumentKey key = [self decodedDocumentKey:change.document.name];
   SnapshotVersion version = [self decodedVersion:change.document.updateTime];
   HARD_ASSERT(version != SnapshotVersion::None(), "Got a document change with no snapshot version");
-  FSTMaybeDocument *document =
-      [FSTDocument documentWithData:value key:key version:version state:FSTDocumentStateSynced];
+  // The document may soon be re-serialized back to protos in order to store it in local
+  // persistence. Memoize the encoded form to avoid encoding it again.
+  FSTMaybeDocument *document = [FSTDocument documentWithData:value
+                                                         key:key
+                                                     version:version
+                                                       state:FSTDocumentStateSynced
+                                                       proto:change.document];

   NSArray<NSNumber *> *updatedTargetIds = [self decodedIntegerArray:change.targetIdsArray];
   NSArray<NSNumber *> *removedTargetIds = [self decodedIntegerArray:change.removedTargetIdsArray];

Review comment: Read this: http://go/apidosdonts##2-use-different-messages-for-wire-and-storage. However, since this is just an optimization that we could drop if our wire/storage protos diverge, I think we're still ok.

Review comment: Some points:

Review comment: FWIW, we made this choice intentionally, knowing about that doc. In short: in the steady state we can trust there won't be breaking changes to the protos because, unlike google3, we can't impose on customers that they upgrade. The only point where there is some diceyness is around transitions between versions of the protocol. The transition from v1alpha1 to v1beta1 was grossly incompatible, but thankfully at the time we didn't have persistence yet. (Note that v1alpha1 was essentially the Cloud Datastore Entity model with streaming bolted on, while v1beta1 was Firestore proper, so this was an expected breaking change.) The upcoming transition to v1 is a no-op because v1beta1 and v1 are wire compatible. Even if the RPC protocol were to change radically, we can likely insist that the Document part remain stable across versions. Regardless, a fork-when-we-need-to strategy has allowed us to defer this work for two years and counting, so I'm going to keep betting on it :-).
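The `proto:` argument threads the exact wire message the server sent into the decoded document, so writing it to local persistence can reuse those bytes instead of re-encoding the decoded fields. A hypothetical C++ sketch of that shape (the real types are FSTDocument and GCFSDocument; `Decode`/`EncodeForStorage` are illustrative names):

```cpp
#include <string>
#include <utility>

// Decoded document that keeps the wire-format bytes it was decoded from.
struct Document {
  std::string key;
  std::string wire_bytes;  // memoized server proto, kept alongside decoded fields
};

// Decoding retains the incoming bytes rather than discarding them.
Document Decode(const std::string& key, std::string wire_bytes) {
  return Document{key, std::move(wire_bytes)};
}

// Encoding for storage returns the memoized bytes verbatim instead of
// re-serializing. This is only valid while the wire and storage protos
// stay compatible, which is the trade-off the review comments discuss.
const std::string& EncodeForStorage(const Document& doc) {
  return doc.wire_bytes;
}
```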
Review comment: Our docs claim that if currentKey.Decode() fails, then the instance is left in an undefined state (until the next call to Decode()). So although this probably works, the next call to document_key() on line 94 isn't guaranteed to do what you want.

Review comment: Could you clarify when decoding the key will fail? If we believe it only fails when something is corrupted, I'd prefer to let the code abort rather than continue and hide the problem until later.

Review comment: Rich, thanks for spotting it. I removed that line, so this is no longer applicable.
Zhi, my understanding is this: `Seek` will return an iterator that points either at the key being sought, or at the first key that is larger. However, in leveldb all the keys are stored in a single table, so it's possible that `Seek` will return a key from the next "logical" table (by logical, I mean that Firestore uses prefixes like "remote_documents" to treat leveldb as if it had several tables). For example, suppose we're seeking for `remote_documents/zzz` and every `remote_documents` key sorts before it: `Seek` will return an iterator to `write_batches/spam`, which is from a different logical table. Then `LevelDbRemoteDocumentKey` will fail to `Decode` because it expects the table name `remote_documents`, but it's actually `write_batches`.
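The situation described above can be reproduced with an ordered map standing in for leveldb's single sorted keyspace (the `remote_documents`/`write_batches` prefixes come from the discussion; the concrete keys and function names are illustrative):

```cpp
#include <map>
#include <string>

// Firestore layers "logical tables" onto leveldb's sorted keyspace via key
// prefixes, so after a Seek the resulting key may belong to the *next*
// logical table and its prefix must be checked before decoding it.
bool BelongsToTable(const std::string& key, const std::string& table_prefix) {
  return key.compare(0, table_prefix.size(), table_prefix) == 0;
}

// Returns the smallest key >= target, mimicking leveldb Iterator::Seek;
// empty string means the iterator ran off the end.
std::string Seek(const std::map<std::string, std::string>& db,
                 const std::string& target) {
  auto it = db.lower_bound(target);
  return it == db.end() ? std::string() : it->first;
}
```

Seeking past the last `remote_documents` key lands on the first `write_batches` key, which is exactly the case where a blind `Decode` would fail.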