[Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) #7320
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## master #7320 +/- ##
==========================================
+ Coverage 41.25% 41.27% +0.01%
==========================================
Files 2193 2196 +3
Lines 192006 192119 +113
==========================================
+ Hits 79210 79293 +83
- Misses 106189 106212 +23
- Partials 6607 6614 +7
Currently, implementations of storage Reader and Iterator read data from one database (BadgerDB or Pebble). However, we need to access data from multiple databases after switching storage from BadgerDB to Pebble (without data migration). This commit adds new implementations of Reader and Iterator to support querying and iterating data across databases:
- multiReader
- multiIterator

The NewMultiReader function returns a multiReader that consists of multiple readers in the provided order. Readers are read sequentially until:
- a reader succeeds, or
- a reader returns an error that is not ErrNotFound

If all readers return ErrNotFound, Reader.Get will return ErrNotFound.

The NewMultiIterator function returns a multiIterator that is a logical concatenation of multiple iterators in the provided sequence. The returned iterator iterates items in the first iterator, then items in the second iterator, and so on.
Force-pushed from 46c809c to 5fdcb99 (Compare)
storage/operation/multi_reader.go
Outdated
// - a reader succeeds or
// - a reader returns an error that is not ErrNotFound
// If all readers return ErrNotFound, Reader.Get will return ErrNotFound.
func NewMultiReader(readers ...storage.Reader) storage.Reader {
We have the chained storage modules to handle this case; it's just handled one layer above (the store layer). I wonder what the different usage case is here that makes us want to add the multi reader at a lower layer (the operations layer).
@zhangchiqing Before opening this PR, I tried using chained storage modules first, but they had some limitations. So I implemented multiReader and multiIterator at the operations level (lower layer).

Access nodes need both read and write operations for transactions and collections.

If we implement the multi reader at the store layer (higher layer), as in chained storage modules, we need to add a multi reader for every store, such as the transactions store and collections store. We also need to add TransactionsReader and CollectionsReader fields to FlowAccessNodeBuilder. More importantly, we need to make sure to use the multi reader (instead of a regular store like Transactions) to query data.

On the other hand, if we implement the multi reader at the operation level (lower level), every store can support it by adding an additional reader (see PR #7321). It leads to less duplicate code, easier testing, easier integration, etc.
Yeah, I can see your points. The changes in PR 7321 do amount to much less change.

> On the other hand, if we implement the multi reader at the operation level (lower level), every store can support it by adding an additional reader

Yes, you are right, although it comes with a performance penalty: for queries on non-existing data, we would have to query the additional store. Not all modules need to use chained storage; only the RPC server that serves the RPC queries needs it.

Yes, we will need to add TransactionsReader and CollectionsReader fields, but I think those would be good additions, since they make it clear that the modules that depend on the readers do not make any writes.

I could be wrong; maybe let's discuss with @peterargue and decide which approach is more suitable here.
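The performance penalty mentioned above can be made concrete with a small Go sketch. This is a toy illustration (the `countingReader` type and inline lookup loop are hypothetical, not flow-go code): when a key exists in neither database, a multi reader must issue a Get against every underlying store before it can return ErrNotFound.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

// countingReader counts how many Gets reach it, to show the extra
// lookup cost for keys that exist in neither database.
type countingReader struct {
	data map[string][]byte
	gets int
}

func (c *countingReader) Get(key []byte) ([]byte, error) {
	c.gets++
	if v, ok := c.data[string(key)]; ok {
		return v, nil
	}
	return nil, ErrNotFound
}

func main() {
	pebble := &countingReader{data: map[string][]byte{}}
	badger := &countingReader{data: map[string][]byte{}}

	// Inline multi-reader loop: try pebble first, then badger.
	get := func(key []byte) ([]byte, error) {
		for _, r := range []*countingReader{pebble, badger} {
			v, err := r.Get(key)
			if !errors.Is(err, ErrNotFound) {
				return v, err
			}
		}
		return nil, ErrNotFound
	}

	_, _ = get([]byte("missing"))

	// A single miss touched both stores.
	fmt.Println(pebble.gets, badger.gets) // prints "1 1"
}
```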
I added
@peterargue PTAL 🙏

For context, @zhangchiqing and I had a meeting about this PR last Friday (April 18), mostly regarding #7320 (comment). Leo and I agree with the approach taken by this PR to provide the functionality at a lower level. Some extra benefits discovered during our meeting include that this PR only needs 1 cache, compared to the alternative approach. Also, this approach enables

Given this involves the Access Node, Leo and I would like to see if the approach taken by this PR works for you too before proceeding.
Input validity checking happens when new iterators are created, so the redundant check isn't needed.
This commit makes NewMultiReader return an error early if the caller specifies no readers.
Thanks Faye.
looks great. thanks @fxamacker
Updates issues #6523, #6527; required by PR #7321

Currently, implementations of storage Reader, Iterator, and Seeker read data from one database (BadgerDB or Pebble). However, we need to access data from multiple databases after switching storage from BadgerDB to Pebble if there is no data migration. Without migration, data can be in either BadgerDB or Pebble, so it would be useful to try both databases.

This PR adds new implementations of Reader, Iterator, and Seeker to support querying and iterating data across databases:
- multiReader
- multiIterator
- multiSeeker

The NewMultiReader function returns a multiReader that consists of multiple readers in the provided order. Readers are read sequentially until:
- a reader succeeds, or
- a reader returns an error that is not ErrNotFound

If all readers return ErrNotFound, Reader.Get will return ErrNotFound.

The NewMultiIterator function returns a multiIterator that is a logical concatenation of multiple iterators in the provided sequence. The returned iterator iterates items in the first iterator, then items in the second iterator, and so on.

The NewMultiSeeker function returns a multiSeeker that is a logical concatenation of multiple seekers in the provided sequence. The returned seeker seeks the largest key in lexicographical order by first seeking in the last seeker, then in the second-to-last seeker, and so on.
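The multiIterator's "logical concatenation" behavior can be sketched in Go. This is a simplified illustration, not the PR's code: the `Iterator` interface here (a single `Next` method yielding string keys) is a hypothetical stand-in for the real storage iterator, but it shows the core mechanic of exhausting the first iterator before moving to the second.

```go
package main

import "fmt"

// Iterator is a simplified stand-in for the storage iterator interface.
type Iterator interface {
	Next() (key string, ok bool)
}

// sliceIterator iterates over a fixed slice of keys.
type sliceIterator struct {
	keys []string
	pos  int
}

func (it *sliceIterator) Next() (string, bool) {
	if it.pos >= len(it.keys) {
		return "", false
	}
	k := it.keys[it.pos]
	it.pos++
	return k, true
}

// multiIterator yields all items of the first iterator, then the
// second, and so on -- a logical concatenation in the provided order.
type multiIterator struct {
	iterators []Iterator
}

func NewMultiIterator(iterators ...Iterator) Iterator {
	return &multiIterator{iterators: iterators}
}

func (m *multiIterator) Next() (string, bool) {
	for len(m.iterators) > 0 {
		if k, ok := m.iterators[0].Next(); ok {
			return k, true
		}
		// Current iterator is exhausted; advance to the next one.
		m.iterators = m.iterators[1:]
	}
	return "", false
}

func main() {
	badgerIt := &sliceIterator{keys: []string{"a", "b"}}
	pebbleIt := &sliceIterator{keys: []string{"c"}}

	it := NewMultiIterator(badgerIt, pebbleIt)
	for k, ok := it.Next(); ok; k, ok = it.Next() {
		fmt.Print(k, " ")
	}
	fmt.Println() // prints "a b c"
}
```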