Skip to content

[Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) #7320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 25, 2025

Conversation

fxamacker
Copy link
Member

@fxamacker fxamacker commented Apr 17, 2025

Updates issue #6523, #6527, required by PR #7321

Currently, implementations of storage Reader, Iterator, and Seeker read data from one database (BadgerDB or Pebble).

However, we need to access data from multiple databases after switching storage from BadgerDB to Pebble if there is no data migration. Without migration, data can be in either BadgerDB or Pebble so it would be useful to try both databases.

This PR adds new implementations of Reader, Iterator, and Seeker to support querying and iterating data across databases:

  • multiReader
  • multiIterator
  • multiSeeker

The NewMultiReader function returns a multiReader that consists of multiple readers in the provided order. Readers are read sequentially until:

  • a reader succeeds or
  • a reader returns an error that is not ErrNotFound
    If all readers return ErrNotFound, Reader.Get will return ErrNotFound.

The NewMultiIterator function returns a multiIterator that is a logical concatenation of multiple iterators in the provided sequence. The returned iterator iterates items in the first iterator, and then iterates items in the second iterator, etc.

The NewMultiSeeker function returns a multiSeeker that is a logical concatenation of multiple seekers in the provided sequence. The returned seeker seeks largest key in lexicographical order in the last seeker, and then seeks items in the second seeker, etc.

@fxamacker fxamacker requested review from zhangchiqing and a team April 17, 2025 20:21
@fxamacker fxamacker self-assigned this Apr 17, 2025
@fxamacker fxamacker requested a review from a team as a code owner April 17, 2025 20:21
@codecov-commenter
Copy link

codecov-commenter commented Apr 17, 2025

Codecov Report

Attention: Patch coverage is 63.11475% with 45 lines in your changes missing coverage. Please review.

Project coverage is 41.27%. Comparing base (252128f) to head (0a65506).

Files with missing lines Patch % Lines
storage/operation/multi_reader.go 66.66% 8 Missing and 4 partials ⚠️
storage/mock/reader.go 0.00% 11 Missing ⚠️
storage/operation/multi_iterator.go 81.25% 6 Missing and 3 partials ⚠️
storage/operation/multi_seeker.go 68.42% 4 Missing and 2 partials ⚠️
storage/operation/reads.go 25.00% 2 Missing and 1 partial ⚠️
storage/operation/badgerimpl/reader.go 0.00% 2 Missing ⚠️
storage/operation/pebbleimpl/reader.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7320      +/-   ##
==========================================
+ Coverage   41.25%   41.27%   +0.01%     
==========================================
  Files        2193     2196       +3     
  Lines      192006   192119     +113     
==========================================
+ Hits        79210    79293      +83     
- Misses     106189   106212      +23     
- Partials     6607     6614       +7     
Flag Coverage Δ
unittests 41.27% <63.11%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Currently, implementations of storage Reader and Iterator
read data from one database (BadgerDB or Pebble).

However, we need to access data from multiple databases
after switching storage from BadgerDB to Pebble (without
data migration).

This commit adds new implementations of Reader and Iterator to
support querying and iterating data across databases:
- multiReader
- multiIterator

The NewMultiReader function returns a multiReader that consists
of multiple readers in the provided order.  Readers are read
sequentially until:
- a reader succeeds or
- a reader returns an error that is not ErrNotFound
If all readers return ErrNotFound, Reader.Get will return ErrNotFound.

The NewMultiIterator function returns a multiIterator that is a
logical concatenation of multiple iterators in the provided sequence.
The returned iterator iterates items in the first iterator, and
then iterates items in the second iterator, etc.
// - a reader succeeds or
// - a reader returns an error that is not ErrNotFound
// If all readers return ErrNotFound, Reader.Get will return ErrNotFound.
func NewMultiReader(readers ...storage.Reader) storage.Reader {
Copy link
Member

@zhangchiqing zhangchiqing Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the chained storage modules to handle the case, it's just handling at one layer above (the store layer). I wonder what would be the different usage case here that we want to add the multi reader at a lower layer (the operations layer).

Copy link
Member Author

@fxamacker fxamacker Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangchiqing Before opening this PR, I tried using chained storage modules first, but it had some limitations. So I implemented multiReader and multiIterator at a operations level (lower layer).

Access nodes need both read and write operations for transactions and collections.

If we implement multi reader at store layer (higher layer) as in chained storage modules, we need to add multi reader for every store, such as transactions store and collections store. We also need to add TransactionsReader and CollectionsReader fields to FlowAccessNodeBuilder. More importantly, we need to make sure to use multi reader (instead of regular store like Transactions) to query data.

On the other hand, if we implement multi reader at operation level (lower level), every store can support it by adding additional reader (see PR #7321). It leads to less duplicate code, easier testing, and easier integration, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can see your points. The changes from PR 7321 does make much less change.

On the other hand, if we implement multi reader at operation level (lower level), every store can support it by adding additional reader

Yes, you are right, although it comes with performance penalty, since for queries on non-existing data, we would have to query the additional store.

Not all modules need to use chained storage, only the RPC server who serves the RPC queries need it.

Yes, we will need to add TransactionsReader and CollectionsReader fields, but I think that would be good additions, since they make it clear that those modules who depend on the readers do not make any writes.

I could be wrong, maybe let's discuss with @peterargue , and decide which approach is more suitable here.

@fxamacker fxamacker requested a review from a team April 17, 2025 22:11
// - a reader succeeds or
// - a reader returns an error that is not ErrNotFound
// If all readers return ErrNotFound, Reader.Get will return ErrNotFound.
func NewMultiReader(readers ...storage.Reader) storage.Reader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can see your points. The changes from PR 7321 does make much less change.

On the other hand, if we implement multi reader at operation level (lower level), every store can support it by adding additional reader

Yes, you are right, although it comes with performance penalty, since for queries on non-existing data, we would have to query the additional store.

Not all modules need to use chained storage, only the RPC server who serves the RPC queries need it.

Yes, we will need to add TransactionsReader and CollectionsReader fields, but I think that would be good additions, since they make it clear that those modules who depend on the readers do not make any writes.

I could be wrong, maybe let's discuss with @peterargue , and decide which approach is more suitable here.

@fxamacker fxamacker requested review from a team and peterargue April 18, 2025 16:59
@fxamacker fxamacker requested a review from a team April 21, 2025 14:03
@fxamacker fxamacker changed the title [Storage] Add database multiReader and multiIterator (BadgerDB, Pebble) [Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) Apr 21, 2025
@fxamacker
Copy link
Member Author

I added multiSeeker for consistency, etc. since this PR was already adding multiReader and multiIterator. PTAL 🙏

@fxamacker
Copy link
Member Author

fxamacker commented Apr 22, 2025

@peterargue PTAL 🙏

For context, @zhangchiqing and I had meeting about this PR last Friday (April 18), mostly regarding #7320 (comment).

Leo and I agree with the approach taken by this PR to provide the functionality at lower level. Some extra benefits discovered during our meeting include this PR only needed 1 cache compared to alternative approach.

Also, this approach enables multiReader, multiIterator, and multiSeeker, while alternative approach only supports chained reader.

Given this involves Access Node, Leo and I would like to see if the approach taken by this PR works for you too before proceeding.

Input validity checking happens when new iterators are created
so the redundant check isn't needed.
This commit makes NewMultiReader return error
early if caller specifies no readers.
Copy link
Member

@zhangchiqing zhangchiqing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Faye.

@fxamacker fxamacker requested a review from a team April 23, 2025 22:06
Copy link
Contributor

@peterargue peterargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. thanks @fxamacker

@fxamacker fxamacker added this pull request to the merge queue Apr 25, 2025
Merged via the queue into master with commit bb094ec Apr 25, 2025
57 checks passed
@fxamacker fxamacker deleted the fxamacker/add-multi-reader-and-multi-iterator branch April 25, 2025 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants