[Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) #7320

fxamacker · 2025-04-17T20:21:41Z

Updates issue #6523, #6527, required by PR #7321

Currently, implementations of storage Reader, Iterator, and Seeker read data from one database (BadgerDB or Pebble).

However, we need to access data from multiple databases after switching storage from BadgerDB to Pebble if there is no data migration. Without migration, data can be in either BadgerDB or Pebble so it would be useful to try both databases.

This PR adds new implementations of Reader, Iterator, and Seeker to support querying and iterating data across databases:

multiReader
multiIterator
multiSeeker

The NewMultiReader function returns a multiReader that consists of multiple readers in the provided order. Readers are read sequentially until:

a reader succeeds or
a reader returns an error that is not ErrNotFound
If all readers return ErrNotFound, Reader.Get will return ErrNotFound.

The NewMultiIterator function returns a multiIterator that is a logical concatenation of multiple iterators in the provided sequence. The returned iterator iterates items in the first iterator, and then iterates items in the second iterator, etc.

The NewMultiSeeker function returns a multiSeeker that is a logical concatenation of multiple seekers in the provided sequence. The returned seeker seeks largest key in lexicographical order in the last seeker, and then seeks items in the second seeker, etc.

codecov-commenter · 2025-04-17T20:25:34Z

Codecov Report

Attention: Patch coverage is 63.11475% with 45 lines in your changes missing coverage. Please review.

Project coverage is 41.27%. Comparing base (252128f) to head (0a65506).

Files with missing lines	Patch %	Lines
storage/operation/multi_reader.go	66.66%	8 Missing and 4 partials ⚠️
storage/mock/reader.go	0.00%	11 Missing ⚠️
storage/operation/multi_iterator.go	81.25%	6 Missing and 3 partials ⚠️
storage/operation/multi_seeker.go	68.42%	4 Missing and 2 partials ⚠️
storage/operation/reads.go	25.00%	2 Missing and 1 partial ⚠️
storage/operation/badgerimpl/reader.go	0.00%	2 Missing ⚠️
storage/operation/pebbleimpl/reader.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7320      +/-   ##
==========================================
+ Coverage   41.25%   41.27%   +0.01%     
==========================================
  Files        2193     2196       +3     
  Lines      192006   192119     +113     
==========================================
+ Hits        79210    79293      +83     
- Misses     106189   106212      +23     
- Partials     6607     6614       +7

Flag	Coverage Δ
unittests	`41.27% <63.11%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Currently, implementations of storage Reader and Iterator read data from one database (BadgerDB or Pebble). However, we need to access data from multiple databases after switching storage from BadgerDB to Pebble (without data migration). This commit adds new implementations of Reader and Iterator to support querying and iterating data across databases: - multiReader - multiIterator The NewMultiReader function returns a multiReader that consists of multiple readers in the provided order. Readers are read sequentially until: - a reader succeeds or - a reader returns an error that is not ErrNotFound If all readers return ErrNotFound, Reader.Get will return ErrNotFound. The NewMultiIterator function returns a multiIterator that is a logical concatenation of multiple iterators in the provided sequence. The returned iterator iterates items in the first iterator, and then iterates items in the second iterator, etc.

zhangchiqing · 2025-04-17T20:49:51Z

storage/operation/multi_reader.go

+// - a reader succeeds or
+// - a reader returns an error that is not ErrNotFound
+// If all readers return ErrNotFound, Reader.Get will return ErrNotFound.
+func NewMultiReader(readers ...storage.Reader) storage.Reader {


We have the chained storage modules to handle the case, it's just handling at one layer above (the store layer). I wonder what would be the different usage case here that we want to add the multi reader at a lower layer (the operations layer).

@zhangchiqing Before opening this PR, I tried using chained storage modules first, but it had some limitations. So I implemented multiReader and multiIterator at a operations level (lower layer).

Access nodes need both read and write operations for transactions and collections.

If we implement multi reader at store layer (higher layer) as in chained storage modules, we need to add multi reader for every store, such as transactions store and collections store. We also need to add TransactionsReader and CollectionsReader fields to FlowAccessNodeBuilder. More importantly, we need to make sure to use multi reader (instead of regular store like Transactions) to query data.

On the other hand, if we implement multi reader at operation level (lower level), every store can support it by adding additional reader (see PR #7321). It leads to less duplicate code, easier testing, and easier integration, etc.

Yeah, I can see your points. The changes from PR 7321 does make much less change.

On the other hand, if we implement multi reader at operation level (lower level), every store can support it by adding additional reader

Yes, you are right, although it comes with performance penalty, since for queries on non-existing data, we would have to query the additional store.

Not all modules need to use chained storage, only the RPC server who serves the RPC queries need it.

Yes, we will need to add TransactionsReader and CollectionsReader fields, but I think that would be good additions, since they make it clear that those modules who depend on the readers do not make any writes.

I could be wrong, maybe let's discuss with @peterargue , and decide which approach is more suitable here.

storage/operation/multi_reader.go

zhangchiqing · 2025-04-18T16:14:10Z

storage/operation/multi_reader.go

+// - a reader succeeds or
+// - a reader returns an error that is not ErrNotFound
+// If all readers return ErrNotFound, Reader.Get will return ErrNotFound.
+func NewMultiReader(readers ...storage.Reader) storage.Reader {


Yeah, I can see your points. The changes from PR 7321 does make much less change.

On the other hand, if we implement multi reader at operation level (lower level), every store can support it by adding additional reader

Yes, you are right, although it comes with performance penalty, since for queries on non-existing data, we would have to query the additional store.

Not all modules need to use chained storage, only the RPC server who serves the RPC queries need it.

Yes, we will need to add TransactionsReader and CollectionsReader fields, but I think that would be good additions, since they make it clear that those modules who depend on the readers do not make any writes.

I could be wrong, maybe let's discuss with @peterargue , and decide which approach is more suitable here.

storage/operation/multi_reader.go

storage/operation/multi_iterator.go

fxamacker · 2025-04-21T17:23:28Z

I added multiSeeker for consistency, etc. since this PR was already adding multiReader and multiIterator. PTAL 🙏

fxamacker · 2025-04-22T17:57:53Z

@peterargue PTAL 🙏

For context, @zhangchiqing and I had meeting about this PR last Friday (April 18), mostly regarding #7320 (comment).

Leo and I agree with the approach taken by this PR to provide the functionality at lower level. Some extra benefits discovered during our meeting include this PR only needed 1 cache compared to alternative approach.

Also, this approach enables multiReader, multiIterator, and multiSeeker, while alternative approach only supports chained reader.

Given this involves Access Node, Leo and I would like to see if the approach taken by this PR works for you too before proceeding.

Input validity checking happens when new iterators are created so the redundant check isn't needed.

This commit makes NewMultiReader return error early if caller specifies no readers.

zhangchiqing

Thanks Faye.

peterargue

looks great. thanks @fxamacker

fxamacker added the Improvement label Apr 17, 2025

fxamacker requested review from zhangchiqing and a team April 17, 2025 20:21

fxamacker self-assigned this Apr 17, 2025

fxamacker requested a review from a team as a code owner April 17, 2025 20:21

fxamacker force-pushed the fxamacker/add-multi-reader-and-multi-iterator branch from 46c809c to 5fdcb99 Compare April 17, 2025 20:26

fxamacker mentioned this pull request Apr 17, 2025

[Access] Add support for multi-store reading of collections & transactions in Access Nodes #7321

Merged

zhangchiqing reviewed Apr 17, 2025

View reviewed changes

fxamacker requested a review from a team April 17, 2025 22:11

zhangchiqing reviewed Apr 18, 2025

View reviewed changes

fxamacker requested review from a team and peterargue April 18, 2025 16:59

zhangchiqing reviewed Apr 18, 2025

View reviewed changes

storage/operation/multi_reader.go Outdated Show resolved Hide resolved

storage/operation/multi_iterator.go Show resolved Hide resolved

fxamacker requested a review from a team April 21, 2025 14:03

fxamacker added 3 commits April 21, 2025 09:52

Merge branch 'master' into fxamacker/add-multi-reader-and-multi-iterator

b1240fe

Update multiReader to implement storage.Reader

6f8d125

Add storage multiSeeker

8f1f1cd

fxamacker changed the title ~~[Storage] Add database multiReader and multiIterator (BadgerDB, Pebble)~~ [Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) Apr 21, 2025

fxamacker added 4 commits April 22, 2025 13:24

Remove input checking in multiReader.NewIter()

1d6b275

Input validity checking happens when new iterators are created so the redundant check isn't needed.

Update NewMultiReader to return error on 0 reader

8a5eee2

This commit makes NewMultiReader return error early if caller specifies no readers.

Merge branch 'master' into fxamacker/add-multi-reader-and-multi-iterator

fa407f6

Update mock

e7f413b

zhangchiqing approved these changes Apr 23, 2025

View reviewed changes

Merge branch 'master' into fxamacker/add-multi-reader-and-multi-iterator

10e731c

fxamacker requested a review from a team April 23, 2025 22:06

Merge branch 'master' into fxamacker/add-multi-reader-and-multi-iterator

0a65506

peterargue approved these changes Apr 25, 2025

View reviewed changes

fxamacker added this pull request to the merge queue Apr 25, 2025

Merged via the queue into master with commit bb094ec Apr 25, 2025
57 checks passed

fxamacker deleted the fxamacker/add-multi-reader-and-multi-iterator branch April 25, 2025 23:25

fxamacker mentioned this pull request Apr 29, 2025

Database Operation Transition: From Badger to Pebble #6523

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) #7320

[Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) #7320

Uh oh!

fxamacker commented Apr 17, 2025 •

edited

Loading

Uh oh!

codecov-commenter commented Apr 17, 2025 •

edited

Loading

Uh oh!

zhangchiqing Apr 17, 2025 •

edited

Loading

Uh oh!

fxamacker Apr 17, 2025 •

edited

Loading

Uh oh!

zhangchiqing Apr 18, 2025

Uh oh!

Uh oh!

Uh oh!

zhangchiqing Apr 18, 2025

Uh oh!

Uh oh!

Uh oh!

fxamacker commented Apr 21, 2025

Uh oh!

fxamacker commented Apr 22, 2025 •

edited

Loading

Uh oh!

zhangchiqing left a comment

Uh oh!

peterargue left a comment

Uh oh!

Uh oh!

Uh oh!

[Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) #7320

[Storage] Add database multiReader, multiIterator, multiSeeker (BadgerDB, Pebble) #7320

Uh oh!

Conversation

fxamacker commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-commenter commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

zhangchiqing Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fxamacker Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zhangchiqing Apr 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

zhangchiqing Apr 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

fxamacker commented Apr 21, 2025

Uh oh!

fxamacker commented Apr 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zhangchiqing left a comment

Choose a reason for hiding this comment

Uh oh!

peterargue left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

fxamacker commented Apr 17, 2025 •

edited

Loading

codecov-commenter commented Apr 17, 2025 •

edited

Loading

zhangchiqing Apr 17, 2025 •

edited

Loading

fxamacker Apr 17, 2025 •

edited

Loading

fxamacker commented Apr 22, 2025 •

edited

Loading