[Access] Refactor storage collections for access node #7093

zhangchiqing · 2025-02-25T18:39:32Z

Working towards #6515

Review #7059 first.

This PR refactors the transaction and collection storage in the access node to use the generic storage module.

codecov-commenter · 2025-03-05T19:46:04Z

Codecov Report

Attention: Patch coverage is 9.83607% with 385 lines in your changes missing coverage. Please review.

Project coverage is 41.28%. Comparing base (9cf7dcb) to head (d343b5c).

Files with missing lines	Patch %	Lines
storage/mock/events_reader.go	0.00%	94 Missing ⚠️
storage/mock/transaction_results_reader.go	0.00%	72 Missing ⚠️
storage/mock/execution_results_reader.go	0.00%	63 Missing ⚠️
consensus/hotstuff/mocks/persister_reader.go	0.00%	50 Missing ⚠️
cmd/access/node_builder/access_node_builder.go	0.00%	47 Missing ⚠️
storage/mock/commits_reader.go	0.00%	28 Missing ⚠️
storage/store/collections.go	46.15%	17 Missing and 4 partials ⚠️
storage/store/cache.go	68.75%	4 Missing and 1 partial ⚠️
storage/operation/collections.go	20.00%	4 Missing ⚠️
cmd/observer/node_builder/observer_builder.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7093      +/-   ##
==========================================
- Coverage   41.34%   41.28%   -0.06%     
==========================================
  Files        2180     2185       +5     
  Lines      190829   191164     +335     
==========================================
+ Hits        78893    78927      +34     
- Misses     105342   105639     +297     
- Partials     6594     6598       +4

Flag	Coverage Δ
unittests	`41.28% <9.83%> (-0.06%)`	⬇️

peterargue · 2025-03-07T19:18:01Z

cmd/execution_builder.go

@@ -218,6 +217,9 @@ func (builder *ExecutionNodeBuilder) LoadComponentsAndModules() {
 		Module("blobservice peer manager dependencies", exeNode.LoadBlobservicePeerManagerDependencies).
 		Module("bootstrap", exeNode.LoadBootstrapper).
 		Module("register store", exeNode.LoadRegisterStore).
+		AdminCommand("get-transactions", func(conf *NodeConfig) commands.AdminCommand {


why move this?

Because exeNode.collections is not initialized until exeNode.LoadCollections is called.

That said, this change should be in a different PR, let me check.

peterargue · 2025-03-07T19:22:23Z

storage/operation/collections.go

+// IndexCollectionPayload indexes the transactions within the collection payload
+// of a cluster block.


is this specific to collection cluster logic, or is this just indexing by blockID?

I'm wondering if we're overloading the codeIndexCollection to mean different things on ANs/ENs vs LNs

peterargue · 2025-03-07T19:25:46Z

storage/operation/collections_test.go

+		t.Run("Retrieve nonexistant", func(t *testing.T) {
+			var actual flow.LightCollection
+			err := operation.RetrieveCollection(db.Reader(), expected.ID(), &actual)
+			assert.Error(t, err)


Suggested change

-			assert.Error(t, err)
+			assert.ErrorIs(t, err, storage.ErrNotFound)
+			assert.Nil(t, actual)

peterargue · 2025-03-07T19:27:04Z

storage/operation/collections_test.go

+
+			var actual flow.LightCollection
+			err = operation.RetrieveCollection(db.Reader(), expected.ID(), &actual)
+			assert.Error(t, err)


can you assert the specific error here and wherever we have sentinels returned
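
As an illustration of the request above, here is a minimal sketch of checking for a specific sentinel instead of any error. It uses only the standard library; `ErrNotFound`, `retrieve`, and `isNotFound` are hypothetical stand-ins for illustration, not the project's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel, standing in for something like storage.ErrNotFound.
var ErrNotFound = errors.New("key not found")

// retrieve is a hypothetical lookup that wraps the sentinel with context on a miss.
func retrieve(db map[string]string, key string) (string, error) {
	v, ok := db[key]
	if !ok {
		return "", fmt.Errorf("could not retrieve %q: %w", key, ErrNotFound)
	}
	return v, nil
}

// isNotFound reports whether err is, or wraps, ErrNotFound.
func isNotFound(err error) bool {
	return errors.Is(err, ErrNotFound)
}

func main() {
	db := map[string]string{"a": "1"}
	_, err := retrieve(db, "missing")
	// A bare "err != nil" check would pass for any failure;
	// the sentinel check pins down which failure occurred.
	fmt.Println("any error:", err != nil)
	fmt.Println("not found:", isNotFound(err))
}
```

A test that only asserts "some error" would keep passing if the lookup started failing for an unrelated reason, which is why the sentinel check is the stronger assertion.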

peterargue · 2025-03-07T19:27:55Z

storage/operation/collections_test.go

+
+			_ = db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
+				err := operation.InsertCollection(rw.Writer(), &expected)
+				assert.Nil(t, err)


I think assert.NoError() communicates your intent more clearly

Suggested change

-				assert.Nil(t, err)
+				assert.NoError(t, err)

storage/operation/transactions_test.go

peterargue · 2025-03-07T19:33:33Z

storage/store/collections.go

+}
+
+func NewCollections(db storage.DB, transactions *Transactions) *Collections {
+	c := &Collections{


what do you think about adding a cache? collections are commonly looked up on access nodes. totally fine to do later

Yeah, maybe add later.

storage/store/collections.go

+
+func (c *Collections) Remove(colID flow.Identifier) error {
+	err := c.db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
+		return operation.RemoveCollection(rw.Writer(), colID)


peterargue · 2025-03-07T19:39:18Z

storage/store/collections.go

+				// transaction is already indexed by a different collection, we should not index it again
+				// so that the access node will always return the same collection for a given transaction
+				// and return a consistent transaction result status.
+				continue


I think we should return an error here since LNs are supposed to prevent a tx from appearing in multiple collections.

cmd/access/node_builder/access_node_builder.go

jordanschalm · 2025-03-12T17:50:15Z

storage/operation/collections.go

+// RemoveCollectionTransactionIndices removes a collection id indexed by a transaction id
+// any error returned are exceptions
+func RemoveCollectionTransactionIndices(w storage.Writer, txID flow.Identifier) error {
+	return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
+}


Suggested change

-// RemoveCollectionTransactionIndices removes a collection id indexed by a transaction id
-// any error returned are exceptions
-func RemoveCollectionTransactionIndices(w storage.Writer, txID flow.Identifier) error {
-	return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
-}
+// RemoveCollectionByTransactionIndex removes a collection id indexed by a transaction id,
+// created by [UnsafeIndexCollectionByTransaction].
+// Any error returned is an exception.
+func RemoveCollectionByTransactionIndex(w storage.Writer, txID flow.Identifier) error {
+	return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
+}

Naming to match the insert method for same index.

jordanschalm · 2025-03-12T17:55:10Z

storage/store/collections.go

+	if err != nil {
+		return nil, err
+	}
+


Suggested change

-	if err != nil {
-		return nil, err
-	}

The error is already checked above

jordanschalm · 2025-03-12T18:01:57Z

storage/operation/collections.go

@@ -52,3 +50,15 @@ func UnsafeIndexCollectionByTransaction(w storage.Writer, txID flow.Identifier,
 func RetrieveCollectionID(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {


Suggested change

-func RetrieveCollectionID(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {
+// LookupCollectionByTransaction looks up the collection indexed by the given transaction ID,
+// which is the collection in which the given transaction was included.
+// No errors are expected during normal operation.
+func LookupCollectionByTransaction(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {

To match naming of other methods operating on the same index.

storage/store/collections.go

+	err = c.db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
+		// remove transaction indices
+		for _, txID := range col.Transactions {
+			err = operation.RemoveCollectionTransactionIndices(rw.Writer(), txID)


storage/operation/collections.go

jordanschalm · 2025-03-12T18:16:20Z

storage/store/collections.go

-				}
-				continue
+			// the indexingByTx lock has ensured we are the only process indexing collection by transaction
+			err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, collection.ID())


Suggested change

-			err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, collection.ID())
+			err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, cid)

Avoid re-computing the hash every loop iteration
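
A minimal sketch of the point above, assuming the ID is a hash over the collection's contents. The `collection` type, its `idCalls` counter, and `indexByTransaction` are hypothetical and exist only to make the cost visible; they are not the project's types.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Identifier is a hypothetical 32-byte ID type.
type Identifier [32]byte

// collection is a hypothetical stand-in for a light collection.
type collection struct {
	Transactions []Identifier
	idCalls      int // counts how often ID() was computed, for illustration only
}

// ID hashes all transaction IDs, so its cost grows with the collection size.
func (c *collection) ID() Identifier {
	c.idCalls++
	h := sha256.New()
	for _, tx := range c.Transactions {
		h.Write(tx[:])
	}
	var id Identifier
	copy(id[:], h.Sum(nil))
	return id
}

// indexByTransaction maps every transaction ID to the collection ID,
// computing the collection ID once before the loop rather than per iteration.
func indexByTransaction(c *collection) map[Identifier]Identifier {
	cid := c.ID()
	index := make(map[Identifier]Identifier, len(c.Transactions))
	for _, txID := range c.Transactions {
		index[txID] = cid
	}
	return index
}

func main() {
	c := &collection{Transactions: []Identifier{{1}, {2}, {3}}}
	index := indexByTransaction(c)
	fmt.Println("entries:", len(index), "ID computations:", c.idCalls)
}
```

Calling `c.ID()` inside the loop would produce the same index but hash the whole collection once per transaction.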

jordanschalm · 2025-03-12T18:27:52Z

storage/store/collections.go

+		if err == nil {
+			// collection nodes have ensured that a transaction can only belong to one collection
+			// so if transaction is already indexed by a collection, check if it's the same collection.
+			// if not, return an error
+			if cid != differentColTxIsIn {
+				return fmt.Errorf("transaction %v is already indexed by a different collection %v", txID, differentColTxIsIn)
+			}


I think this is substantially changing the behaviour.

Previously, we would skip re-indexing TXID->COLLECTIONID, if any index entry for TXID already existed. Now we are throwing an exception.

The reason we specifically check for the case of the index already existing is to make sure that we don't overwrite the index with a different collection ID, so that the information served by the Access API is consistent (if not correct). Now this scenario will cause an exception and likely the node will enter a crash-loop. To match the previous behaviour, the case of err == nil on line 151 should be a no-op.

It is true that we don't currently expect this scenario to happen, absent a cluster consensus bug, but we have had such bugs in the past, and in the mature system we need to tolerate Byzantine clusters. So I don't think this should throw an exception.
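
The "first index wins" behaviour described above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the PR's implementation: `txIndex`, `indexCollection`, and `lookup` are hypothetical names, string IDs replace `flow.Identifier`, and a mutex plays the role of the indexingByTx lock.

```go
package main

import (
	"fmt"
	"sync"
)

// txIndex is a hypothetical in-memory transaction-to-collection index.
type txIndex struct {
	mu   sync.Mutex // serializes indexing, in the role of the indexingByTx lock
	byTx map[string]string
}

func newTxIndex() *txIndex {
	return &txIndex{byTx: make(map[string]string)}
}

// indexCollection maps each transaction to collectionID unless an entry already exists.
// Existing entries are never overwritten, so lookups stay consistent over time.
// It returns the transactions that were already indexed by a different collection,
// so the caller can log them instead of failing.
func (i *txIndex) indexCollection(collectionID string, txIDs []string) []string {
	i.mu.Lock()
	defer i.mu.Unlock()

	var conflicts []string
	for _, txID := range txIDs {
		if existing, ok := i.byTx[txID]; ok {
			if existing != collectionID {
				conflicts = append(conflicts, txID)
			}
			continue // no-op: keep the first index entry
		}
		i.byTx[txID] = collectionID
	}
	return conflicts
}

// lookup returns the collection a transaction is indexed under, if any.
func (i *txIndex) lookup(txID string) (string, bool) {
	i.mu.Lock()
	defer i.mu.Unlock()
	cid, ok := i.byTx[txID]
	return cid, ok
}

func main() {
	idx := newTxIndex()
	idx.indexCollection("colA", []string{"tx1", "tx2"})
	conflicts := idx.indexCollection("colB", []string{"tx2", "tx3"})
	cid, _ := idx.lookup("tx2")
	fmt.Println("conflicts:", conflicts, "tx2 is indexed by:", cid)
}
```

Returning the conflicts rather than an error leaves the decision to the caller, which matches the argument that a duplicate should be tolerated and reported rather than crash the node.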

zhangchiqing · 2025-03-13T19:09:34Z

cmd/access/node_builder/access_node_builder.go

@@ -575,6 +577,15 @@ func (builder *FlowAccessNodeBuilder) BuildExecutionSyncComponents() *FlowAccess
 		AdminCommand("read-execution-data", func(config *cmd.NodeConfig) commands.AdminCommand {
 			return stateSyncCommands.NewReadExecutionDataCommand(builder.ExecutionDataStore)
 		}).
+		Module("transactions and collections storage", func(node *cmd.NodeConfig) error {
+			// TODO: needs to be wrapped with ChainedCollections module, otherwise once we switch


Link the issue as TODO here #6523 (comment) .

Will be addressed separately. We can review and approve this PR, but not merge until the TODO is completed.

cc @fxamacker

jordanschalm · 2025-04-09T22:10:21Z

storage/operation/transactions.go

@@ -15,3 +15,8 @@ func UpsertTransaction(w storage.Writer, txID flow.Identifier, tx *flow.Transact
 func RetrieveTransaction(r storage.Reader, txID flow.Identifier, tx *flow.TransactionBody) error {
 	return RetrieveByKey(r, MakePrefix(codeTransaction, txID), tx)
 }
+
+// RemoveTransaction removes a transaction by fingerprint.


Suggested change

-// RemoveTransaction removes a transaction by fingerprint.
+// RemoveTransaction removes a transaction by ID.

jordanschalm · 2025-04-09T22:15:00Z

storage/store/transactions.go

+// RemoveBatch removes a transaction by fingerprint.
+func (t *Transactions) RemoveBatch(rw storage.ReaderBatchWriter, txID flow.Identifier) error {


Suggested change

-// RemoveBatch removes a transaction by fingerprint.
-func (t *Transactions) RemoveBatch(rw storage.ReaderBatchWriter, txID flow.Identifier) error {
+// Remove removes a transaction by ID.
+func (t *Transactions) Remove(rw storage.ReaderBatchWriter, txID flow.Identifier) error {

It's just removing one transaction, not a batch, right? Or is the idea that we name everything accepting a ReaderBatchWriter as *Batch?

Yeah, *Batch means it's part of a batch update.

Maybe I should rename it to BatchRemove, just like BatchStore?

jordanschalm · 2025-04-09T22:55:11Z

storage/store/collections.go

@@ -98,11 +89,37 @@ func (c *Collections) LightByID(colID flow.Identifier) (*flow.LightCollection, e

 // Remove removes a collection from the database.


Suggested change

-// Remove removes a collection from the database.
+// Remove removes a collection from the database, including all constituent transactions and indices inserted by Store.

jordanschalm · 2025-04-09T22:59:21Z

storage/store/collections.go

-					return fmt.Errorf("could not insert transaction ID: %w", err)
+				// collection nodes have ensured that a transaction can only belong to one collection
+				// so if transaction is already indexed by a collection, check if it's the same collection.
+				// if not, return an error


Suggested change

-				// if not, return an error
+				// TODO: For now we log a warning, but eventually we need to handle Byzantine clusters
+				// producing invalid collections, including collections duplicating transactions.

jordanschalm · 2025-04-09T23:00:25Z

storage/store/collections.go

+				// so if transaction is already indexed by a collection, check if it's the same collection.
+				// if not, return an error
+				if collectionID != differentColTxIsIn {
+					log.Error().Msgf("fatal: transaction %v in collection %v is already indexed by a different collection %v",


Suggested change

-					log.Error().Msgf("fatal: transaction %v in collection %v is already indexed by a different collection %v",
+					log.Error().Msgf("sanity check failed: transaction %v in collection %v is already indexed by a different collection %v",

It's not really fatal if we happily continue after logging the error message 😅

The reason I used "fatal" is that it's easy to filter for in the logs, but I could also remember to query for "sanity" instead.

jordanschalm · 2025-04-09T23:02:26Z

storage/store/collections.go

+			if err != nil {
+				return fmt.Errorf("could not insert transaction ID: %w", err)
+			}
+			continue


Suggested change

-			continue

This seems redundant, since we're at the end of the loop block here anyway.

cmd/access/node_builder/access_node_builder.go

storage/operation/collections.go

peterargue · 2025-04-10T13:59:58Z

storage/store/collections.go

@@ -98,11 +89,37 @@ func (c *Collections) LightByID(colID flow.Identifier) (*flow.LightCollection, e

 // Remove removes a collection from the database.
 // Remove does not error if the collection does not exist
+// Note: this method should only be called for collections included in blocks below sealed height
 // any error returned are exceptions
 func (c *Collections) Remove(colID flow.Identifier) error {


does this need to take the indexingByTx lock since it's modifying the index table?

if not, please add a comment explaining why.

fxamacker

Looks good! I only left one comment about needing to remove transaction in memory cache when it is removed from the underlying database store.

I think there are other stores (not just this PR) with memory cache that can contain records no longer in the underlying database.

For more info, see issue #7313.
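
The concern above can be illustrated with a small sketch, assuming a read-through cache in front of the database. `cachedStore` and its methods are hypothetical; two maps stand in for the database and the in-memory cache.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedStore is a hypothetical store: db stands in for the underlying database,
// cache for the in-memory cache in front of it.
type cachedStore struct {
	mu    sync.Mutex
	db    map[string]string
	cache map[string]string
}

func newCachedStore() *cachedStore {
	return &cachedStore{db: map[string]string{}, cache: map[string]string{}}
}

// put writes to the database and populates the cache.
func (s *cachedStore) put(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.db[key] = value
	s.cache[key] = value
}

// get serves from the cache first and falls back to the database on a miss.
func (s *cachedStore) get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.cache[key]; ok {
		return v, true
	}
	v, ok := s.db[key]
	if ok {
		s.cache[key] = v
	}
	return v, ok
}

// remove deletes the record from the database and evicts it from the cache.
// Without the eviction, get would keep returning a record the database no longer holds.
func (s *cachedStore) remove(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.db, key)
	delete(s.cache, key)
}

func main() {
	s := newCachedStore()
	s.put("tx1", "body")
	s.remove("tx1")
	_, ok := s.get("tx1")
	fmt.Println("still readable after remove:", ok)
}
```

In a batched write the eviction would more plausibly run only after the batch commits successfully; that detail is omitted here.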

storage/store/transactions.go

fxamacker

Nice! Thanks for adding and using Cache.RemoveTx 👍

zhangchiqing added 7 commits February 18, 2025 20:15

refactor collections

eca55cd

update execution builder to use store.collections

fde54c9

refactor collection node to read collections

a109af6

revert collection changes

f326355

adding lock to protect indexing collections

435b991

add concurrent test

6cd812c

refactor access node's transactions and collections storage module

208fb81

zhangchiqing mentioned this pull request Feb 25, 2025

Replacing Badger with Pebble DB #6515

Open

16 tasks

Merge branch 'master' into leo/refactor-storage-collections-for-an

a5f43fb

zhangchiqing force-pushed the leo/refactor-storage-collections-for-an branch from d72e501 to a5f43fb Compare March 6, 2025 17:09

zhangchiqing added 2 commits March 6, 2025 20:52

refactor using protocoldb

705399d

making collections and transactions private

b70d475

peterargue reviewed Mar 7, 2025

zhangchiqing added 2 commits March 9, 2025 17:45

Merge branch 'master' into leo/refactor-storage-collections-for-an

5013f21

fix lint

3287ed1

j1010001 mentioned this pull request Mar 10, 2025

Badger -> Pebble DB M2 - DB access refactoring for low-risk data (AN, EN, VN) #6527

Closed

17 tasks

zhangchiqing added 3 commits March 10, 2025 16:56

update collections.Remove

07288a9

remove collections.StoreLight

f38988b

index collections by txs

1a15b10

jordanschalm reviewed Mar 12, 2025

zhangchiqing added 5 commits March 13, 2025 09:33

Merge branch 'master' into leo/refactor-storage-collections-for-an

e4bf8e5

update collections operations methods

99bbf77

refactor collection store and remove methods

14469d0

refactor StoreLightAndIndexByTransaction

db502ad

remove unused collections operations

35692d7

zhangchiqing force-pushed the leo/refactor-storage-collections-for-an branch from 8bdd882 to 35692d7 Compare March 13, 2025 18:06

zhangchiqing added 3 commits March 13, 2025 11:15

update mocks

127859b

add UpertCollection methods

cbb01aa

refactor StoreLightAndIndexByTransaction

311b44b

add TODO

a9c3140

zhangchiqing commented Mar 13, 2025

zhangchiqing added 5 commits April 1, 2025 14:38

Merge branch 'master' into leo/refactor-storage-collections-for-an

6ab6b8c

use notNil for modules initialized by access builder

7053720

Merge branch 'master' into leo/refactor-storage-collections-for-an

2267d64

collection rpc might be nil

e1455ee

fix builder

a77dede

zhangchiqing marked this pull request as ready for review April 4, 2025 23:53

zhangchiqing requested a review from a team as a code owner April 4, 2025 23:53

zhangchiqing requested review from jordanschalm and peterargue April 7, 2025 16:58

jordanschalm approved these changes Apr 9, 2025

peterargue reviewed Apr 10, 2025

zhangchiqing requested a review from fxamacker April 10, 2025 17:37

zhangchiqing added 2 commits April 10, 2025 10:43

update comments for store collections/transactions

509efe7

fix access node builder

dab7142

fxamacker reviewed Apr 15, 2025

zhangchiqing added 5 commits April 16, 2025 10:37

add TestTransactionRemove tests

2bd20e8

add withRemove to store/cache

80a2eaf

Merge branch 'master' into leo/refactor-storage-collections-for-an

57e4c8e

remove collections.StoreLight

d343b5c

add mocks

78dec17

fxamacker approved these changes Apr 17, 2025

zhangchiqing added this pull request to the merge queue Apr 17, 2025

Merged via the queue into master with commit 17ac40b Apr 17, 2025
56 checks passed

zhangchiqing deleted the leo/refactor-storage-collections-for-an branch April 17, 2025 18:44
