Skip to content

[Access] Refactor storage collections for access node #7093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Apr 17, 2025

Conversation

zhangchiqing
Copy link
Member

Working towards #6515

Review #7059 first.

This PR refactors the transactions and collection storage in access node to use the generic storage module.

@codecov-commenter
Copy link

codecov-commenter commented Mar 5, 2025

Codecov Report

Attention: Patch coverage is 9.83607% with 385 lines in your changes missing coverage. Please review.

Project coverage is 41.28%. Comparing base (9cf7dcb) to head (d343b5c).

Files with missing lines Patch % Lines
storage/mock/events_reader.go 0.00% 94 Missing ⚠️
storage/mock/transaction_results_reader.go 0.00% 72 Missing ⚠️
storage/mock/execution_results_reader.go 0.00% 63 Missing ⚠️
consensus/hotstuff/mocks/persister_reader.go 0.00% 50 Missing ⚠️
cmd/access/node_builder/access_node_builder.go 0.00% 47 Missing ⚠️
storage/mock/commits_reader.go 0.00% 28 Missing ⚠️
storage/store/collections.go 46.15% 17 Missing and 4 partials ⚠️
storage/store/cache.go 68.75% 4 Missing and 1 partial ⚠️
storage/operation/collections.go 20.00% 4 Missing ⚠️
cmd/observer/node_builder/observer_builder.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7093      +/-   ##
==========================================
- Coverage   41.34%   41.28%   -0.06%     
==========================================
  Files        2180     2185       +5     
  Lines      190829   191164     +335     
==========================================
+ Hits        78893    78927      +34     
- Misses     105342   105639     +297     
- Partials     6594     6598       +4     
Flag Coverage Δ
unittests 41.28% <9.83%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zhangchiqing zhangchiqing force-pushed the leo/refactor-storage-collections-for-an branch from d72e501 to a5f43fb Compare March 6, 2025 17:09
@@ -218,6 +217,9 @@ func (builder *ExecutionNodeBuilder) LoadComponentsAndModules() {
Module("blobservice peer manager dependencies", exeNode.LoadBlobservicePeerManagerDependencies).
Module("bootstrap", exeNode.LoadBootstrapper).
Module("register store", exeNode.LoadRegisterStore).
AdminCommand("get-transactions", func(conf *NodeConfig) commands.AdminCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the exeNode.collections was not initialized until exeNode.LoadCollections is called.

That said, this change should be in a different PR, let me check.

Comment on lines 24 to 25
// IndexCollectionPayload indexes the transactions within the collection payload
// of a cluster block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this specific to collection cluster logic, or is this just indexing by blockID?

I'm wondering if we're overloading the codeIndexCollection to mean different things on ANs/ENs vs LNs

t.Run("Retrieve nonexistant", func(t *testing.T) {
var actual flow.LightCollection
err := operation.RetrieveCollection(db.Reader(), expected.ID(), &actual)
assert.Error(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.Error(t, err)
assert.ErrorIs(t, err, storage.ErrNotFound)
assert.Nil(t, actual)


var actual flow.LightCollection
err = operation.RetrieveCollection(db.Reader(), expected.ID(), &actual)
assert.Error(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you assert the specific error here and wherever we have sentinels returned


_ = db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
err := operation.InsertCollection(rw.Writer(), &expected)
assert.Nil(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assert.NoError() communicates your intent more clearly

Suggested change
assert.Nil(t, err)
assert.NoError(t, err)

}

func NewCollections(db storage.DB, transactions *Transactions) *Collections {
c := &Collections{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about adding a cache? collections are commonly looked up on access nodes. totally fine to do later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe add later.


func (c *Collections) Remove(colID flow.Identifier) error {
err := c.db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
return operation.RemoveCollection(rw.Writer(), colID)

This comment was marked as resolved.

// transaction is already indexed by a different collection, we should not index it again
// so that the access node will always return the same collection for a given transaction
// and return a consistent transaction result status.
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return an error here since LNs are supposed to prevent a tx from

  1. appearing multiple times in the same collection
  2. appearing in multiple collections

Comment on lines 60 to 64
// RemoveCollectionTransactionIndices removes a collection id indexed by a transaction id
// any error returned are exceptions
func RemoveCollectionTransactionIndices(w storage.Writer, txID flow.Identifier) error {
return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RemoveCollectionTransactionIndices removes a collection id indexed by a transaction id
// any error returned are exceptions
func RemoveCollectionTransactionIndices(w storage.Writer, txID flow.Identifier) error {
return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
}
// RemoveCollectionByTransactionIndex removes a collection id indexed by a transaction id,
// created by [UnsafeIndexCollectionByTransaction].
// Any error returned is an exception.
func RemoveCollectionByTransactionIndex(w storage.Writer, txID flow.Identifier) error {
return RemoveByKey(w, MakePrefix(codeIndexCollectionByTransaction, txID))
}

Naming to match the insert method for same index.

Comment on lines 85 to 88
if err != nil {
return nil, err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
return nil, err
}

The error is already checked above

@@ -52,3 +50,15 @@ func UnsafeIndexCollectionByTransaction(w storage.Writer, txID flow.Identifier,
func RetrieveCollectionID(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func RetrieveCollectionID(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {
// LookupCollectionByTransaction looks up the collection indexed by the given transaction ID,
// which is the collection in which the given transaction was included.
// No errors are expected during normal operaion.
func LookupCollectionByTransaction(r storage.Reader, txID flow.Identifier, collectionID *flow.Identifier) error {

To match naming of other methods operating on the same index.

err = c.db.WithReaderBatchWriter(func(rw storage.ReaderBatchWriter) error {
// remove transaction indices
for _, txID := range col.Transactions {
err = operation.RemoveCollectionTransactionIndices(rw.Writer(), txID)

This comment was marked as resolved.

}
continue
// the indexingByTx lock has ensured we are the only process indexing collection by transaction
err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, collection.ID())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, collection.ID())
err = operation.UnsafeIndexCollectionByTransaction(rw.Writer(), txID, cid)

Avoid re-computing the hash every loop iteration

Comment on lines 151 to 157
if err == nil {
// collection nodes have ensured that a transaction can only belong to one collection
// so if transaction is already indexed by a collection, check if it's the same collection.
// if not, return an error
if cid != differentColTxIsIn {
return fmt.Errorf("transaction %v is already indexed by a different collection %v", txID, differentColTxIsIn)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is substantially changing the behaviour.

Previously, we would skip re-indexing TXID->COLLECTIONID, if any index entry for TXID already existed. Now we are throwing an exception.

The reason we specifically check for the case of the index already existing is to make sure that we don't overwrite the index with a different collection ID, so that the information served by the Access API is consistent (if not correct). Now this scenario will cause an exception and likely the node will enter a crash-loop. To match the previous behaviour, the case of err == nil on line 151 should be a no-op.

It is true that we don't currently expect this scenario to happen, absent a cluster consensus bug, but we have had such bugs in the past, and in the mature system we need to tolerate Byzantine clusters. So I don't think this should throw an exception.

@zhangchiqing zhangchiqing force-pushed the leo/refactor-storage-collections-for-an branch from 8bdd882 to 35692d7 Compare March 13, 2025 18:06
@@ -575,6 +577,15 @@ func (builder *FlowAccessNodeBuilder) BuildExecutionSyncComponents() *FlowAccess
AdminCommand("read-execution-data", func(config *cmd.NodeConfig) commands.AdminCommand {
return stateSyncCommands.NewReadExecutionDataCommand(builder.ExecutionDataStore)
}).
Module("transactions and collections storage", func(node *cmd.NodeConfig) error {
// TODO: needs to be wrapped with ChainedCollections module, otherwise once we switch
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link the issue as TODO here #6523 (comment) .

Will be addressed separately. We can review and approve this PR, but not merge until the TODO is completed.

cc @fxamacker

@zhangchiqing zhangchiqing marked this pull request as ready for review April 4, 2025 23:53
@zhangchiqing zhangchiqing requested a review from a team as a code owner April 4, 2025 23:53
@@ -15,3 +15,8 @@ func UpsertTransaction(w storage.Writer, txID flow.Identifier, tx *flow.Transact
func RetrieveTransaction(r storage.Reader, txID flow.Identifier, tx *flow.TransactionBody) error {
return RetrieveByKey(r, MakePrefix(codeTransaction, txID), tx)
}

// RemoveTransaction removes a transaction by fingerprint.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RemoveTransaction removes a transaction by fingerprint.
// RemoveTransaction removes a transaction by ID.

Comment on lines +56 to +57
// RemoveBatch removes a transaction by fingerprint.
func (t *Transactions) RemoveBatch(rw storage.ReaderBatchWriter, txID flow.Identifier) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// RemoveBatch removes a transaction by fingerprint.
func (t *Transactions) RemoveBatch(rw storage.ReaderBatchWriter, txID flow.Identifier) error {
// Remove removes a transaction by ID.
func (t *Transactions) Remove(rw storage.ReaderBatchWriter, txID flow.Identifier) error {

It's just removing one transaction, not a batch, right? Or is the idea that we name everything accepting a ReaderBatchWriter as *Batch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, *Batch means it's part of a batch update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I remove it into BatchRemove just like BatchStore?

@@ -98,11 +89,37 @@ func (c *Collections) LightByID(colID flow.Identifier) (*flow.LightCollection, e

// Remove removes a collection from the database.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Remove removes a collection from the database.
// Remove removes a collection from the database, including all constituent transactions and indices inserted by Store.

return fmt.Errorf("could not insert transaction ID: %w", err)
// collection nodes have ensured that a transaction can only belong to one collection
// so if transaction is already indexed by a collection, check if it's the same collection.
// if not, return an error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// if not, return an error
// TODO: For now we log a warning, but eventually we need to handle Byzantine clusters
// producing invalid collections, including collections duplicating transactions.

// so if transaction is already indexed by a collection, check if it's the same collection.
// if not, return an error
if collectionID != differentColTxIsIn {
log.Error().Msgf("fatal: transaction %v in collection %v is already indexed by a different collection %v",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Error().Msgf("fatal: transaction %v in collection %v is already indexed by a different collection %v",
log.Error().Msgf("sanity check failed: transaction %v in collection %v is already indexed by a different collection %v",

It's not really fatal if we happily continue after logging the error message 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I used fatal is so that it's easy to filter from logs. but I could also remember and query with sanity

if err != nil {
return fmt.Errorf("could not insert transaction ID: %w", err)
}
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
continue

This seems redundant, since we're at the end of the loop block here anyway.

@@ -98,11 +89,37 @@ func (c *Collections) LightByID(colID flow.Identifier) (*flow.LightCollection, e

// Remove removes a collection from the database.
// Remove does not error if the collection does not exist
// Note: this method should only be called for collections included in blocks below sealed height
// any error returned are exceptions
func (c *Collections) Remove(colID flow.Identifier) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to take the indexingByTx lock since it's modifying the index table?

if not, please add a comment explaining why.

@zhangchiqing zhangchiqing requested a review from fxamacker April 10, 2025 17:37
Copy link
Member

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I only left one comment about needing to remove transaction in memory cache when it is removed from the underlying database store.

I think there are other stores (not just this PR) with memory cache that can contain records no longer in the underlying database.

For more info, see issue #7313.

Copy link
Member

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for adding and using Cache.RemoveTx 👍

@zhangchiqing zhangchiqing added this pull request to the merge queue Apr 17, 2025
Merged via the queue into master with commit 17ac40b Apr 17, 2025
56 checks passed
@zhangchiqing zhangchiqing deleted the leo/refactor-storage-collections-for-an branch April 17, 2025 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants