
ColumnSyncer #7029

Draft

agnxsh wants to merge 37 commits into column-syncer from cs2

Conversation

@agnxsh (Contributor) commented Mar 21, 2025

No description provided.


github-actions bot commented Mar 21, 2025

Unit Test Results

15 files ±0   2 630 suites ±0   1h 15m 49s ⏱️ -24s
6 372 tests ±0   5 870 ✔️ ±0   502 💤 ±0   0 ❌ ±0
44 281 runs ±0   43 603 ✔️ ±0   678 💤 ±0   0 ❌ ±0

Results for commit 4feb2bf. ± Comparison against base commit 1885b3f.

♻️ This comment has been updated with latest results.

@agnxsh agnxsh force-pushed the cs2 branch 4 times, most recently from 30298d7 to 42e71ee on March 31, 2025 at 10:58
@agnxsh agnxsh force-pushed the cs2 branch 2 times, most recently from 5bde140 to fd3c63f on April 20, 2025 at 04:43
  # Validating received blocks one by one
  var
    hasInvalidBlock = false
    unviableBlock: Option[(Eth2Digest, Slot)]
Contributor

In general, Opt is typically preferable to Option
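
For illustration, a minimal sketch of that change, assuming the variable names from the snippet above; `Eth2Digest`/`Slot` are placeholders for the real spec types, and `Opt` comes from the `results` package:

```nim
import results

type
  Eth2Digest = object   # placeholder for the real spec digest type
  Slot = uint64         # placeholder for the real Slot type

var
  hasInvalidBlock = false
  # Option[(Eth2Digest, Slot)] from std/options becomes Opt from results:
  unviableBlock: Opt[(Eth2Digest, Slot)]

# Marking a block as unviable (the root/slot values here are hypothetical):
unviableBlock = Opt.some((Eth2Digest(), Slot(0)))
if unviableBlock.isSome:
  hasInvalidBlock = true
```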


  if lenu64(data) > (req.count * NUMBER_OF_COLUMNS):
    # Number of data columns in the response should be less than
    # or equal to (MAX_BLOBS_PER_BLOCK_FULU * NUMBER_OF_COLUMNS).
Contributor

MAX_BLOBS_PER_BLOCK_FULU isn't a thing anymore.

  ## empty slots).
  len(sr.data) == 0

proc hasEndGap*[T](sr: ColumnSyncResult[T]): bool {.inline.} =
Contributor

Not called outside column_syncer_assist, so doesn't need to be exported.

      return true
  return false

proc getLastNonEmptySlot*[T](sr: ColumnSyncResult[T]): Slot {.inline.} =
Contributor

Not called outside column_syncer_assist, so doesn't need to be exported.

  let failEpoch = epoch(failSlot)

  # Calculate exponential rewind point in number of epochs.
  let epochCount =
Contributor

Isn't this substantially replicating the kind of logic already in the existing block syncer?

Contributor

In general, is it important that the column syncer have consistent behavior with the block syncer in this rewind behavior, for DA purposes?

Contributor Author

> In general, is it important that the column syncer have consistent behavior with the block syncer in this rewind behavior, for DA purposes?

yes, it's important in a few ways: since we are range-downloading columns here, we need to detect partial/malicious responses and responses from orphaned blocks so they can be rejected quickly, and slot movements can be performed accordingly.
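
For context, here's a hedged, generic illustration of the kind of exponential rewind step the `epochCount` snippet above computes; the exact formula and cap used in the diff may differ:

```nim
type Epoch = uint64   # placeholder for the real spec type

proc rewindEpochCount(failures: uint64, maxRewind = Epoch(64)): Epoch =
  # Double the number of epochs to rewind on each repeated failure
  # (1, 2, 4, 8, ...), capped so the rewind never grows unboundedly.
  min(Epoch(1'u64 shl int(min(failures, 6'u64))), maxRewind)

when isMainModule:
  for f in 0'u64 .. 4'u64:
    echo "failures=", f, " rewind epochs=", rewindEpochCount(f)
```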

  var map = newString(len(man.workers))
  var sleeping, waiting, pending: int
  for i in 0..<len(man.workers):
    var ch: char
Contributor

This whole design seems mostly copy/paste from the existing block syncer.

Contributor Author

mostly because i thought these concepts shouldn't change, since they didn't change for blob sidecars either, but i do agree there can be leaner ways of integrating columns within the existing block syncer; i will take a look at that

Contributor

As you note, for a draft aimed mainly at devnets, and since there might be another syncer implementation entirely coming, it's reasonable enough for what it is.

  for i in 0..<len(man.workers):
    man.workers[i].future =
      columnSyncWorkerGreedy[A, B](man, i)
  if ColumnSyncerFlag.Impartial in man.flags:
Contributor

Is it valid for ColumnSyncerFlag.Greedy in man.flags and ColumnSyncerFlag.Impartial in man.flags to be true? If not, seems maybe better to use an enum.

Contributor Author (@agnxsh, May 23, 2025)

> seems maybe better to use an enum.

ok
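
For reference, a rough sketch of the enum-based alternative being suggested, assuming `Greedy` and `Impartial` really are mutually exclusive; the `mode` field and `startWorkers` proc are hypothetical names for illustration:

```nim
type
  ColumnSyncerMode = enum
    Greedy, Impartial

  ColumnSyncManager = object
    mode: ColumnSyncerMode

proc startWorkers(man: ColumnSyncManager) =
  # With an enum the two modes cannot both be set at once, and `case`
  # forces every mode to be handled explicitly.
  case man.mode
  of Greedy:
    echo "spawning greedy column sync workers"
  of Impartial:
    echo "spawning impartial column sync workers"

when isMainModule:
  startWorkers(ColumnSyncManager(mode: Impartial))
```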


proc timeLeftForColumnSyncer*(d: Duration): string =
  if d == InfiniteDuration:
    "--h--m"
Contributor

Pretty sure this formatting code already lives somewhere (probably the syncer) in Nimbus

Contributor Author

yeah, overall the sync speed math shouldn't change significantly for this either, but i'll still try to reduce this much code reuse; i'd say i kept most of the code draft-ish in order to see if things worked properly

Contributor

That's reasonable here, sure.

      avg_sync_speed = man.avgSyncSpeed.formatBiggestFloat(ffDecimal, 4),
      ins_sync_speed = man.insSyncSpeed.formatBiggestFloat(ffDecimal, 4),
      topics = "columnsync"
  of ColumnSyncerDirection.Backward:
Contributor

Does the column syncer need to work backwards at all? Just don't say the columns are in custody yet, and if the network built on them, well...

This isn't how DA checking works for blobs, but maybe it does for columns.

import std/[strutils, sequtils, algorithm]
import stew/base10, chronos, chronicles, results
import
  ../spec/datatypes/[phase0, altair],
Contributor

Nothing which imports forks needs to import phase0, altair also, it's covered already.

  ../beacon_clock,
  "."/[sync_protocol, sync_queue, column_syncer_assist]

export phase0, altair, merge, chronos, chronicles, results,
Contributor

Or exports it, likewise.

    Queueing, Processing

  ColumnAndBlockResponse* = object
    blk*: fulu.SignedBeaconBlock
Contributor

Does this store the block redundantly with non-column-syncer things?

Contributor

Also, it's fine for now, but assuming Glamsterdam also uses PeerDAS, this won't always be a fulu.SignedBeaconBlock.

Contributor Author

> Does this store the block redundantly with non-column-syncer things?

downloading blocks for this pass is useful to detect rewinding of slots

      peer_speed = req.item.netKbps(),
      topics = "columnsync"

    beaconBlocksByRange_v2(peer, req.slot, req.count, 1'u64)
Contributor

Can this redundantly fetch blocks that the block syncer also is fetching?

Contributor Author

this fetches the blocks just in order to cross-check them against the dag and decide on potential rewinds and other slot movements

Contributor

Right, but how likely is it that it and the block syncer are/have fetched the same blocks around the same time?

This kind of thing works ok, if wastefully, when there are surplus syncing peers, and notably less well when there are fewer potential sync peers.

  ## whichever's peer status is recent and relevant
  ##


Contributor

kind of trivial formatting issue, but this is just a lot of whitespace

    column_syncer_table*: OrderedTable[Slot, ColumnAndBlockResponse]
    FULU_FORK_EPOCH: Epoch
    MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS: uint64
    MAX_BLOBS_PER_BLOCK_ELECTRA: uint64
Contributor

Would this need to know about BPOs?

Contributor Author

yeah i think it will

    man.assist = ColumnSyncerAssist.init(A, man.direction, man.getFirstSlot(),
                                         man.getLastSlot(), man.chunkSize,
                                         man.getSafeSlot, man.peerdasBlockVerifier, 1)
  of ColumnSyncerDirection.Backward:
Contributor

A general comment on backfilling, and a concern here a bit too: the existing backfill mechanism has a tendency to crowd out forward syncing if there's less than a couple dozen peers. This seems to add yet more demand for syncing peers.

Is the idea that the "main" block sync gets shut off, or is the pool of sync peers now split between multiple mini-pools of 10 workers each?

Contributor Author

> is the pool of sync peers now split between multiple mini-pools of 10 workers each?

something like this

Contributor

What's the practical minimum number of total peers to be comfortable with the combination of

  • forward syncing blocks
  • backfilling blocks
  • syncing columns?
