ColumnSyncer #7029
Conversation
Force-pushed from 30298d7 to 42e71ee
Force-pushed from 5bde140 to fd3c63f
# Validating received blocks one by one
var
  hasInvalidBlock = false
  unviableBlock: Option[(Eth2Digest, Slot)]
In general, `Opt` is typically preferable to `Option`.
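For reference, a minimal sketch of the `Opt` usage being suggested, assuming `results` is imported as elsewhere in the sync code; the tuple element types are stand-ins for `(Eth2Digest, Slot)`:

```nim
import results

type UnviableKey = (string, uint64)   # stands in for (Eth2Digest, Slot)

# Stand-in for the PR's `unviableBlock` field, using Opt instead of Option.
var unviableBlock = Opt.none(UnviableKey)

if unviableBlock.isNone:
  # Record the first unviable block seen (illustrative values).
  unviableBlock = Opt.some(("block-root", 42'u64))

echo unviableBlock.isSome   # true
```

`Opt` is the lightweight option type from nim-results, so the module can drop its `std/options` dependency.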
if lenu64(data) > (req.count * NUMBER_OF_COLUMNS):
  # Number of data columns in the response should be less than
  # or equal to (MAX_BLOBS_PER_BLOCK_FULU * NUMBER_OF_COLUMNS).
`MAX_BLOBS_PER_BLOCK_FULU` isn't a thing anymore.
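For concreteness, a hedged standalone sketch of the bound stated purely in request terms, with no `MAX_BLOBS_PER_BLOCK_*` constant involved; `withinColumnLimit` is a hypothetical helper, not code from this PR:

```nim
# NUMBER_OF_COLUMNS as currently defined for PeerDAS.
const NUMBER_OF_COLUMNS = 128'u64

proc withinColumnLimit(responseLen, requestedSlots: uint64): bool =
  # A peer may return at most NUMBER_OF_COLUMNS column sidecars per requested
  # slot, mirroring the `lenu64(data) > req.count * NUMBER_OF_COLUMNS` check above.
  responseLen <= requestedSlots * NUMBER_OF_COLUMNS

echo withinColumnLimit(256, 2)   # true
echo withinColumnLimit(300, 2)   # false
```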
  ## empty slots).
  len(sr.data) == 0

proc hasEndGap*[T](sr: ColumnSyncResult[T]): bool {.inline.} =
Not called outside `column_syncer_assist`, so it doesn't need to be exported.
      return true
  return false

proc getLastNonEmptySlot*[T](sr: ColumnSyncResult[T]): Slot {.inline.} =
Not called outside `column_syncer_assist` either, so it doesn't need to be exported.
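A minimal sketch of the suggested visibility change for both helpers, with stand-in types and bodies (the real `ColumnSyncResult` and `Slot` come from the PR); dropping the trailing `*` keeps them private to `column_syncer_assist`:

```nim
type
  ColumnSyncResult[T] = object
    data: seq[T]

proc hasEndGap[T](sr: ColumnSyncResult[T]): bool {.inline.} =
  # No trailing `*`: only callers inside this module can see it.
  sr.data.len == 0

proc getLastNonEmptySlot[T](sr: ColumnSyncResult[T]): int {.inline.} =
  # Likewise module-private; `int` stands in for `Slot` here.
  sr.data.len - 1

echo hasEndGap(ColumnSyncResult[int](data: newSeq[int]()))         # true
echo getLastNonEmptySlot(ColumnSyncResult[int](data: @[1, 2, 3]))  # 2
```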
  let failEpoch = epoch(failSlot)

  # Calculate exponential rewind point in number of epochs.
  let epochCount =
Isn't this substantially replicating the kind of logic already in the existing block syncer?
In general, is it important for DA purposes that the column syncer's rewind behavior be consistent with the block syncer's?
> In general, is it important for DA purposes that the column syncer's rewind behavior be consistent with the block syncer's?

Yes, it's important in a few ways: since we are range-downloading columns here, it matters that we can detect partial or malicious responses, as well as responses from orphaned blocks that can be quickly rejected, and that slot movements can be performed accordingly.
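For context, a hedged standalone sketch of the exponential-rewind idea under discussion; this is not the repo's implementation, and the helper names, the doubling schedule, and the clamp at the finalized epoch are all illustrative:

```nim
proc rewindEpochs(failures: int): uint64 =
  # 1, 2, 4, 8, ... epochs for the 1st, 2nd, 3rd, ... consecutive failure
  # at the same slot (failures is assumed >= 1).
  1'u64 shl min(failures - 1, 32)

proc rewindPoint(failEpoch, finalizedEpoch: uint64, failures: int): uint64 =
  # Rewind exponentially, but never past the finalized epoch.
  max(finalizedEpoch, failEpoch - min(failEpoch, rewindEpochs(failures)))

echo rewindPoint(100, 90, 1)   # 99
echo rewindPoint(100, 90, 4)   # 92
echo rewindPoint(100, 90, 6)   # 90 (clamped at finalized)
```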
  var map = newString(len(man.workers))
  var sleeping, waiting, pending: int
  for i in 0..<len(man.workers):
    var ch: char
This whole design seems mostly copy/paste from the existing block syncer.
Mostly, yes: I thought these concepts shouldn't change, just as they didn't change for blob sidecars either. But I agree there can be leaner ways of integrating columns into the existing block syncer; I'll take a look at that.
As you note, for a draft aimed mainly at devnets, and since an entirely different syncer implementation might be coming, it's reasonable enough for what it is.
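For readers unfamiliar with the block syncer, a hedged standalone sketch of the kind of per-worker status map the snippet above builds for progress logging; the types and state names are stand-ins:

```nim
import std/strformat

type WorkerStatus = enum Sleeping, Waiting, Pending

proc statusMap(workers: seq[WorkerStatus]): string =
  # One character per worker, plus totals, for compact log output.
  var sleeping, waiting, pending = 0
  for w in workers:
    case w
    of Sleeping: result.add 's'; inc sleeping
    of Waiting:  result.add 'w'; inc waiting
    of Pending:  result.add 'p'; inc pending
  result.add fmt" ({sleeping}s/{waiting}w/{pending}p)"

echo statusMap(@[Sleeping, Pending, Pending, Waiting])
# -> "sppw (1s/1w/2p)"
```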
    for i in 0..<len(man.workers):
      man.workers[i].future =
        columnSyncWorkerGreedy[A, B](man, i)
  if ColumnSyncerFlag.Impartial in man.flags:
Is it valid for both `ColumnSyncerFlag.Greedy in man.flags` and `ColumnSyncerFlag.Impartial in man.flags` to be true? If not, it seems better to use an enum.
> seems maybe better to use an enum.

OK.
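A minimal sketch of the enum-based alternative, with illustrative type and field names rather than the PR's; a single `mode` field makes the two modes mutually exclusive by construction:

```nim
type
  ColumnSyncerMode = enum
    Greedy, Impartial

  ColumnSyncManager = object
    mode: ColumnSyncerMode

proc startWorkers(man: ColumnSyncManager) =
  # `case` over an enum forces exactly one of the two behaviors.
  case man.mode
  of Greedy:    echo "spawning greedy workers"
  of Impartial: echo "spawning impartial workers"

startWorkers(ColumnSyncManager(mode: Impartial))
```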
proc timeLeftForColumnSyncer*(d: Duration): string =
  if d == InfiniteDuration:
    "--h--m"
Pretty sure this formatting code already lives somewhere in Nimbus (probably in the syncer).
Yeah, overall it does, since the sync speed math shouldn't change significantly for this either. I'll still try to reduce this much code reuse; I kept most of the code draft-ish in order to see whether things worked properly.
That's reasonable here, sure.
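For illustration, a hedged standalone sketch of the `--h--m` style remaining-time formatting being discussed, taking plain seconds rather than a chronos `Duration`:

```nim
import std/strformat

proc timeLeftString(seconds: int64): string =
  if seconds < 0:
    # Unknown or infinite estimate.
    return "--h--m"
  let
    hours = seconds div 3600
    minutes = (seconds mod 3600) div 60
  fmt"{hours:02}h{minutes:02}m"

echo timeLeftString(-1)      # --h--m
echo timeLeftString(5400)    # 01h30m
```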
      avg_sync_speed = man.avgSyncSpeed.formatBiggestFloat(ffDecimal, 4),
      ins_sync_speed = man.insSyncSpeed.formatBiggestFloat(ffDecimal, 4),
      topics = "columnsync"
  of ColumnSyncerDirection.Backward:
Does the column syncer need to work backwards at all? Just don't say the columns are in custody yet, and if the network built on them, well...
This isn't how DA checking works for blobs, but maybe it does for columns.
import std/[strutils, sequtils, algorithm]
import stew/base10, chronos, chronicles, results
import
  ../spec/datatypes/[phase0, altair],
Nothing which imports `forks` needs to import `phase0, altair` as well; they're already covered.
  ../beacon_clock,
  "."/[sync_protocol, sync_queue, column_syncer_assist]

export phase0, altair, merge, chronos, chronicles, results,
Likewise for the `export` list: it doesn't need to re-export them either.
    Queueing, Processing

  ColumnAndBlockResponse* = object
    blk*: fulu.SignedBeaconBlock
Does this store the block redundantly with non-column-syncer things?
Also, it's fine for now, but assuming Glamsterdam also uses PeerDAS, this won't always be a `fulu.SignedBeaconBlock`.
> Does this store the block redundantly with non-column-syncer things?

Downloading the blocks for this pass is useful for detecting rewinding of slots.
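A hedged, standalone illustration of the fork-agnosticism point: once more than one PeerDAS fork exists, a type like `ColumnAndBlockResponse` needs either the repo's existing `ForkedSignedBeaconBlock` or something variant-like. All names and field types below are stand-ins, not the PR's:

```nim
type
  ColumnFork = enum
    Fulu, NextFork

  BlockStandIn = object
    slot: uint64

  ForkAwareResponse = object
    columns: seq[string]          # column sidecars, elided to strings here
    case fork: ColumnFork
    of Fulu:
      fuluBlk: BlockStandIn
    of NextFork:
      nextBlk: BlockStandIn

let resp = ForkAwareResponse(
  fork: Fulu, fuluBlk: BlockStandIn(slot: 7), columns: newSeq[string]())
echo resp.fuluBlk.slot   # 7
```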
    peer_speed = req.item.netKbps(),
    topics = "columnsync"

  beaconBlocksByRange_v2(peer, req.slot, req.count, 1'u64)
Can this redundantly fetch blocks that the block syncer also is fetching?
This fetches the blocks only in order to cross-check them against the DAG and decide on potential rewinds and other slot movements.
Right, but how likely is it that it and the block syncer are/have fetched the same blocks around the same time?
This kind of thing works ok, if wastefully, if there are surplus syncing peers and notably less well when there are fewer potential sync peers.
  ## whichever's peer status is recent and relevant
  ##

Kind of a trivial formatting issue, but this is just a lot of whitespace.
  column_syncer_table*: OrderedTable[Slot, ColumnAndBlockResponse]
  FULU_FORK_EPOCH: Epoch
  MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS: uint64
  MAX_BLOBS_PER_BLOCK_ELECTRA: uint64
Would this need to know about BPOs?
Yeah, I think it will.
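A hedged sketch of what "knowing about BPOs" could look like: blob-parameter-only forks define a schedule of (epoch, max blobs per block) entries, so the syncer would look up the entry in force at a given epoch rather than read a single `MAX_BLOBS_PER_BLOCK_*` constant. The types and helper below are illustrative, not the repo's config API:

```nim
type
  BlobScheduleEntry = object
    epoch: uint64             # activation epoch of this entry
    maxBlobsPerBlock: uint64

proc maxBlobsAt(schedule: seq[BlobScheduleEntry], epoch: uint64): uint64 =
  # Entries are assumed sorted by descending epoch; take the first whose
  # activation epoch has been reached.
  for entry in schedule:
    if epoch >= entry.epoch:
      return entry.maxBlobsPerBlock
  0

let schedule = @[
  BlobScheduleEntry(epoch: 200, maxBlobsPerBlock: 24),
  BlobScheduleEntry(epoch: 100, maxBlobsPerBlock: 12)]

echo maxBlobsAt(schedule, 150)   # 12
echo maxBlobsAt(schedule, 250)   # 24
```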
    man.assist = ColumnSyncerAssist.init(A, man.direction, man.getFirstSlot(),
                                         man.getLastSlot(), man.chunkSize,
                                         man.getSafeSlot, man.peerdasBlockVerifier, 1)
  of ColumnSyncerDirection.Backward:
A general comment on backfilling, and a bit of a concern here too: the existing backfill mechanism has a tendency to crowd out forward syncing if there are fewer than a couple dozen peers, and this seems to add yet more demand for syncing peers.
Is the idea that the "main" block sync gets shut off, or is the pool of sync peers now split between multiple mini-pools of 10 workers each?
> is the pool of sync peers now split between multiple mini-pools of 10 workers each?

Something like this, yes.
What's the practical minimum number of total peers to be comfortable with the combinations of
- forward syncing blocks
- backfilling blocks
- syncing columns?