Description
Received via Discord from @michaelsproul. CC @fjl
I think I may have identified a small Geth bug. I'm not sure though because my Go-fu is terrible so I haven't even tried to look at the code.
Out of the box, you can't sync Lighthouse+Geth on Zhejiang at the moment. Several users reported this, and I reproduced it just now.
The error that Geth reports is:
ERROR[02-09|14:28:02.610] Beacon backfilling failed err="retrieved hash chain is invalid: missing withdrawals in block body"
WARN [02-09|14:28:12.188] Marked new chain head as invalid hash=530ef8..150093 badnumber=42314 badhash=1c5f2c..b07418
(full error: https://gist.github.com/michaelsproul/9082b5763f7cb90044a646f10aefeb38)
My current guess is that Geth is deserializing withdrawals: nil instead of withdrawals: [] when decoding blocks from devp2p. The block hash check passes (because nil and [] are RLP-equivalent, right?) but then Geth realises that the withdrawals are missing when, based on the block's timestamp, they should be present. So a block with a valid hash gets marked invalid and the chain breaks.
Nodes that have followed the chain since genesis won't have hit this issue because they'll have received each execution payload as JSON straight from the CL. It happens more often with Lighthouse because we have an optimisation that skips the newPayload message while syncing, so we don't drip-feed every payload to the EL (forcing the EL to download its own payloads). This feature can be turned off with --disable-optimistic-finalized-sync, and indeed syncing Lighthouse+Geth with this flag doesn't trigger the issue.