Rewrite lightning-net-tokio using async/await and tokio 0.2 #472
Conversation
Force-pushed from b34f482 to f0178a4.
Force-pushed from f0178a4 to 2dd2273.
Pushed with a few fixes, plus a workaround for tokio-rs/tokio#2213 (not sure whether to call it a bug, will wait for the tokio folks to comment).
Force-pushed from 2dd2273 to 6bc40a0.
Tokio upstream fixed 2213, so I've reverted the awkward fix for it, though this is now blocked until the next tokio release.
FYI, I'm reviewing this as entirely new code since #339 was not reviewed by anyone.
lightning-net-tokio/src/lib.rs
Outdated
 }

 /// Process incoming messages and feed outgoing messages on the provided socket generated by
 /// accepting an incoming connection (by scheduling futures with tokio::spawn).
 ///
 /// You should poll the Receive end of event_notify and call get_and_clear_pending_events() on
 /// ChannelManager and ChannelMonitor objects.
-pub fn setup_inbound<CMH: ChannelMessageHandler + 'static>(peer_manager: Arc<peer_handler::PeerManager<SocketDescriptor<CMH>, Arc<CMH>>>, event_notify: mpsc::Sender<()>, stream: TcpStream) {
-	let (reader, us) = Self::new(event_notify, stream);
+pub async fn setup_inbound<CMH: ChannelMessageHandler + 'static>(peer_manager: Arc<peer_handler::PeerManager<SocketDescriptor, Arc<CMH>>>, event_notify: mpsc::Sender<()>, stream: TcpStream) {
Trying to understand the design decisions: what's the reasoning behind not returning a Connection? It seems like it could be an ideal abstraction for users to interact with.
Just made Connection non-pub, as exposing it didn't make sense. Let me know if it's clearer now.
 pub struct Connection {
-	writer: Option<mpsc::Sender<bytes::Bytes>>,
+	writer: Option<io::WriteHalf<TcpStream>>,
 	event_notify: mpsc::Sender<()>,
Could you explain the purpose of event_notify to me? It isn't readily apparent how this is supposed to be used within this module and by the user.
Hopefully the docs on the connection functions are a bit clearer now.
Added a comment on one of those functions with regards to clarification.
The module-level documentation should describe how events are used with the API, including a concise example, as it seems that is the primary interaction users have after creating connections. The explanation should not need to be repeated on all three connection functions.
lightning-net-tokio/src/lib.rs
Outdated
/// You should poll the Receive end of event_notify and call get_and_clear_pending_events() on
/// ChannelManager and ChannelMonitor objects.
This comment seems out of place as it presupposes the reader knows how event_notify is used in the API, which AFAICT is never explained. The context around calling "get_and_clear_pending_events() on ChannelManager and ChannelMonitor objects" is also missing.
lightning-net-tokio/src/lib.rs
Outdated
/// Process incoming messages and feed outgoing messages on the provided socket generated by
/// accepting an incoming connection (by scheduling futures with tokio::spawn).
This sentence is a bit verbose, which indicates to me that this may be doing too much. As per my comment on line 179, I'd like to explore whether we can provide an asynchronous API to the user based around a Connection abstraction. But I need to get a better understanding of the design before giving any concrete recommendations with regards to its interface.
-	(reader, us)
+	(reader, receiver,
+	Arc::new(Mutex::new(Self {
I'm not quite sure I follow why an Arc and a Mutex are needed. Could you describe a scenario for me where a Connection may be accessed concurrently?
 }
 }
-impl<CMH: ChannelMessageHandler> peer_handler::SocketDescriptor for SocketDescriptor<CMH> {
+impl peer_handler::SocketDescriptor for SocketDescriptor {
Could you provide a high-level description of this implementation?
Just wanted to ping this comment. Trying to get a grasp of how send_data works, specifically wrt how Waker and RawWaker fit in. Feel free to point to a reference.
I totally missed this, gotta love the GitHub UI. The only way I know how to surface old comments again is to leave a response as a part of a review, then GH will show that response duplicatively but at the end of the discussion.
In general, Rust futures (which you can mentally model as green threads) work by calling a poll function, which either makes progress or it doesn't. To know when to poll, you provide a waker which receives a call when the runtime (in this case Tokio) thinks polling makes sense now (ie because we have more room in the socket write buffer, as returned by select() etc). Here, we use that to wake the read/process "thread" and then it calls write_buffer_space_avail() which will call send_data() and we'll end up back at poll().
Thanks for explaining. I have a high-level understanding of how this works, and I don't have a problem following what send_data is doing.
The only open question I have is: why are RawWaker, RawWakerVTable, and the unsafe code needed? I couldn't find them in the async book, which instead demonstrates creating a Waker by implementing the ArcWake trait. But I don't know if that's applicable in this case.
No need to hold the review up on this, but curious for my own understanding of why one approach is chosen over another.
It looks like the ArcWake trait is just a utility to do essentially what we're doing, but provided by the futures crate, which we don't depend on here.
Sadly, the way Futures were stabilized upstream is pretty bare-bones. The only way to implement your own future is with all the unsafe code and raw function pointers you see here, or via the async keyword on a function.
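For readers hitting this for the first time: below is a minimal, self-contained sketch (not this PR's code) of the RawWaker/RawWakerVTable boilerplate being discussed, using a parked thread as the wake target. The ArcWake trait in the futures crate generates essentially this same vtable glue for you.

use std::sync::Arc;
use std::task::{RawWaker, RawWakerVTable, Waker};
use std::thread::Thread;

// The vtable holds the four raw operations every Waker needs.
const VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, wake_raw, wake_by_ref_raw, drop_raw);

unsafe fn clone_raw(ptr: *const ()) -> RawWaker {
    let arc = Arc::from_raw(ptr as *const Thread);
    std::mem::forget(arc.clone()); // one extra refcount for the cloned waker
    std::mem::forget(arc);         // don't drop the refcount we borrowed
    RawWaker::new(ptr, &VTABLE)
}

unsafe fn wake_raw(ptr: *const ()) {
    // wake() consumes the waker, so the Arc is dropped at end of scope.
    let arc = Arc::from_raw(ptr as *const Thread);
    arc.unpark();
}

unsafe fn wake_by_ref_raw(ptr: *const ()) {
    let arc = Arc::from_raw(ptr as *const Thread);
    arc.unpark();
    std::mem::forget(arc); // wake_by_ref must not consume the refcount
}

unsafe fn drop_raw(ptr: *const ()) {
    drop(Arc::from_raw(ptr as *const Thread));
}

// Build a Waker which unparks the calling thread when woken.
fn waker_for_current_thread() -> Waker {
    let thread = Arc::new(std::thread::current());
    let raw = RawWaker::new(Arc::into_raw(thread) as *const (), &VTABLE);
    unsafe { Waker::from_raw(raw) }
}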
Force-pushed from 6bc40a0 to d115e42.
I added a bunch of docs that should clarify things a little bit, let me know if it's better now or where I should shore it up.
lightning-net-tokio/src/lib.rs
Outdated
/// You should poll the Receive end of event_notify and call get_and_clear_pending_events() on
/// ChannelManager and ChannelMonitor objects after each receive event. In general such an event
/// should occur after nearly every read event, as any read event may trigger an event which must
/// be handled.
I'm a little confused by the uses of "event" in this documentation. Specifically, I'm having a hard time differentiating between "read event", "receive event", and the more generic "event" (e.g., "as any read event may trigger an event"). Is the term "event" being overloaded here to refer to two different things?
Force-pushed from d115e42 to 725bb3e.
Added an example and some more links in the Events module doc, hopefully that's a bit clearer, especially since it's, indeed, somewhat overloaded.
> Added an example and some more links in the Events module doc, hopefully that's a bit clearer, especially since it's, indeed, somewhat overloaded.
Let me make sure I understand how the term had been overloaded. Lightning events are generated by the PeerManager via ChannelManager when messages are received and also by ChannelMonitor when blocks are connected. This is one use of "events".
Then in our tokio module, the unit type () is used as an event by passing it through an mpsc::channel in order to notify that the aforementioned Lightning events have been received. This "receive event" indicates that the ChannelManager and ChannelMonitor should be queried for those Lightning events.
Is that correct? If so, could these concepts be combined? That is, could we instead have the tokio module send the Lightning events over the mpsc::channel? This seems like the exact use case that the mpsc::channel abstraction was designed for.
Yep!
Err, not received, only that "something has happened, this something may or may not have caused some of the aforementioned events to have been created".
Yep!
Hmm, would be somewhat awkward, IMO. We handle the () here: https://github.com/TheBlueMatt/rust-lightning-bitcoinrpc/blob/master/src/main.rs#L123 and we write to it in a number of other places (not just the tokio net stuff) including eg when the user asks us to send a payment. More importantly, we started in #474 (and I have a pending commit or two, not yet PR'd to do the same for events) making Events less easy to miss by serializing and crashing/shutting down before having handled all the in-user-buffers Events (by moving them to our own buffers and serializing them).
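To make the intended usage concrete, here is a hedged sketch of the event_notify pattern described above. The EventProvider trait and drive_events function are illustrative stand-ins for rust-lightning's ChannelManager/ChannelMonitor and their get_and_clear_pending_events() methods, not this PR's actual API.

use std::sync::Arc;
use tokio::sync::mpsc;

// Stand-in for ChannelManager/ChannelMonitor's pending-event buffer.
trait EventProvider {
    type Event;
    fn get_and_clear_pending_events(&self) -> Vec<Self::Event>;
}

async fn drive_events<P: EventProvider>(mut event_notify: mpsc::Receiver<()>, provider: Arc<P>) {
    // Each () only means "something happened"; it may or may not have
    // produced Lightning events, so we always drain the pending buffer.
    while let Some(()) = event_notify.recv().await {
        for _event in provider.get_and_clear_pending_events() {
            // handle the Lightning event (payment received, funding ready, ...)
        }
    }
}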
Force-pushed from 725bb3e to 66e6066.
Initial pass. The recently added comments have been helpful!
lightning-net-tokio/src/lib.rs
Outdated
-fn schedule_read<CMH: ChannelMessageHandler + 'static>(peer_manager: Arc<peer_handler::PeerManager<SocketDescriptor<CMH>, Arc<CMH>>>, us: Arc<Mutex<Self>>, reader: futures::stream::SplitStream<tokio_codec::Framed<TcpStream, tokio_codec::BytesCodec>>) {
-	let us_ref = us.clone();
-	let us_close_ref = us.clone();
+async fn schedule_read<CMH: ChannelMessageHandler + 'static>(peer_manager: Arc<peer_handler::PeerManager<SocketDescriptor, Arc<CMH>>>, us: Arc<Mutex<Self>>, mut reader: io::ReadHalf<TcpStream>, mut write_event: mpsc::Receiver<()>) {
IIUC, write_event receives write_events (I assume from the remote peer) --- maybe a better name would be write_event_receiver?
Done, and realized while I was at it that some comments still referred to PeerManager::write_event, which doesn't exist anymore (is PeerManager::write_buffer_space_avail).
lightning-net-tokio/src/lib.rs
Outdated
tokio::select! {
    v = write_event.recv() => select_write_ev!(v),
    read = reader.read(&mut buf) => match read {
        Ok(0) => {
Hm, so reading 0 bytes means the connection is closed, because it'll just block if the connection is still open but there's nothing to read?
Yep! It's pretty standard for socket APIs to do that.
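For illustration, the standard shape of such a read loop with tokio (a hedged sketch, not this PR's code):

use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn read_loop(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 4096];
    loop {
        match stream.read(&mut buf).await {
            Ok(0) => return Ok(()), // 0 bytes read == EOF: the peer closed the connection
            Ok(len) => { /* hand &buf[..len] to the message handler */ },
            Err(e) => return Err(e), // a socket error is also treated as a disconnect
        }
    }
}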
lightning-net-tokio/src/lib.rs
Outdated
-pub fn setup_inbound<CMH: ChannelMessageHandler + 'static>(peer_manager: Arc<peer_handler::PeerManager<SocketDescriptor<CMH>, Arc<CMH>>>, event_notify: mpsc::Sender<()>, stream: TcpStream) {
-	let (reader, us) = Self::new(event_notify, stream);
+/// Process incoming messages and feed outgoing messages on the provided socket generated by
+/// accepting an incoming connection (by scheduling futures with tokio::spawn).
Got confused by (by scheduling futures with tokio::spawn). Does this make sense as a replacement:
s/(by scheduling futures with tokio::spawn)/(by giving the PeerManager a socket to write peer data out on, scheduling data reads from the same socket, and transmitting said data to the PeerManager)/
Hmm, actually I intended to inform the user that, via tokio::spawn, things would run free. I clarified a bit.
lightning-net-tokio/src/lib.rs
Outdated
// If we get disconnected via SocketDescriptor::disconnect_socket(), we don't call
// disconnect_event(), but if we get an Err return value out of PeerManager, in general, we do.
// We track here whether we'll need to call disconnect_event() after the socket closes.
need_disconnect_event: bool,
Wish there was a way to get rid of this variable, feels a bit hacky.
I kept having to re-reference what it means, so maybe rename more explicitly to something like "call_disconnect_event_on_shutdown"?
Sure, renamed. Honestly after all the back and forth on it and bugs and whatnot I think I agree - we should just always require one, though the handling inside PeerHandler will likely get a bunch more complicated, so I don't know if it makes sense to run forward and do it ASAP.
Doesn't seem to be renamed? Not a huge deal tho.
> we should just always require one, though the handling inside PeerHandler will likely get a bunch more complicated, so I don't know if it makes sense to run forward and do it ASAP.
Cool, might file an issue for that then!
> Err, not received, only that "something has happened, this something may or may not have caused some of the aforementioned events to have been created".
For the case where the "something" is not one of the aforementioned events, what's the use of notifying? Could you give me an example to help understand?
> Hmm, would be somewhat awkward, IMO. We handle the () here: https://github.com/TheBlueMatt/rust-lightning-bitcoinrpc/blob/master/src/main.rs#L123 and we write to it in a number of other places (not just the tokio net stuff) including eg when the user asks us to send a payment.
Could you add the write use case to the documentation? AFAICT, this usage is not mentioned anywhere. Specifically, it seems like the user is required to notify whenever they use the ChannelManager in some manner.
This is an excellent opportunity to add (small) tests demonstrating each behavior of the library.
lightning-net-tokio/src/lib.rs
Outdated
//! TcpStream and a reference to a PeerManager and the rest is handled", with the exception of the
//! act of reading bytes from the remote peer possibly triggering the need to handle some
//! [Events](../lightning/util/events/enum.Event.html).
The exception here seems less of an exception and more of the primary way of interacting with the API. I would recommend rewording this to indicate as such.
Hmm, that was meant as a "the only thing that isn't as trivial is Event handling". Hope it's clearer now.
Right, sorry, from lightning_net_tokio it's always either bytes-read-from-socket or socket-disconnected, though in theory it could be others (and is used generically across the rust-lightning-bitcoinrpc sample app).
Hmm, I think maybe I miscommunicated a bit here - within rust-lightning-bitcoinrpc we use the same Sender as we pass into lightning-net-tokio to handle things like "we've sent money, we should go poke the PeerHandler to see if it wants to send any data" (which of course ends up calling send_data within the SocketDescriptor), but that is somewhat unrelated to its use in lightning_net_tokio.
Force-pushed from 66e6066 to 6f5731d.
I'd second Jeff's comment:
> This is an excellent opportunity to add (small) tests demonstrating each behavior of the library.
If that comment were addressed, I'd be okay with revisiting and trying to simplify later, though I wouldn't say I'm thrilled with the parsability (mostly w/r/t some of the weird bools like need_disconnect_event, disconnect_block, and disconnect).
lightning-net-tokio/src/lib.rs
Outdated
-// Ignore full errors as we just need them to poll after this point, so if the user
-// hasn't received the last send yet, it doesn't matter.
-assert!(e.is_full());
+// Whenever we want to block on reading or waiting for reading to resume, we have to
nit: s/or waiting/or are waiting
I don't think so - "we want to block on reading or block on waiting" is the de-sugaring.
I did find one more thing to test - running without a many-threaded environment has different behavior the first time you go to write bytes. This is a neat test in that it also exercises our write-blocking and RawWakerVTable logic.
Force-pushed from 6edd4ca to 2e0ea18.
lightning-net-tokio/src/lib.rs
Outdated
tokio::select! {
    v = write_avail_receiver.recv() => select_write_ev!(v),
    read = reader.read(&mut buf) => match read {
        Ok(0) => {
            println!("Connection closed");
            break;
        },
        Ok(len) => {
            if let Some(blocker) = {
                let mut lock = us.lock().unwrap();
                if lock.disconnect_state == DisconnectionState::RLTriggeredDisconnect {
                    shutdown_socket!("disconnect_socket() call from RL");
                }
                if lock.read_paused {
                    let (sender, blocker) = oneshot::channel();
                    lock.read_blocker = Some(sender);
                    Some(blocker)
                } else { None }
            } {
                tokio::select! {
                    res = blocker => {
                        res.unwrap(); // We should never drop the sender without sending () into it!
                        if us.lock().unwrap().disconnect_state == DisconnectionState::RLTriggeredDisconnect {
                            shutdown_socket!("disconnect_socket() call from RL");
                        }
                    },
                    v = write_avail_receiver.recv() => select_write_ev!(v),
                }
            }
The logic around paused reads with the nested select! is a bit difficult to follow. I wonder if this could be simplified a bit using a precondition on the read branch.
let read_paused = us.lock().unwrap().read_paused;
tokio::select! {
    v = write_avail_receiver.recv() => {
        // ...
    },
    read = reader.read(&mut buf), if !read_paused => match read {
        // ...
    },
}
Then I think this would eliminate the need for the blocker and thus the nested select!. Instead, we would write until send_data indicates to resume reading, in which case the read branch would be reenabled.
Hmm, I'm not convinced that select!{} definitely won't call the read() future if !read_paused (instead of blocking on the future and never taking that branch). I'd buy that it likely won't, but there's sadly way too little documentation on it.
In reading the documentation, I realize my comment was definitely not right, as the docs explicitly call out this case and say that the future won't be polled. That said, I don't think it eliminates the need for the blocker - we still need a way to wake up the reader in case we got a write with resume_read set.
I initially tried to rewrite it with just one Sender which could wake the read task, but sadly we need to make sure that if we wake it due to a write_avail event, we always call write_buff_space_avail. This would mean either the channel has an unlimited depth and we scan it for any write_avail events, or we add yet another bool to indicate that write_avail needs to be called. I find both worse than keeping the current model of just having two Senders, but I removed the Option<> and simplified the logic a ton.
Didn't notice your reply before commenting, so deleted it. Looks much simpler now!
Ah, with regard to the blocker, isn't writing the only way to resume reading? That is, we can't resume reading until send_data is called with resume_read set true. That was why I thought the blocker was not needed. I haven't fully reviewed the send_data implementation, so I could be missing something.
Force-pushed from 2e0ea18 to 829e176.
I saw the issue in tokio 2213 again, but I think now I realize they can't possibly fix it as it would break other things (hence, NOTABUG, I guess) - if you have pending writes frozen and then you go to write again, it doesn't make sense for the new write to complete before the pending write(s) get woken up to write their data first. Thus, it becomes a race between the select() thread waking up the pending writes before the new writers make it into some checks (unless they cache the writeable state, which also kinda probably sucks). Anyway, turns out it's in fact not only easy to work around but makes a few things probably cheaper than they were to begin with, so no harm in keeping it anyway.
lightning-net-tokio/src/lib.rs
Outdated
-	},
-	Err(e) => {
-		us_ref.lock().unwrap().need_disconnect = false;
-		return future::Either::B(future::result(Err(std::io::Error::new(std::io::ErrorKind::InvalidData, e))));
-	}
+	macro_rules! prepare_read_write_call {
+		() => { {
+			let mut us_lock = us.lock().unwrap();
+			if us_lock.disconnect_state == DisconnectionState::RLTriggeredDisconnect {
+				shutdown_socket!("disconnect_socket() call from RL");
+			}
+			us_lock.block_disconnect_socket = true;
+		} }
+	}
Rather than using a macro and needing to set block_disconnect_socket false in multiple places (because of breaks), could you make a simple RAII struct (DisconnectBlocker?) for managing this state?
Then prepare_read_write_call could be renamed to something less ambiguous like check_for_shutdown, if still needed (see comment on disconnect check).
Using an RAII struct seems hard, see below, as we need to check the disconnect state in the same lock as setting block_disconnect.
Could block_disconnect_socket just be a lock? It seems to be functioning as one. Then you would get the RAII behavior for free.
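For reference, a minimal sketch of that suggestion, assuming the blocked state lived in its own Mutex. The ConnState type and field names here are hypothetical, and as noted in the reply below, the real code would need a second Arc to hold a reference to this lock without the outer lock:

use std::sync::{Arc, Mutex};

struct ConnState {
    // Held for the duration of any read_event / write_buffer_space_avail call.
    disconnect_block: Arc<Mutex<()>>,
}

fn call_into_peer_manager(state: &ConnState) {
    // The guard plays the role of block_disconnect_socket = true; dropping it
    // (even on an early return) plays the role of resetting it to false.
    let _guard = state.disconnect_block.lock().unwrap();
    // ... peer_manager.read_event(...) would happen here ...
}

fn disconnect_socket(state: &ConnState) {
    // Taking the lock blocks until no read/write call is in flight.
    let _guard = state.disconnect_block.lock().unwrap();
    // ... mark the connection disconnected ...
}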
Hmm, this seemed like a great idea, but once I tried to start writing it I realized we'd need to be able to hold a reference to that lock without the outer lock (as we can't hold the outer lock while calling into write_buffer_space_avail), which would imply a second Arc which seems super wasteful. If we ever end up with CPU issues on the spinlock I'd be somewhat surprised, so for now I'd say leave it.
Will defer to you on performance. From a code complexity perspective, if not using a mutex I'd then recommend:
- moving us_lock.block_disconnect_socket = true; out of prepare_read_write_call to avoid obfuscating the connection with us_lock.block_disconnect_socket = false;
- renaming prepare_read_write_call to something more meaningful like shutdown_socket_if_requested
// Happy-path return:
if !us.block_disconnect_socket { return; }
Isn't this handled by the following loop?
Not without an extra lock().
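For context, a hedged sketch of the happy-path-then-spin shape being discussed here (the Conn type and field names are illustrative, not the PR's exact code):

use std::sync::{Arc, Mutex};

struct Conn {
    rl_requested_disconnect: bool,
    block_disconnect_socket: bool,
}

fn disconnect_socket(conn: &Arc<Mutex<Conn>>) {
    {
        let mut us = conn.lock().unwrap();
        us.rl_requested_disconnect = true;
        // Happy path: no read_event/write_buffer_space_avail call is in
        // flight, so we can return without a second lock() round-trip.
        if !us.block_disconnect_socket { return; }
    }
    // A call into the PeerManager is mid-flight; spin until it completes so
    // that no calls into Rust-Lightning happen after we return.
    while conn.lock().unwrap().block_disconnect_socket {
        std::thread::yield_now();
    }
}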
lightning-net-tokio/src/lib.rs
Outdated
if us_lock.disconnect_state == DisconnectionState::RLTriggeredDisconnect {
    shutdown_socket!("disconnect_socket() call from RL");
}
I'm not entirely in love with how disconnect_state is used. It seems to have two purposes:
1. signaling that SocketDescriptor::disconnect_socket was called
2. signaling a manner in which the select loop has exited
Could (1) be communicated via a channel? Then a branch could be added to the select! for handling disconnection. Or would this be problematic if another branch is chosen even if a disconnect has been signaled?
Maybe a channel would be too much, but just wanted to note my concern. See also my other comment on breaking from the loop.
The issue is that we have to check disconnect_state in the same lock where we set block_disconnect_socket, to meet the requirement that "after disconnect_socket() returns, no future calls into read_event or write_buffer_space_avail are made". I'm not sure how to accomplish that with a channel. I could go back to having two bools - one to indicate disconnection should occur and one to indicate that we need to call socket_disconnected. You do, however, point out a simplification here in that we can keep the need_disconnect_event bool in the schedule_read stack.
lightning-net-tokio/src/lib.rs
Outdated
macro_rules! shutdown_socket {
    ($err: expr) => { {
        println!("Disconnecting peer due to {}!", $err);
        break;
    } }
It's strange using a macro for this but not for the cases where the connection has been closed based on the result of reader.read(). I understand in the shutdown_socket case it is to reduce code duplication, but I don't like the inconsistency in how the loop break is obscured behind a macro in some cases but not others.
Perhaps a cleaner way to organize the code would be to encapsulate tokio::select! in a method that returned an enum indicating the possible outcomes. Something like Result<(), DisconnectReason>.
enum DisconnectReason {
    ConnectionClosed(String),
    PeerDisconnected(String),
}
Then you can have a straight-line match to handle the possible cases explicitly, which I think would be more readable. You may not even need disconnect_state or at least could reduce its purpose to signaling from SocketDescriptor::disconnect_socket.
Let me know what you think now. I agree it should at least all use the macro, so I went ahead and did that. Further, if you try to break; out of the loop directly now you'll get an uninitialized variable (as the need_disconnect_event field is now on the stack). I can make it a separate function, but I'm not a huge fan of an fn for the read loop and then cleanup going into a different one. A different thing we could do is give the loop a return value (yes, Rust is crazy, you can do that) so that you have to provide the need_disconnect_event field in the break; call.
Definitely looking better modulo my comment on block_disconnect_socket.
I kinda prefer if the loop returned a variable, actually. Additionally, I think the uses of shutdown_socket would read better with an enum reason similar to the one given in my earlier comment, only without the string. Otherwise, it's not clear what the true and false literals represent without looking at the definition.
Ok, I made the loop return a local enum, though I fear we've gotten into diminishing returns at this point.
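For anyone unfamiliar with the trick referenced above: Rust loops really can return a value through break. A tiny standalone illustration (not this PR's code):

enum Disconnect { CloseConnection, PeerDisconnected }

fn run() -> Disconnect {
    let disconnect_type = loop {
        // ... the select!/read handling would live here; each exit path
        // breaks with the reason, so it can't be left uninitialized ...
        break Disconnect::PeerDisconnected;
    };
    disconnect_type
}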
Force-pushed from 96f70e9 to 02d4c8a.
Codecov Report

    @@            Coverage Diff             @@
    ##           master     #472      +/-   ##
    ==========================================
    + Coverage   90.07%   90.13%   +0.05%
    ==========================================
      Files          34       34
      Lines       19053    19052       -1
    ==========================================
    + Hits        17162    17172      +10
    + Misses       1891     1880      -11

Continue to review full report at Codecov.
Force-pushed from 02d4c8a to 87f7ad5.
lightning-net-tokio/src/lib.rs
Outdated
// If Rust-Lightning tells us to disconnect, we don't, then, call socket_disconnect() after
// closing the socket, but if the socket closes on its own (ie our counterparty does it), we do.
I'm extremely confused by this documentation. I'm trying to correspond it to the enum variants and how they are used in the code but am failing.
> If Rust-Lightning tells us to disconnect, we don't, then, call socket_disconnect() after closing the socket
Is this saying we don't disconnect? But then says we close the socket. Isn't closing the socket disconnecting? What is meant by "disconnect" here?
Does this correspond to the PeerDisconnected variant? Seems odd to have this sentence first when that variant is second. I'd recommend just putting the appropriate documentation on each variant. Then have the enum documentation itself say what the enum is used for, not how it is used.
> but if the socket closes on its own (ie our counterparty does it), we do.
Does this correspond to the CloseConnection variant? Isn't that when read returns 0 or an error? But in those cases, PeerDisconnected is used in the code.
I also read it as the imperative "close connection" which would make sense for "we do" if I understood what that is referring to. :P
Comment better now? I reworded it to include the enum in the text so that hopefully it's clearer. I think we're way past the point where this should be merged and we can do more cleanups in followons if there's still confusion.
> Comment better now? I reworded it to include the enum in the text so that hopefully it's clearer.

Ok, after a lot of staring and conferring with Val, the confusing part is ", then,". When reading this:

> If Rust-Lightning tells us to disconnect, we don't, then, call socket_disconnect() after closing the socket
I took this as, when Rust-Lightning tells us to disconnect:
- We don't disconnect
- Then we call socket_disconnected()
- But only after closing the socket

Which was confusing because it sounded like you were saying we don't disconnect but we close the socket.
However, I can now see the ", then," is a manner of speaking (in particular, your manner of speaking) which IMHO does not come across well when reading it. Given that I now recognize this (both from experience over chat and in person) but still couldn't parse the comment correctly, it indicates others may have the same problem.
This is why I tend to request very succinct writing, cutting out unnecessary verbiage; it avoids needless confusion.
In this case, regardless of rewording, I think that instead of repeating how the code works the documentation should say what the enum is used for. FWIW, I'd recommend something like this:
// The manner in which the socket was disconnected.
enum Disconnect {
    // Indicates Rust Lightning disconnected the socket as a result of a PeerHandleError.
    CloseConnection,
    // Indicates the peer disconnected the socket and thus Rust Lightning should be notified via socket_disconnected().
    PeerDisconnected,
}
Irrespective of whether the above wording is accurate (I'm not saying it is), writing a comment like this may clarify the language as there may be ambiguity between "close", "disconnect", "socket", and "connection". Which could also result in more accurate enum variant naming.
I'd also like to use this opportunity to make a request regarding code reviews. When I'm generally confused by something and ask a question for clarification, I'm not necessarily requesting that a change be made. I'm really just looking for an answer to the question. And from that I may suggest a specific change that I think would help remove confusion.
This is particularly the case with regards to comments. That is much more efficient than rewriting something and asking, "How about now?"
> I think we're way past the point where this should be merged and we can do more cleanups in followons if there's still confusion.
First off, thank you for reaching out to me on Slack saying to not take this the wrong way. I genuinely appreciate you doing that.
I just want to note that reviewers have an important role to play in determining whether a change is merged. And any confusion should err on the side of the reviewers given they are looking at the code largely from the perspective of a reader rather than the writer.
I do agree that schedule_read looks to be in pretty good shape. That said, it is only part of the code under review. I did leave a comment about send_data asking for guidance in reviewing it. I would like to review that before merging so would appreciate a follow up there.
> When I'm generally confused by something and ask a question for clarification, I'm not necessarily requesting that a change be made
Hmm, fair, ok. I generally err on the side of "answer by updating the comment" as it usually implies less likelihood of people being confused down the road, since someone else is likely to have the same question.
As for review process in general, I agree, only noting that we don't have the resources to block things for as long as we are. That's probably an indication of a problem different from reviewers taking too long (documentation not up to par, for example), of course. In this specific case, probably the documentation on PeerHandler itself needs to be the focus of the improvements, though we can do that in a separate PR.
Force-pushed from 87f7ad5 to ca3c8d9.
lightning-net-tokio/src/lib.rs
Outdated
// If Rust-Lightning tells us to disconnect (ie Disconnect::CloseConnection), we don't, then,
// call socket_disconnected() (informing Rust-Lightning that we're disconnecting) after
// closing the socket, but if the socket closes on its own (ie our counterparty closes the
// connection and we set disconnect_type to PeerDisconnected), we do.
This confused me at first too, I suggest this phrasing --
If Rust-Lightning tells us to disconnect (i.e. through Disconnect::CloseConnection), then we close the socket but do not call PeerManager's socket_disconnected() (which informs Rust-Lightning that we're disconnecting). But, if the socket closes on its own, (i.e. if our counterparty closes the connection, causing us to set disconnect_type to PeerDisconnected), we do call PeerManager's socket_disconnected().
I took Jeff's suggested wording by mixing it into the enum. Is that clearer now?
Great wording now!
It looks like we don't currently use the Vec as a Vec, and can happily take a slice, which makes things easier on the calling side.
Force-pushed from ca3c8d9 to 3f5821b.
Left a few comments around clarity in the implementation comments. Otherwise, LGTM.
tokio::time::timeout(Duration::from_secs(10), a_connected.recv()).await.unwrap();
tokio::time::timeout(Duration::from_secs(1), b_connected.recv()).await.unwrap();
Any reason the timeouts are different for these? Same below.
Because by the time the connection has done the handshake and disconnected from one end, the other end should be essentially already disconnected.
This is a rather major rewrite, using async/await and tokio 0.2, which cleans up the code a ton as well as adds significantly to readability.
If rust-lightning tells us to disconnect a socket after we read some bytes from the socket, but before we actually give those bytes to rust-lightning, we may end up calling rust-lightning with a Descriptor that isn't registered anymore. Sadly, there really isn't a good way to solve this, and it should be a pretty quick event, so we just busy-wait.
Force-pushed from 3f5821b to 174ccf5.
Happy to take any followup comments in more PRs, so feel free to continue review if anyone isn't happy with where this is, but Jeff's ACK is good enough for me.
This is a rather major rewrite, using async/await and tokio 0.2,
which cleans up the code a ton as well as adds significantly to
readability.
CC #423.