Skip to content

writers/readers #62 based on io::Write/Read #119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

yuntai
Copy link
Contributor

@yuntai yuntai commented Aug 18, 2018

Implement issue #62. I started writing some generic routines for #55 and then found lots of them are already done in rust-bitcoin. I was thinking of importing them directly but I saw some conversation about restructuring serialization routines in the rust-bitcoin repo so I just adopted them here.

If this approach looks good I can go ahead putting a fuzz testing.

Copy link
Contributor

@jeandudey jeandudey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK. Seems OK, but found some minor nits.

Greak work!

src/util/ser.rs Outdated
pub enum Error {
CorruptedData,
BadPublicKey,
IoError(io::Error),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply it's named Io like most crates do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced to Io now.

src/util/ser.rs Outdated
}
}

impl From<io::Error> for Error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a #[doc(hidden)] (hide unneeded implementation details)

pub struct Writer<W> { writer: W }
pub struct Reader<R> { reader: R }

pub trait Writeable<W: Write> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably bikeshedding, but why don't give it a name like Encodable/Decodable just like how MsgEncodable/MsgDecodable are named?

src/util/ser.rs Outdated
fn write(&self, writer: &mut Writer<W>) -> Result<(), Error>;
}

pub trait Readable<R>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

src/util/ser.rs Outdated
pub fn new(reader: R) -> Reader<R> {
return Reader { reader }
}
pub fn into_inner(self) -> R { self.reader }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation all fixed now.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good at a high level. There are a few indentation issues in the existing patch, but no big deal, and this seems more targeted at current ChannelMonitor serialization, which is fine, but ideally we'd be able to use it for that and network messages as well.

src/util/ser.rs Outdated
fn write(&self, w: &mut Writer<W>) -> Result<(), Error> {
match self {
Some(data) => {
true.write(w)?;
Copy link
Collaborator

@TheBlueMatt TheBlueMatt Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be what we do in ChannelMonitor, but note we don't do this in network messages. Not sure if its bad to have it, just noting it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed Option<?> altogether as it seems difficult to me implementing generic serialization varying depending on the type of the payload of Option. But I left something equivalent to Option<Script> which found usage in two messages.

src/util/ser.rs Outdated

pub const MAX_VEC_SIZE: usize = 32 * 1024 * 1024;

pub enum Error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these different from msgs::DecodeError? Seems to me they should be the same with the addition of IoError, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to use msgs::DecodeError now

src/util/ser.rs Outdated
}
Ok(buf[0] == 1)
}
fn read_fixed_size(&mut self, sz: usize) -> Result<Vec<u8>, Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have the same API as reader.read_exact? Seems annyoing to force a vec allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Looking forward to seeing more things migrate to using this.

src/util/ser.rs Outdated
if byte_size > MAX_VEC_SIZE {
return Err(DecodeError::InvalidLength);
}
let mut ret = Vec::with_capacity(len as usize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need something to ensure that there is at least len capacity left in the buffer to be read, as allocating full-usize-memory will crash us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you meant. len*sizeof(T) is bounded and each read()? will raise 'ShortRead' if there is not enough data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess I'm blind. That said, MAX_VEC_SIZE is much too large for network messages (which can never be logner than 16K), even if its within reason for ChannelMonitors :/

@yuntai
Copy link
Contributor Author

yuntai commented Aug 28, 2018

For channel_reestablish both your_last_per_commitment_secret and my_current_per_commitment_point are specified as (option_data_loss_protect) in Bolt 2, but the encode/decode implementation is done otherewise - optional only for your_last_per_commitment_secret.
Making only your_last_per_commitment_secret optional make this message a single case where parser needs to depend on the total length(or EOF) to decode an incoming correctly.

@TheBlueMatt
Copy link
Collaborator

Oops, yea, looks like a bug on my_current_per_commitment_point. As for decode with data_loss_protect, I dunno, it kinda sucks to pass around LocalFeatures into message decoding, and for now relying on EOF is fine, but it may not be in the future. For simplicity maybe keep using EOF for now and if the spec ever changes to have different decoding based on LocalFeatures we can pass around a ref in the decoder stuff in the future?

@yuntai
Copy link
Contributor Author

yuntai commented Aug 30, 2018

Added more messages and separated fuzz targets for Writeable. gen_target_writeable.sh shows the current coverage for messages.

@yuntai yuntai force-pushed the 201808-issue62 branch 3 times, most recently from 02f04ee to f660829 Compare September 1, 2018 07:36
@yuntai yuntai changed the title [WIP] writers/readers #62 based on io::Write/Read writers/readers #62 based on io::Write/Read Sep 1, 2018
@yuntai
Copy link
Contributor Author

yuntai commented Sep 1, 2018

travis timed out

@TheBlueMatt
Copy link
Collaborator

TheBlueMatt commented Sep 3, 2018

Hmm, I kicked it, but it's probably just that you added more fuzz targets so the fuzzer is taking way more time.

@TheBlueMatt
Copy link
Collaborator

I assume we'll convert all the deserializers to just use the new framework before this is through so it wont be a problem, but until its done I assume travis will keep timing out. Not a huge deal.

@TheBlueMatt
Copy link
Collaborator

@yuntai Tell me when you want a more thourough review of this. I was kinda waiting to see all the messages get churned through but don't want to leave you waiting on review if you feel you're blocked on it.

@yuntai
Copy link
Contributor Author

yuntai commented Sep 8, 2018

I think it's worth a review now. All messages are covered. I was able to run fuzz locally.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good. I had a few comments, but mostly nits and notes that we should probably skip including chunks of code that are unused. I'd kinda prefer to see all the message encoding/decoding stuff switched over to the new framework in peer_handler and the old crap dropped before merge, but if you'd prefer they be separate, I'm game to do so if you've got a commit that does the transition.

src/util/ser.rs Outdated
pub fn into_inner(self) -> W { self.writer }
#[cfg(feature = "fuzztarget")]
pub fn get_ref(&self) -> &W { &self.writer }
writer_fn!(write_u64, u64, be64_to_array);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe its just personal preference but I tend to find this kind of thing unreadable. All said, you've added 4 lines of macro_rules! definitions to save 6 lines of write_X { }s, and made anyone reading the code have to jump to read whats going on.

Its a bit different when the macro is defining more than one line, but for one-line functions it just feels like overkill.

src/util/ser.rs Outdated
writer_fn!(write_u32, u32, be32_to_array);
writer_fn!(write_u16, u16, be16_to_array);
fn write_u8(&mut self, v: u8) -> Result<(), DecodeError> {
Ok(self.writer.write_all(&[v])?)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readability nit: You can skip the Ok(...?) and just let it return the original Result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wirte_all() raise io::Error

src/util/ser.rs Outdated
return Reader { reader }
}
#[cfg(feature = "fuzztarget")]
pub fn into_inner(self) -> R { self.reader }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation issue.

src/util/ser.rs Outdated
}
}

macro_rules! reader_fn {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this definition down to impl Reader where its used?

src/util/ser.rs Outdated
impl_array!(64); // for Signature

// Tuples
macro_rules! tuple_encode {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tuple encode/decode looks unused (and I dont see if it would be used in ChannelMonitor serialization, either). Maybe just remove for now and we can re-add it if its needed? Generally this applies to several things here - we've got most of the stuff we're gonna want to serialize in the short term already written so having serializers for things we dont yet need is just more code review.

src/util/ser.rs Outdated

impl<W, T> Writeable<W> for Option<T>
where W: Write,
T: ::std::ops::Index<::std::ops::RangeFull, Output=[u8]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your tabstop is maybe confused? Either way, when aligning things best to use spaces to align and tab to indent. If you set your editor to highlight tabs differently it should be pretty clearly visible.

}

#[macro_export]
macro_rules! test_msg_writeable_simple {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for Ping/Pong message. Wanted to remove ping/pong message test in the top fuzz directory.

let buf = w.into_inner().into_inner();
assert_eq!(&r.into_inner().into_inner()[..p], &buf[..p]);

let encoded = <$MsgType as MsgDecodable>::decode(&buf[..]).unwrap().encode();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like overkill? If the message deserialized fine the first time and then encoded to the same value, shouldn't it definitely deserialize again this time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a way to test against the existing code. We can drop it and probably Travis will be happy again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think travis will only be happy when you remove the existing message fuzz targets to reduce the total runtime.

let mut w = Writer::new(::std::io::Cursor::new(vec![]));
msg.write(&mut w).unwrap();

let buf = w.into_inner().into_inner();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how do most other serialization frameworks work? This looks kinda awkward but maybe this is the normal way to do it? Either way, these functions will need to be non-fuzztarget-only, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems they tend to use into_inner() and get_ref() when accessing the underlying Read like instd::io::BufReader. They may be not required in the normal situation, but I'm not totally sure. I agree it shouldn't be non-fuzztarget-only.

@yuntai
Copy link
Contributor Author

yuntai commented Sep 9, 2018

I can work on migrating whole stuff before merging. My mental model is;

  • We have a raw stream at the bottom (TCP)
    Don't we need a callback Read/Write from the client?
    • flush network buffer withflush().
    • I think we could just ask std::net::TcpStream from the client
  • encryption/framing layer
    • Framing thru EOF
    • wonder no-copy encryption/decryption possible
      So this Read/Write layer ( buffering also as the timing of EOF differs from the underling) needs to be implemented.
  • and this patch layer
    Need read_to_end() to exhaust frames from below for every message read

Does this make sense to you?

@TheBlueMatt
Copy link
Collaborator

  • For the raw stream at the bottom, we don't want to rely in it being TCP - one potential use-case of the library is abstracting the networking stuff and running rust-lightning on selaed hardware and communicate messages across a serial interface or so. I think best to continue with the existing peer_handling interface (which probably requires we keep the Vec's of u8 for reading data there)
  • For the decryption side of things, we could move to no-copy decryption and just pipe the resulting blocks out to the serialization layer, but that is a pretty steep refactor, and can come later, so I'd say just do it on streams coming off of [u8]s for now. Encryption we probably want to keep encrypting into Vecs. At least for network messages they're all pretty small so I dont think the performance concerns are worth it.

@yuntai
Copy link
Contributor Author

yuntai commented Sep 9, 2018

Hmm, then there is no saving in buffer copies. Actually, MsgEncodable is superior as you did all vec::with_capacity() with pre-calculated buffer size. Now I am not so sure what's gained by migrating...
--- pause --
Ah, I guess there will be use cases where p2p messages need to be stored to files or db...

@yuntai yuntai force-pushed the 201808-issue62 branch 2 times, most recently from 70bc53b to 9a15634 Compare September 10, 2018 09:22
@yuntai
Copy link
Contributor Author

yuntai commented Sep 10, 2018

Rebased & fixed. Fuzz test went thru but got an error in test::no_existing_test_breakage; pushed it anyway as I thought you could find the cause easily while I have to be away for several hours.

no_exiting_test_breakage stops short at

TRACE [lightning::ln::peer_handler : /home/yuntai/github/yuntai/rust-bitcoin/rust-lightning/src/ln/peer_handler.rs, 405] Encoding and sending message of type 16 to 030000000000000000000000000000000000000000000000000000000000000000
TRACE [lightning::ln::peer_handler : /home/yuntai/github/yuntai/rust-bitcoin/rust-lightning/src/ln/peer_handler.rs, 435] Received message of type 16 from 030000000000000000000000000000000000000000000000000000000000000000
TRACE [lightning::ln::peer_handler : /home/yuntai/github/yuntai/rust-bitcoin/rust-lightning/src/ln/peer_handler.rs, 435] Received message of type 32 from 030000000000000000000000000000000000000000000000000000000000000000

@yuntai
Copy link
Contributor Author

yuntai commented Sep 10, 2018

Identified a problem (ShortRead). working on it and trying to figure why fuzz message test couldn't catch it.

@TheBlueMatt
Copy link
Collaborator

Yea, for perf you may want to have a capacity_hint() in Writer, or so. Though I'm not sure why I indicated we cant do no-copy in the encryption side, thats obviously bogus - right now we copy twice, once to write into a buffer, then encrypt the buffer, but we could easily just write into an encryptor and then write that out into a buffer to pass to the user's networking stack. Same on the decryption end, obviously.

yuntai added a commit to yuntai/rust-lightning that referenced this pull request Sep 10, 2018
PR lightningdevkit#119 fails on no_existing_test_breakage() which uses hex from this buggy
encoding. Perhaps use this to regenerate hex?
@yuntai
Copy link
Contributor Author

yuntai commented Sep 11, 2018

fixed a bug in new serde and squashed the fouling commit. It all looks good now and Travis is so happy!

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. There's a few indentation and other nits, and a few simplifications in the message stuff that could be done, but I'm pretty much happy to rebase and merge and then clean those up later. Will do tomorrow morning if you haven't gotten to it.

msg.write(&mut w).unwrap();

let buf = w.into_inner().into_inner();
assert_eq!(&r.into_inner().into_inner()[..p], &buf[..p]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also assert that buf.len() == p?

let p = w.get_ref().position() as usize;

let buf = w.into_inner().into_inner();
eprintln!("buf({})", buf.len());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase error?

0u16.write(&mut w).unwrap();
$msg.write(&mut w).unwrap();
let mut msg = w.into_inner().into_inner();
byte_utils::be16_to_array_inplace(msg.len() as u16-2, &mut msg[..2]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just msg[..2].copy_from_slice(byte_utils::be16_to_array(msg.len() - 2)) to avoid adding another function. It should optimize away pretty easily.

src/util/ser.rs Outdated

pub trait Readable<R>
where Self: Sized,
R: Read
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation here?

src/ln/msgs.rs Outdated
@@ -43,6 +44,12 @@ pub enum DecodeError {
/// A length descriptor in the packet didn't describe the later data correctly
/// (currently only generated in node_announcement)
BadLengthDescriptor,
/// Error from std::io
Io(::std::io::Error),
/// Invalid value found when decoding
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific here? AFAICT this is only used for bool decoding (making it unused currently, presumably until we move the ChannelMonitor deserializer to the new framework) when the value is not 1 or 0.

src/util/ser.rs Outdated
( $size:expr ) => (
impl<W, T> Writeable<W> for [T; $size]
where W: Write,
T: Writeable<W>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: also strange tabstop here and below.

src/ln/msgs.rs Outdated
Io(::std::io::Error),
/// Invalid value found when decoding
InvalidValue,
/// Data too big
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we just merge this with BadLengthDescriptor?

src/util/ser.rs Outdated

impl<W: Write> Writeable<W> for Option<Script> {
fn write(&self, w: &mut Writer<W>) -> Result<(), DecodeError> {
Ok(if let &Some(ref script) = self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why put the ()-returning if inside the Ok()?

src/ln/msgs.rs Outdated
let mut buf = [0; 32+33];
match r.read_exact(&mut buf) {
Ok(()) => Some(DataLossProtect {
your_last_per_commitment_secret: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you aren't allows to only fill in part of data_loss_protect you should be able to simplify this a bit by swapping buf for your_last_per_commitment_secret directly, then just calling Readable::read(r)? for my_current_per_commitment_point.

src/ln/msgs.rs Outdated
},
hop_data: {
let mut buf = [0u8;20*65];
r.read_exact(&mut buf)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just impl_array!(1300)?

with additional variants in DecodeError
un-fuzz-target only for get_ref() and inner_into()
move up use
indentation fixes
remvoe unused implementation
added inplace byte_utils
For [T] iteration is used for each T, read position is not predictable when
ShortRead is raised. [T] is not used in p2p messages and, if needed, Vec[T] can be used for other
structures. Also a performance gain.
and be more specific about DecodeError::InvalidValue
@TheBlueMatt TheBlueMatt mentioned this pull request Sep 12, 2018
@TheBlueMatt
Copy link
Collaborator

Will take as #170 (its no different, just squashed some commits).

TheBlueMatt added a commit that referenced this pull request Sep 12, 2018
@yuntai yuntai deleted the 201808-issue62 branch September 12, 2018 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants