-
Notifications
You must be signed in to change notification settings - Fork 746
Verify the sizes of the sub-sections within the name section #375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -1649,10 +1652,14 @@ static void read_custom_section(Context* ctx, uint32_t section_size) { | |||
++i; | |||
break; | |||
default: | |||
/* unknown subsection, skip rest of section */ | |||
ctx->offset = ctx->read_end; | |||
/* unknown subsection, skip it */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will make us accept "name" sections in which the "function" subsection isn't the first subsection, which violates the spec. Not a bad thing, but a change in behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. They need to be in order. I'll add a check for that too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually doesn't this existing code already allow this? There doesn't seem to be any check for functions coming before locals. This change just meas that subsections we don't know about can appear anywhere, and will have their sizes validated, and will not terminate the parsing of the rest of the section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing we might want to do is to check that subsection data is actually contained in the subsection (by setting ctx->read_end to the subsection end temporarily, for example); just like read_function_body verifies a function body does not extend past its indicated length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing code allows the consumer to check the sections are in the right order (it doesn't verify that the function/local indices are specified in order either, but again the consumer can check that).
Now that I think about it, I feel that stricter is better, and we shouldn't go out of our way to be lenient.
I must confess I've resorted a few times to cutting out the name section of wasm modules in a text editor, so a switch to ignore (some) custom sections would make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/binary-reader.cc
Outdated
@@ -1605,6 +1605,9 @@ static void read_custom_section(Context* ctx, uint32_t section_size) { | |||
uint32_t subsection_size; | |||
in_u32_leb128(ctx, &name_type, "name type"); | |||
in_u32_leb128(ctx, &subsection_size, "subsection size"); | |||
size_t subsection_end = ctx->offset + subsection_size; | |||
if (subsection_end > ctx->read_end) | |||
RAISE_ERROR("invalid sub-section size: extends past end"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is RAISE_ERROR the right thing to do? Invalid custom sections do not cause the entire module to become invalid. That's why I ignored optional information entirely, though I agree now that checking the subsection sizes is much cleaner.
Maybe a separate callback for recoverable errors would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a good question, for custom sections that the binary-reader knows about, how should it handle errors. My feeling is that if we are going to parse a section we should validate its contents. Perhaps a flag to disable all-parsing of custom sections if people don't what this? But perhaps we should add this if and when somebody needs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, most wasm modules out there have invalid name sections according to the current spec, I think. I think switching to C++ exceptions could help (we're currently using longjmp!), particularly if such exceptions clearly encode "you can continue reading the next section at offset x and the next subsection at offset y".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's kind of a weird case. I think sometimes we'll want to ignore invalid custom sections and sometimes we'll want to fail if they're invalid. In either case we'll want to bail on parsing the section if any part fails. I guess it should be up to the BinaryReader implementation what happens, like @pipcet says.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the old "name" sections in wild, are we not blowing up already? It seems like we should be.
I think probably flag such as --no-custom or --no-debug-names or --skip-section=name would be for this kind of situation were you have binaries out there that use a certain custom section name but its not compliant with our current tooling convention. Ideally we would rename the section in this case (weren't we supposed to rename this one to utf8_names?) to avoid this kind of problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WebAssembly/design#1016 seems to have good support and makes all strings utf8. If that's the case then perhaps calling it utf8-name is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But strange that BinaryEncoding.md has the new content but not the new section name. Seems like that was a mistake.
If we really want to support old binaries with the previous name section we would need to build a parser that supports both, but we don't want to do that do we?
Do you happen to know what the current behavior is if this parser sees an old name section today? I do seem to remember that it somehow skips over it. Perhaps I could make that explicit in the code somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I just tested the current "master" and it already blows up on the old style name section:
$ ./out/wasmdump -x ./debug-names-1.wasm
*ERROR*: @0x0000003d: unable to read string: name
So this change doesn't actually make it worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the current behavior is to bail out on any error, even in custom sections. I think we should do something fancier (as discussed above), but that doesn't have to be in this PR. Opened #378 for further discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, it's probably of theoretical interest only, but I'm pretty sure I've proved that no "name" section is ambiguous: it's either a valid old-style name section, or a valid new-style one, or neither, but never both. I'd still prefer the approach of forgetting the old-style name sections ever existed, but we also have the option of first verifying that a name section splits into valid subsections; if it doesn't, it's an old-style section (and the converse holds, too), so we can still parse it.
src/binary-reader.cc
Outdated
@@ -1600,6 +1600,7 @@ static void read_custom_section(Context* ctx, uint32_t section_size) { | |||
section_name.length) == 0) { | |||
CALLBACK_SECTION(begin_names_section, section_size); | |||
uint32_t i = 0; | |||
uint32_t previous_read_end = ctx->read_end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t
, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch.
test/run-tests.py
Outdated
@@ -702,6 +705,9 @@ def main(args): | |||
parser.add_argument('--no-roundtrip', | |||
help='don\'t run roundtrip.py on all tests', | |||
action='store_false', default=True, dest='roundtrip') | |||
parser.add_argument('-p', '--print-cmd', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these changes snuck in from the other PR
If its OK with you all we can add a flag in a separate PR when we address #378 |
No description provided.