Skip to content

Verify the sizes of the sub-sections within the name section #375

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 30, 2017

Conversation

sbc100
Copy link
Member

@sbc100 sbc100 commented Mar 29, 2017

No description provided.

@sbc100 sbc100 requested a review from binji March 29, 2017 21:49
Copy link
Contributor

@KarlSchimpf KarlSchimpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -1649,10 +1652,14 @@ static void read_custom_section(Context* ctx, uint32_t section_size) {
++i;
break;
default:
/* unknown subsection, skip rest of section */
ctx->offset = ctx->read_end;
/* unknown subsection, skip it */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will make us accept "name" sections in which the "function" subsection isn't the first subsection, which violates the spec. Not a bad thing, but a change in behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. They need to be in order. I'll add a check for that too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually doesn't this existing code already allow this? There doesn't seem to be any check for functions coming before locals. This change just meas that subsections we don't know about can appear anywhere, and will have their sizes validated, and will not terminate the parsing of the rest of the section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing we might want to do is to check that subsection data is actually contained in the subsection (by setting ctx->read_end to the subsection end temporarily, for example); just like read_function_body verifies a function body does not extend past its indicated length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code allows the consumer to check the sections are in the right order (it doesn't verify that the function/local indices are specified in order either, but again the consumer can check that).

Now that I think about it, I feel that stricter is better, and we shouldn't go out of our way to be lenient.

I must confess I've resorted a few times to cutting out the name section of wasm modules in a text editor, so a switch to ignore (some) custom sections would make sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1605,6 +1605,9 @@ static void read_custom_section(Context* ctx, uint32_t section_size) {
uint32_t subsection_size;
in_u32_leb128(ctx, &name_type, "name type");
in_u32_leb128(ctx, &subsection_size, "subsection size");
size_t subsection_end = ctx->offset + subsection_size;
if (subsection_end > ctx->read_end)
RAISE_ERROR("invalid sub-section size: extends past end");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is RAISE_ERROR the right thing to do? Invalid custom sections do not cause the entire module to become invalid. That's why I ignored optional information entirely, though I agree now that checking the subsection sizes is much cleaner.

Maybe a separate callback for recoverable errors would be better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a good question, for custom sections that the binary-reader knows about, how should it handle errors. My feeling is that if we are going to parse a section we should validate its contents. Perhaps a flag to disable all-parsing of custom sections if people don't what this? But perhaps we should add this if and when somebody needs it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, most wasm modules out there have invalid name sections according to the current spec, I think. I think switching to C++ exceptions could help (we're currently using longjmp!), particularly if such exceptions clearly encode "you can continue reading the next section at offset x and the next subsection at offset y".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's kind of a weird case. I think sometimes we'll want to ignore invalid custom sections and sometimes we'll want to fail if they're invalid. In either case we'll want to bail on parsing the section if any part fails. I guess it should be up to the BinaryReader implementation what happens, like @pipcet says.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the old "name" sections in wild, are we not blowing up already? It seems like we should be.

I think probably flag such as --no-custom or --no-debug-names or --skip-section=name would be for this kind of situation were you have binaries out there that use a certain custom section name but its not compliant with our current tooling convention. Ideally we would rename the section in this case (weren't we supposed to rename this one to utf8_names?) to avoid this kind of problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebAssembly/design#1016 seems to have good support and makes all strings utf8. If that's the case then perhaps calling it utf8-name is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But strange that BinaryEncoding.md has the new content but not the new section name. Seems like that was a mistake.

If we really want to support old binaries with the previous name section we would need to build a parser that supports both, but we don't want to do that do we?

Do you happen to know what the current behavior is if this parser sees an old name section today? I do seem to remember that it somehow skips over it. Perhaps I could make that explicit in the code somehow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I just tested the current "master" and it already blows up on the old style name section:

$ ./out/wasmdump  -x ./debug-names-1.wasm
*ERROR*: @0x0000003d: unable to read string: name

So this change doesn't actually make it worse.

Copy link
Member

@binji binji Mar 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the current behavior is to bail out on any error, even in custom sections. I think we should do something fancier (as discussed above), but that doesn't have to be in this PR. Opened #378 for further discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it's probably of theoretical interest only, but I'm pretty sure I've proved that no "name" section is ambiguous: it's either a valid old-style name section, or a valid new-style one, or neither, but never both. I'd still prefer the approach of forgetting the old-style name sections ever existed, but we also have the option of first verifying that a name section splits into valid subsections; if it doesn't, it's an old-style section (and the converse holds, too), so we can still parse it.

@@ -1600,6 +1600,7 @@ static void read_custom_section(Context* ctx, uint32_t section_size) {
section_name.length) == 0) {
CALLBACK_SECTION(begin_names_section, section_size);
uint32_t i = 0;
uint32_t previous_read_end = ctx->read_end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good catch.

@@ -702,6 +705,9 @@ def main(args):
parser.add_argument('--no-roundtrip',
help='don\'t run roundtrip.py on all tests',
action='store_false', default=True, dest='roundtrip')
parser.add_argument('-p', '--print-cmd',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes snuck in from the other PR

@sbc100
Copy link
Member Author

sbc100 commented Mar 30, 2017

If its OK with you all we can add a flag in a separate PR when we address #378

@pipcet
Copy link
Contributor

pipcet commented Mar 30, 2017

My preference is to do that: merge this first, then #376, and then #378 should be possible to fix by putting a filter class between the BinaryReader and the context.

@sbc100 sbc100 merged commit 2e9e6ce into master Mar 30, 2017
@sbc100 sbc100 deleted the check_subsections branch March 30, 2017 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants