This repository was archived by the owner on Jun 26, 2020. It is now read-only.

Binary function names #47

Closed
stoklund opened this issue Feb 23, 2017 · 2 comments · Fixed by #91
Labels
E-easy Issues suitable for newcomers to investigate, including Rust newcomers!

Comments

@stoklund
Contributor

Cretonne compiles functions independently, so function names are used differently than in LLVM. They serve two purposes:

  1. In .cton test cases, the function names are all ASCII, and are used to identify individual test cases in the same file.
  2. When Cretonne is embedded as a JIT compiler, function names are used to identify other functions that may be called. This identifier can be any sequence of bytes; it doesn't have to be ASCII or UTF-8. Cretonne doesn't interpret these function names; they are opaque identifiers.

The binary function names are not well supported. They get printed out as quoted ASCII with a bunch of backslash escapes.

  • The parser doesn't support the quoted format of function names.
  • If the name is a binary encoding of some identifier, ASCII with escapes is not a good representation.
  • Right now, function names like v7 or ebb4 get printed without quotes, and the lexer recognizes them as value and EBB tokens.

Alternative function name syntax.

Over in #24, I proposed two new identifier/data tokens: %nnnn and #xxxx. We should use these to represent function names everywhere:

  1. If the function name consists entirely of ASCII alphanumerical characters and _, use the %nnnn notation. Note that this also allows for names like %0. There is no need to give special treatment to the first character.
  2. If the function name contains any other characters, use the #xxxx hexadecimal representation of the function name bytes.
  3. If the name is empty, use a special syntax, maybe noname (no %).

With these changes, the parser should stop accepting unquoted identifiers as function names.
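The three printing rules above could be sketched roughly as follows. This is only an illustration, not Cretonne's actual code: the function name `display_name` is hypothetical, lowercase hex digits are an assumption, and the `noname` spelling for the empty name is the tentative one from rule 3.

```rust
/// Render a function name using the proposed notation.
/// Hypothetical helper; not part of Cretonne's API.
fn display_name(name: &[u8]) -> String {
    if name.is_empty() {
        // Rule 3: special syntax for the empty name (spelling still undecided).
        "noname".to_string()
    } else if name.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_') {
        // Rule 1: ASCII alphanumerics and '_' only, so print as %nnnn.
        // The bytes are known to be ASCII here, so the lossy conversion is exact.
        format!("%{}", String::from_utf8_lossy(name))
    } else {
        // Rule 2: anything else is printed as #xxxx, two hex digits per byte.
        let mut out = String::with_capacity(1 + 2 * name.len());
        out.push('#');
        for b in name {
            out.push_str(&format!("{:02x}", b));
        }
        out
    }
}
```

Under these assumptions, a name like `v7` prints as `%v7`, which can no longer be confused with a value token, while a name containing a space or a non-ASCII byte falls through to the hex form.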

Binary name representation.

Currently, the FunctionName struct contains a String:

pub struct FunctionName(String);

This restricts us to names in UTF-8 form. We should accept any sequence of bytes, so change this to:

pub struct FunctionName(Vec<u8>);

Allocation-free representation

For extra credit: Cretonne tries to minimize heap allocations everywhere, so an internal short-string optimization would make a lot of sense:

enum NameRepr {
    Short {
        length: u8,
        bytes: [u8; 12],
    },
    Long(Vec<u8>),
}
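As a rough sketch of how such a representation might be built and read back, assuming a 12-byte inline limit as in the enum above; the `new` and `as_bytes` methods are hypothetical names chosen for illustration:

```rust
/// Short-string representation: names of up to 12 bytes are stored inline.
enum NameRepr {
    Short { length: u8, bytes: [u8; 12] },
    Long(Vec<u8>),
}

impl NameRepr {
    /// Build a name, allocating on the heap only when it does not fit inline.
    fn new(name: &[u8]) -> Self {
        if name.len() <= 12 {
            let mut bytes = [0u8; 12];
            bytes[..name.len()].copy_from_slice(name);
            NameRepr::Short {
                // The cast cannot truncate because the length is at most 12.
                length: name.len() as u8,
                bytes,
            }
        } else {
            NameRepr::Long(name.to_vec())
        }
    }

    /// View the name as a byte slice, regardless of representation.
    fn as_bytes(&self) -> &[u8] {
        match self {
            NameRepr::Short { length, bytes } => &bytes[..*length as usize],
            NameRepr::Long(v) => v.as_slice(),
        }
    }
}
```

A FunctionName wrapping this enum would keep the same public interface as the Vec<u8> version, so the optimization could be added later without affecting callers.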
@stoklund stoklund added the E-easy Issues suitable for newcomers to investigate, including Rust newcomers! label Feb 23, 2017
@sunfishcode
Member

WebAssembly was changed from having arbitrary byte sequences to having only valid UTF-8 byte sequences. I don't know if Cretonne wants to make a similar change, but either way, this may make it more desirable to print printable characters when possible rather than falling back to full #xxxx.

LLVM uses quotes, like %"...", for names with lexically inconvenient characters. Would that make sense here? Among other things, it would provide a simple way to handle empty strings: %"".

@stoklund
Contributor Author

stoklund commented Jun 5, 2017

I don't expect that Cretonne's function names will be names from the wasm name section. An embedding WebAssembly VM can use the Cretonne function names as completely opaque identifiers. For example, a name could be a (module-Id, function-Id) tuple in a fixed 8-byte binary representation. Such a name is better printed in hex than in mixed ASCII/hex escapes.
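To illustrate the difference, here is a small sketch that packs two ids into an 8-byte name and prints it both ways. The 32-bit width of each id, the big-endian byte order, and all three function names are assumptions made for the example; nothing here is prescribed by Cretonne.

```rust
/// Pack a (module id, function id) pair into a fixed 8-byte name.
/// Two 32-bit big-endian fields are assumed for illustration only.
fn make_name(module_id: u32, function_id: u32) -> [u8; 8] {
    let mut name = [0u8; 8];
    name[..4].copy_from_slice(&module_id.to_be_bytes());
    name[4..].copy_from_slice(&function_id.to_be_bytes());
    name
}

/// Print the name in the proposed #xxxx hexadecimal notation.
fn to_hex(name: &[u8]) -> String {
    let mut out = String::from("#");
    for b in name {
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Print the name as ASCII with backslash escapes, for comparison.
fn to_escaped_ascii(name: &[u8]) -> String {
    name.iter()
        .flat_map(|&b| std::ascii::escape_default(b))
        .map(char::from)
        .collect()
}
```

For module 1 and function 42, the hex form is `#000000010000002a`, whereas the escaped form is `\x00\x00\x00\x01\x00\x00\x00*`, where the final byte happens to be printable and hides the structure of the identifier.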
