Should we use ELF as a container format? #74
ELF has some useful properties, like the ability to have arbitrary optional sections, to split up text and data into multiple sections for various purposes, and pretty broad existing tooling support. Downsides of ELF for WebAssembly include:
On Tue, May 19, 2015 at 8:11 PM, Dan Gohman
+0.45 I like ELF, but it will take some more consideration before I'm convinced
@titzer specifying a subset of ELF sounds like a good idea. Want to make a PR to add a note on this in BinaryEncoding.md? Also, maybe I'm missing an angle, but I don't see any conflicts with wasm function pointers. Basically, in the most recent iteration implied in AstSemantics.md#calls, there isn't an explicit function table (the polyfill would synthesize the tables) or a priori indices: you just take the
Would using ELF also be useful for dynamic linking, especially for the function table (instead of
We can mangle types and versioning in the symbol names, à la C++. I'd be afraid of using C's approach to mangling, though!
I think we'll be able to generate slightly faster code if we directly support linking functions and globals in WebAssembly. For one thing, an engine might choose to avoid a runtime PLT/GOT altogether by directly patching code. Even without direct patching, though, since indirect calls in wasm entail an extra load, a hand-rolled PLT call will be slower than a built-in PLT call; the built-in PLT call benefits from being able to store the raw function address directly in the PLT and from knowing that the address can't be corrupted.
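To make the cost argument concrete, here's a toy Python model of lazy binding through an ELF-style GOT. It's only a sketch: Python has no raw code addresses, so a "slot" is just a list entry holding a callable, and `LIBRARY`, `LazyGot` and `call` are hypothetical names. What it shows is the structure being discussed: every call goes through a table load, and the first call patches the slot. An engine that resolves the import itself could skip the table and call the target directly.

```python
from typing import Callable, Dict, List

# Hypothetical imported "library": symbol name -> function.
LIBRARY: Dict[str, Callable[[int], int]] = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}


class LazyGot:
    """Toy GOT with lazy binding (a model of the idea, not of real ELF data).

    Each slot starts out holding a resolver stub. The first call through a
    slot looks up the symbol and overwrites the slot with the real target,
    so later calls only pay for the table load plus an indirect call.
    """

    def __init__(self, symbols: List[str]) -> None:
        self.symbols = symbols
        self.resolutions = 0  # how many times a resolver stub has run
        self.slots: List[Callable[[int], int]] = [
            self._make_resolver(i) for i in range(len(symbols))
        ]

    def _make_resolver(self, index: int) -> Callable[[int], int]:
        def resolver(arg: int) -> int:
            self.resolutions += 1
            target = LIBRARY[self.symbols[index]]
            self.slots[index] = target  # patch the slot for future calls
            return target(arg)
        return resolver

    def call(self, index: int, arg: int) -> int:
        # What a PLT stub does: load the slot, then call through it.
        return self.slots[index](arg)


got = LazyGot(["double", "square"])
print(got.call(0, 5), got.call(0, 6), got.resolutions)  # 10 12 1
```

In wasm terms, a hand-rolled version of `call` is a memory or table load followed by an indirect call on every invocation, which is the overhead a built-in mechanism could avoid.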
One of the characteristic features of ELF is the concept of having two views over the file contents. Sections organize the file contents, and segments specify how the contents are to be mapped into memory. Typically, a segment covers multiple sections, but they really are two independent views over the same data. One of the purposes of this abstraction is so that the OS execve code which loads ELF programs into memory can ignore all the details of sections and just focus on making a few bulk segment mappings into the address space instead of digging into the details of each of the sections individually. There are ways we can shoehorn WebAssembly into this, perhaps by having a processor-specific (PT_LOPROC+x) segment type which says "here is a byte range of all the WebAssembly functions to be compiled", but this abstraction isn't obviously adding much value, since we already know we want a way to specify where all the functions are, where they each begin and end, up front.
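For anyone less familiar with the two views, here's a minimal Python sketch that reads both header tables of a 64-bit little-endian ELF file using only `struct`. The layouts follow `Elf64_Ehdr`, `Elf64_Phdr` and `Elf64_Shdr`. `build_demo_image` is a hypothetical helper that fabricates a tiny image with one `PT_LOAD` segment covering two sections; it leaves out things a real file would have (such as the null section at index 0 and a section-name string table), so treat it as an illustration only.

```python
import struct
from typing import List, Tuple

# 64-bit little-endian layouts of Elf64_Ehdr, Elf64_Phdr and Elf64_Shdr.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # 56 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")        # 64 bytes
PT_LOAD = 1
SHT_PROGBITS = 1

Range = Tuple[int, int]  # (file offset, size in bytes)


def read_views(image: bytes) -> Tuple[List[Range], List[Range]]:
    """Return the segment view and the section view of the same file bytes."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    (_, _, _, _, _, phoff, shoff, _, _,
     phentsize, phnum, shentsize, shnum, _) = EHDR.unpack_from(image, 0)
    segments = []
    for i in range(phnum):  # the view a loader such as execve uses
        p = PHDR.unpack_from(image, phoff + i * phentsize)
        segments.append((p[2], p[5]))  # p_offset, p_filesz
    sections = []
    for i in range(shnum):  # the view linkers and objdump use
        s = SHDR.unpack_from(image, shoff + i * shentsize)
        sections.append((s[4], s[5]))  # sh_offset, sh_size
    return segments, sections


def sections_in_segment(segment: Range, sections: List[Range]) -> List[int]:
    """Indices of the sections whose bytes lie entirely inside the segment."""
    start, size = segment
    return [i for i, (off, sz) in enumerate(sections)
            if start <= off and off + sz <= start + size]


def build_demo_image() -> bytes:
    """Hypothetical tiny image: one PT_LOAD segment spanning two sections."""
    payload = b"CODE" * 4 + b"DATA" * 2  # section 0: 16 bytes, section 1: 8 bytes
    data_off = EHDR.size + PHDR.size
    shoff = data_off + len(payload)
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # ELFCLASS64, LSB, version 1
    ehdr = EHDR.pack(ident, 0, 0, 1, 0, EHDR.size, shoff, 0, EHDR.size,
                     PHDR.size, 1, SHDR.size, 2, 0)
    phdr = PHDR.pack(PT_LOAD, 5, data_off, 0, 0, len(payload), len(payload), 1)
    shdrs = b"".join(SHDR.pack(0, SHT_PROGBITS, 0, 0, off, size, 0, 0, 1, 0)
                     for off, size in [(data_off, 16), (data_off + 16, 8)])
    return ehdr + phdr + payload + shdrs


segments, sections = read_views(build_demo_image())
print(segments, sections, sections_in_segment(segments[0], sections))
# [(120, 24)] [(120, 16), (136, 8)] [0, 1]
```

The segment table alone is enough to map the file; the section table is extra structure that a wasm consumer, which needs per-function boundaries anyway, may not get much from.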
The big benefits of using ELF vs. inventing our own would be tooling, I think. As mentioned above, LLVM already has ELF support, and we'd be able to use tools like objdump for inspecting .wasm files without much trouble. Even without adding any special code to objdump, you'd be able to inspect the data and get raw AST bytes out, etc., which is pretty useful stuff. If we registered an ELF machine type and added support for it to binutils, we could make things very useful. There are definitely some ELF concepts that aren't going to map well into wasm, but I think defining a subset should constrain the problem a bit. I think it's worth trying, at least, to avoid reinventing wheels.
Using ELF as a container has practical advantages in terms of porting. I've encountered a number of projects whose builds assume all binaries are ELF binaries. PNaCl not using ELF has required some special-casing. A few programs also assume they can use extra sections to embed data in binaries.
One big question here is how we want symbol resolution to work with dynamic libraries. While ELF is theoretically flexible, all the ELF ecosystems I'm aware of use a single-level namespace. Symbol imports can be resolved from any library or executable that defines a symbol with the same name. This is in contrast to Mach-O and PE-COFF, which have two-level namespace schemes, where each symbol import specifies which library it's to be resolved by. Other strengths or weaknesses of Mach-O and/or PE-COFF aside, everyone I've talked to so far has expressed a preference for WebAssembly to use a two-level namespace (e.g. here). There are open questions, such as how best to "name" libraries to allow desirable flexibility, but I've not yet heard from anyone who thinks that these are unsolvable. In this light, I propose WebAssembly use a two-level namespace. And if we do, I propose that ELF is therefore not a good fit for WebAssembly. Does anyone disagree? Does anyone have other concerns to raise?
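A minimal Python sketch of the difference, assuming a hypothetical `exports` map from module name to exported symbol names. It deliberately ignores real dynamic-linker details such as symbol versioning, visibility and per-library lookup scopes; it only shows what the import has to say and where the lookup goes.

```python
from typing import Dict, List, Optional, Set

# Hypothetical set of loaded modules: module name -> exported symbol names.
Exports = Dict[str, Set[str]]


def resolve_single_level(symbol: str, search_order: List[str],
                         exports: Exports) -> Optional[str]:
    """Single-level namespace (typical ELF behavior, simplified):
    the import is just a name, and the first module in the search order
    that defines it wins."""
    for module in search_order:
        if symbol in exports.get(module, set()):
            return module
    return None


def resolve_two_level(module: str, symbol: str,
                      exports: Exports) -> Optional[str]:
    """Two-level namespace (Mach-O / PE-COFF style, simplified):
    the import names both the module and the symbol, and only that
    module is consulted."""
    if symbol in exports.get(module, set()):
        return module
    return None


exports: Exports = {"libfoo": {"init", "foo"}, "libbar": {"init", "bar"}}
print(resolve_single_level("init", ["libfoo", "libbar"], exports))  # libfoo
print(resolve_two_level("libbar", "init", exports))                 # libbar
```

With the single-level scheme, which `init` an import gets depends on load order; with the two-level scheme, the import itself says which library it means.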
Agreed on two-level namespace. Does that necessarily block out all of ELF, though? Are there other attributes of ELF that we do like? Which container should we use instead? |
+1 for two-level namespaces... I've heard the same preferences, although we've probably talked to mostly the same people :)
The main attributes of ELF that we like are that it exists, it works reasonably well for the things it's designed for, a lot of people are familiar with it, it's extensible, and it has a lot of existing tooling support. ELF also has attributes we don't like, including clutter from historical artifacts and a lot of encoding redundancy. And, some aspects of ELF's extensibility can be a disadvantage too, because it means there could be quite a lot of gratuitous variety that WebAssembly consumers would have to support, for example putting the section or segment headers at the end (or middle!) of the file instead of the beginning, having overlapping segments, or having multiple sections with the same name. We could prohibit things like those, but the more constraints we add the less value we get from the existing ecosystem. And for WebAssembly, we may have to use custom segment and symbol types to cope with the fact that we have a virtual ISA which isn't a direct encoding of executable text in memory, custom symbol types to represent wasm's global variables, and custom section types to hold various bits of wasm metadata. ELF's extensibility means these things are all possible, but the more special things we add, the less value we get out of leveraging the existing ecosystem. If we agree on two-level namespacing, that's a significant change from the ELF ecosystem, and in my mind, that combined with the other concerns is enough to justify choosing something else. I'm not aware of any other existing container formats that are plausible to consider here, so effectively I'm proposing we invent our own container format.
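Here's a rough sketch of what prohibiting that variety might look like: a validator that rejects the three examples listed above. It works on a hypothetical pre-parsed summary (`ElfSummary` is made up, not an existing library type), and the rules are just the examples from this comment, not a proposed spec.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ElfSummary:
    """Hypothetical, already-parsed layout summary of one file."""
    header_size: int                 # size of the ELF file header
    phoff: int                       # offset of the segment (program) header table
    phdr_table_size: int             # total size of that table
    shoff: int                       # offset of the section header table
    segments: List[Tuple[int, int]]  # (file offset, size) of each segment
    section_names: List[str]


def subset_violations(f: ElfSummary) -> List[str]:
    """Return a description of every rule the file breaks (empty if none)."""
    problems = []
    # Rule 1: both header tables come first, directly after the file header.
    if f.phoff != f.header_size:
        problems.append("segment headers are not at the start of the file")
    if f.shoff != f.phoff + f.phdr_table_size:
        problems.append("section headers are not at the start of the file")
    # Rule 2: segments must not overlap. After sorting by offset, any
    # overlap shows up between at least one pair of neighbors.
    ordered = sorted(f.segments)
    for (a_off, a_size), (b_off, _) in zip(ordered, ordered[1:]):
        if a_off + a_size > b_off:
            problems.append(f"segments at offsets {a_off} and {b_off} overlap")
    # Rule 3: section names must be unique.
    seen = set()
    for name in f.section_names:
        if name in seen:
            problems.append(f"duplicate section name {name!r}")
        seen.add(name)
    return problems
```

Each rule like this is easy to state, but each one also narrows which existing ELF producers can emit conforming files without changes, which is the trade-off described above.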
These all seem like they would dramatically increase our security attack surface, and potentially prevent important optimizations like streaming decode/compilation.
That's why we'd have to specify a subset. Doing that wouldn't negate the benefits of using existing tooling (which would presumably handle the subset just as well as the more commonly-used set), but going to a 2-level namespace would be more of a departure.
Supposing that a custom format were chosen, why not base it on protocol buffers, at least for the outer portions of the container that are not performance-critical? This would make it relatively easy for third parties (non-browsers) to ensure they are reading and writing the format correctly.
I'm not sure I understand what the advantages of protobuf would be here. ELF (and other container formats) have much higher-level semantics than protobuf defines. If we define a new format then we want to also have high-level semantics. Huge protobufs (as this would be) are also not something I'd recommend going for!
@jfbastien I'm just saying that if ELF weren't used because it's decided not to be a good fit, the custom format could, rather than being entirely ad hoc, start from an existing general-purpose binary encoding. It doesn't have to be protocol buffers; it could be something else. The only other candidate I know of is Cap'n Proto, which isn't optimized for space, but I'm only suggesting this for the outer layers, which I assume aren't size-critical. The advantage of protobufs, of course, is that they are well understood and have built-in extensibility. I was not aware that protobufs were unsuitable at large sizes (is this an inherent problem or an issue with the libraries that read or write them?).
Let's move discussion of what a new custom format might look like to new issues. This issue is about whether we should use ELF. I am now proposing above that we say no, but it's still open for discussion at this point.
@sunfishcode I agree with the sentiment (someone somewhere said) that we go forward designing our desired format, and then if ELF fits our criteria, awesome! If not, well, that's OK too.
It is looking unlikely that ELF Unless other considerations come to light, let's close this issue so that we can focus on evolving the current emerging custom container format to fit our needs.
If I understand correctly, existing
That's correct. The
@sunfishcode and I met with Jim Grosbach (Apple) yesterday and discussed WebAssembly. Jim was asking what our container format would be, and suggested that we may want to consider ELF.
He points out that ELF is already supported in LLVM which would make the backend less special.
I'm not particularly familiar with this topic, but I think it's good to list pros/cons of using ELF, and detail why we'd go with it or not.