WASM application security considerations #304
Comments
See #205 for my TODO on this :-) |
This isn't about bugs, it's about us specifically security-hardening interfaces like the externally visible heap. It does overlap with #205 somewhat, though! In this case none of the wasm modules would be buggy or malicious. Think of this as an equivalent to CSRF, not an equivalent to a buffer overflow. |
As currently described, wasm modules would have an associated origin, like every JS script/function, and would be subject to the same same-origin security policy, which basically limits all cross-origin access (except for a tiny set of white-listed DOM objects+properties (Location, Window)). There is also the question of whether to allow cross-origin script loading (as JS does) without requiring CORS (personally, I'd like to avoid that if we can). Are there other specific concerns? |
It might be worthwhile to consider whether we want to allow JS (even from the same origin) full access to the heap or only to specific regions of the heap. Essentially, does the wasm module decide what external JS has access to, or is it all-or-nothing? I can see arguments for both. It's not clear to me yet how a loaded wasm module is exposed in the DOM, so I'm not sure exactly how the same-origin policy governs it. In current Emscripten an asm.js module is just crammed into the global scope, where third-party JS could definitely access the heap. I assume we don't want this, but what specific steps are we taking to prevent it? Presumably wasm modules are loaded via module loaders (imports), but in that case do we block cross-origin imports? If I import a module a 2nd time, it has to provide the same module so that each ES6 module's imports are the same. How do we secure that across origins? Do we want to take steps to prevent a module's heap from 'leaking out' of the origin and being accessed by malicious actors? For the record, I don't think we necessarily need to protect against most attacks, but we should think through our security model and lay out a case for why we chose our final set of primitives. This interacts with how exactly we expose the heap to JS for interop (a current point of discussion, I believe), and it also potentially interacts with address space management (if we introduce PROT_READ pages, do we also enforce that read-only status for external JS? Do we introduce PROT_JS_READ and PROT_JS_WRITE?) |
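To make the PROT_JS_READ / PROT_JS_WRITE question concrete, here is a minimal sketch of per-page protection flags that are checked only on accesses coming from external JS. Everything in it is an assumption for illustration: `GuardedHeap`, `JS_READ`, `JS_WRITE`, and the 4096-byte `PAGE_SIZE` are hypothetical names and values, not part of any wasm proposal, and real JS would reach memory through typed arrays rather than accessor methods.

```typescript
// Hypothetical sketch: none of these names or values come from a wasm spec.
const PAGE_SIZE = 4096; // assumed page granularity, for illustration only
const JS_READ = 1;      // stands in for the PROT_JS_READ idea
const JS_WRITE = 2;     // stands in for the PROT_JS_WRITE idea

class GuardedHeap {
  private readonly bytes: Uint8Array;
  private readonly prot: Uint8Array; // one flag byte per page

  constructor(pageCount: number) {
    this.bytes = new Uint8Array(pageCount * PAGE_SIZE);
    this.prot = new Uint8Array(pageCount); // every page starts with no JS access
  }

  // Called on behalf of the module to decide what external JS may touch.
  protect(page: number, flags: number): void {
    if (!Number.isInteger(page) || page < 0 || page >= this.prot.length) {
      throw new RangeError("page out of range");
    }
    this.prot[page] = flags;
  }

  // Bounds check first, then the per-page permission check.
  private check(addr: number, needed: number): void {
    if (!Number.isInteger(addr) || addr < 0 || addr >= this.bytes.length) {
      throw new RangeError("address out of range");
    }
    const page = Math.floor(addr / PAGE_SIZE);
    if ((this.prot[page] & needed) === 0) {
      throw new Error("JS access denied for page " + page);
    }
  }

  jsRead(addr: number): number {
    this.check(addr, JS_READ);
    return this.bytes[addr];
  }

  jsWrite(addr: number, value: number): void {
    this.check(addr, JS_WRITE);
    this.bytes[addr] = value & 0xff;
  }
}

// Page 0 is readable and writable from JS, page 1 is read-only from JS.
const demoHeap = new GuardedHeap(2);
demoHeap.protect(0, JS_READ | JS_WRITE);
demoHeap.protect(1, JS_READ);
demoHeap.jsWrite(10, 7);
```

With this shape the module decides which pages external JS can see, which is the finer-grained alternative to all-or-nothing exposure. The cost is a check on every JS access, which is one reason an all-or-nothing alias may still be attractive.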
@kg I agree that we should make sharing-linear-memory-with-JS opt-in, and that is, subtly, already the wording in Web.md: "If allowed by the module, JavaScript can alias a loaded module's linear memory via Typed Arrays". I had been thinking an all-or-nothing mode, but it is interesting to consider something finer-grained. One idea is that we could start going in the direction described by the WebIDL integration and define a "memory region" opaque reference type. When a "memory region" was passed to JS, a typed array view would pop out the other side.
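A rough sketch of the "memory region" idea follows, with one deliberate substitution: instead of handing JS an aliasing typed array view, the hypothetical `makeRegion` handle copies bytes in and out. The reason is that with typed arrays as they exist in JS, any view exposes its entire underlying `ArrayBuffer` through the `.buffer` property, so a view over a sub-range does not by itself hide the rest of the heap. `MemoryRegion`, `makeRegion`, and `naiveRegionView` are illustrative names, not proposed API.

```typescript
// Hypothetical sketch: MemoryRegion and makeRegion are illustrative names only.
interface MemoryRegion {
  readonly length: number;
  read(): Uint8Array;            // returns a copy of the region's bytes
  write(data: Uint8Array): void; // copies bytes into the start of the region
}

// A plain view over a sub-range still leaks the whole heap via `.buffer`.
function naiveRegionView(memory: ArrayBuffer, offset: number, length: number): Uint8Array {
  return new Uint8Array(memory, offset, length);
}

// The aliasing view lives only inside the closure and is never handed out.
function makeRegion(memory: ArrayBuffer, offset: number, length: number): MemoryRegion {
  if (offset < 0 || length < 0 || offset + length > memory.byteLength) {
    throw new RangeError("region out of bounds");
  }
  const backing = new Uint8Array(memory, offset, length);
  return {
    length,
    read(): Uint8Array {
      return backing.slice(); // slice() allocates a fresh buffer
    },
    write(data: Uint8Array): void {
      if (data.length > length) throw new RangeError("write exceeds region");
      backing.set(data);
    },
  };
}

const linearMemory = new ArrayBuffer(64 * 1024);
const leaky = naiveRegionView(linearMemory, 256, 16);
const region = makeRegion(linearMemory, 256, 16);
region.write(new Uint8Array([1, 2, 3, 4]));
```

Here `leaky.buffer` is the full 64 KiB heap, while `region.read().buffer` is a separate 16-byte buffer. An aliasing design with the same guarantee would need something beyond today's typed arrays, such as the opaque reference type suggested above.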
The current proposal is to load wasm modules just like ES6 modules. So if you load wasm via
Only if it's third-party JS you loaded into your origin and the heap was exposed to JS as discussed above. If you load malicious third-party code into your origin, it can already do all manner of other bad things like take over the DOM, call arbitrary JS and wasm exports, etc. If you want sandboxing for untrusted code, you want iframes (although some o-cap advocates might argue it's technically achievable within a single origin by limiting what the untrusted code has access to...).
That would be the default, but as I said above, I'd rather require CORS so we don't have to worry about things like error sanitization that we have to worry about now in the JS engine.
This will be defined by the loader spec in stage 0, under "memoization". The short answer is that the "registry" used to memoize is (as you might expect) per-realm.
The same-origin policy already does this; a module (es6 or wasm) in one origin is just not reachable/visible to any other origin.
We can specify that any typed array views of memory that are made partially or fully inaccessible would be detached; otherwise we'd need to change the semantics of typed array access in JS. This is already the proposal for what to do when memory is resized (to avoid opening a can of engine worms). |
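The detach-on-resize behaviour can be observed with the `WebAssembly.Memory` object from the WebAssembly JavaScript API as it later shipped. This thread predates that API, so treat the snippet as an illustration of the idea rather than the design under discussion here; `growAndObserve` is a hypothetical helper name. It needs a runtime that provides the `WebAssembly` global, such as a current browser or Node.js.

```typescript
// Illustration using the WebAssembly JS API that shipped after this discussion.
function growAndObserve() {
  // One 64 KiB page initially, allowed to grow to two pages.
  const memory = new WebAssembly.Memory({ initial: 1, maximum: 2 });

  const before = new Uint8Array(memory.buffer);
  before[0] = 42;
  const lengthBeforeGrow = before.length;

  // Growing detaches the old ArrayBuffer; grow() returns the previous page count.
  const previousPages = memory.grow(1);

  // A view taken before the grow now has length 0; a fresh view sees the data.
  const after = new Uint8Array(memory.buffer);
  return {
    lengthBeforeGrow,
    previousPages,
    staleLength: before.length,
    freshLength: after.length,
    firstByte: after[0],
  };
}

const observed = growAndObserve();
```

Because stale views become empty instead of silently pointing at old or inaccessible memory, JS code has to re-create its views after a resize, and the semantics of typed array access itself do not need to change.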
One way to deal with this would be to expose a JS implementation of map_shmem and/or shmem_create that returns a TypedArray. Then you can make JavaScript go through the proposed shared memory mapping mechanism (shmem_create&map_shmem). That provides opt-in sharing with pretty minimal constraints on the WebAssembly VM. |
@AndrewScheidecker are you proposing this mechanism to share memory between JS and wasm? |
@jfbastien Yes, and also that it should be the only way that JS is allowed to access WASM memory. |
@AndrewScheidecker I suggest moving this to a separate issue. It has deeply constraining implications on what wasm is allowed to do going forward, including heap resizing, page table management, GC, debugging, ... |
shmem is an interesting idea. However, it seems like we don't actually need the full power that shmem provides (the ability for one page to be simultaneously mapped into several different virtual address ranges); all JS needs is a pointer. An unpleasant consequence would be that JS wouldn't be able to view any regions of linear memory that were
A general point that I should have pointed out earlier, though: in the MVP, with all Web API access going through JS, the most natural strategy that Emscripten/llvm-wasm will use is to alias the entire linear memory (so that pointers work on both sides). So I'm not even really sure we'd benefit from this fine-grained control in the MVP. |
I hadn't thought about the interaction with map_file, but I think it's also an argument for applying those restrictions to whatever mechanism JavaScript can use to see wasm memory. If you provide more power, it forces the wasm VM to run in the same OS process as the JS. Executing a wasm module in a separate OS process seems necessary to support a truly 64-bit address space, or even to support allocating most of a 32-bit address space when invoked by a 32-bit browser, so I think we should make sure the spec doesn't make that impractical. It's pointless if the polyfill needs OS support to implement this functionality, but if we can accept the polyfill only sharing memory within the browser process, that's not necessary. |
Synchronous calls to/from JS and Web APIs (something that was questioned and which multiple browsers were strongly in favor of to support the high-level goal of tight integration with the existing web platform) already practically (though not theoretically) force wasm into the same process (and callstack) as JS. In the future, once wasm can access all APIs without going through JS (see GC.md), it will be possible to create a Web Worker containing only wasm that could easily be launched in a separate OS process. In fact, Web Workers started out in separate processes in some browsers but were brought back in for various reasons. There is also a (rather vague) multiprocess support future feature that could allow wasm to more explicitly request a separate process. I don't quite follow what you mean by "truly 64-bit address space". Even in native code, OSes put limitations on apps from mapping more than a few TB (in some cases, more than a few GB). Post-MVP, wasm will allow int64 pointers and thus >4GiB heaps. I'm not sure what other qualities of 64-bit address space are missing. |
I think it's important that it's possible to run a WebAssembly process in a separate OS process, so you're not competing for address space with whatever else is going on in the browser process. Requiring you to explicitly choose to do that is fine. That would also imply that you have to go through shmem_create/map_shmem to share memory with that process.
Sorry, that wasn't clear. I mean that if a WebAssembly process is to use anywhere near the virtual address space allowed by the OS, then it must do so in a separate OS process from the browser and other WebAssembly processes. |
It would be possible to run a wasm process in a separate OS process, even with JS typed array views on linear memory because at the impl level both
Agreed on wanting to unlock this capability (I think it's way more valuable for 32-bit). Given that browsers will start with multiple wasm modules (and the JS engine) sharing the same process (many performance reasons), I think having an explicit way to request a separate process (and avoid the sync RPC issues by design) is the best path forward. |
I tentatively agree with @lukewagner because it seems difficult to make guarantees about APIs when developers expect them to be synchronous. I'd like this to be disproved though: it would be wonderful if it were possible to just put wasm into its own process when convenient! |
@jfbastien FWIW, even before specific multi-process features, Web Workers containing only wasm code would be an easier first target to move out of process. |
Tentatively closing this now; new questions/ideas welcome in new issues. |
#302 and a few other discussions have raised an issue we need to at least consider for the MVP: Application security.
The wasm sandbox provides security for the OS/browser hosting the wasm application, so we're all good there. Security for the application itself - the safety of its data, for example - is another concern.
For JavaScript/DOM applications this is currently addressed by the same-origin policy, data hiding (inside closures), etc. The approach used in JS+DOM is, to put it mildly, rather complicated.
When thinking about JS->wasm and wasm->JS interop, along with scenarios where multiple wasm applications are loaded in a page (from different origins), we need to make sure nothing we spec allows for the integrity of a wasm module's heap to be significantly compromised. Otherwise, a couple years down the road we have some 'worst case' scenarios equivalent to someone being able to pull your whole email history out of the heap of your wasm email client.
This mostly comes up when addressing design considerations like how the wasm heap is exposed to JS (if at all), how data crosses the JS<->wasm boundary, how function pointers work, etc. The 'obvious' solution to some of these problems will have some significant security consequences.