Debugpy with WebAssembly (proposal)
This page describes WebAssembly and how debugpy might be modified to support debugging CPython running on WebAssembly. It lives in the wiki for now because that makes it easy for everyone to view.
How is WebAssembly code loaded into the browser?
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.
source: https://megaease.com/blog/2021/09/17/extend-backend-application-with-webassembly/
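To make "stack-based" concrete: Wasm instructions take their operands from an implicit operand stack and push their results back onto it. The toy evaluator below is only an illustration (evalStack and its instruction encoding are hypothetical, and it is not a real Wasm engine). It runs the three-instruction body local.get 0, local.get 1, i32.add, which is how a Wasm function adds its two parameters.

```javascript
// Toy evaluator for a tiny subset of Wasm instructions (illustrative only).
// Each instruction is [opcodeName, immediate?]; operands live on a stack.
function evalStack(code, locals) {
  const stack = [];
  for (const [op, arg] of code) {
    switch (op) {
      case "local.get": // push the value of local #arg
        stack.push(locals[arg]);
        break;
      case "i32.const": // push a constant, truncated to a signed 32-bit integer
        stack.push(arg | 0);
        break;
      case "i32.add": { // pop two operands, push their 32-bit wrapped sum
        const b = stack.pop();
        const a = stack.pop();
        stack.push((a + b) | 0);
        break;
      }
      default:
        throw new Error(`unsupported instruction: ${op}`);
    }
  }
  // A function with a single i32 result leaves exactly one value on the stack.
  return stack.pop();
}

// Body of a function that returns param0 + param1.
const addBody = [["local.get", 0], ["local.get", 1], ["i32.add"]];
console.log(evalStack(addBody, [2, 3])); // 5
// i32 arithmetic wraps around instead of overflowing.
console.log(evalStack([["i32.const", 0x7fffffff], ["i32.const", 1], ["i32.add"]], [])); // -2147483648
```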
How does this code:
#include <stdio.h>
int main()
{
    printf("Hello World\n");
    return 0;
}
get turned into something that runs in the browser?
The first step is a call to WebAssembly.instantiate: JavaScript code loads the .wasm module and calls WebAssembly.instantiate (or WebAssembly.instantiateStreaming) on it.
const instance = await WebAssembly.instantiate(wasmModule, imports);
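This call instantiates the Wasm module inside the web page. When the first argument is an already-compiled WebAssembly.Module, the promise resolves to a WebAssembly.Instance; when it is the raw bytes of a .wasm file, it resolves to an object holding both the module and the instance.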
Once the promise resolves, JavaScript can call the module's exports directly:
instance.exports.main();
This calls the 'main' function exported by the Wasm module.
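A small end-to-end sketch of the instantiate-then-call flow. To stay self-contained it uses a hand-assembled module (ADD_WASM is a hypothetical name) that exports add(a, b) instead of a compiled C program, so it runs in Node.js or a browser console without fetching a file. It also shows the two forms of WebAssembly.instantiate described above.

```javascript
// Hand-assembled binary for this module (WAT text shown for reference):
//   (module
//     (func (export "add") (param i32 i32) (result i32)
//       local.get 0
//       local.get 1
//       i32.add))
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: func 0 has type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: body of func 0
]);

async function main() {
  // Raw bytes: compile AND instantiate; resolves to { module, instance }.
  const result = await WebAssembly.instantiate(ADD_WASM, {});
  console.log(result.instance.exports.add(2, 3)); // 5

  // Already-compiled WebAssembly.Module: resolves to the Instance itself.
  const again = await WebAssembly.instantiate(result.module, {});
  console.log(again.exports.add(40, 2)); // 42
}

main().catch(console.error);
```

In a real page the bytes would come from fetch() (or WebAssembly.instantiateStreaming on the Response), and the import object would be non-empty whenever the module imports anything.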
When the C code is built, it depends on various system libraries. Wasm externalizes these dependencies as an import table, for example:
(import "wasi_snapshot_preview1" "proc_exit" (func $wasi_snapshot_preview1.proc_exit (type $t4)))
(import "wasi_snapshot_preview1" "fd_write" (func $wasi_snapshot_preview1.fd_write (type $t11)))
(import "wasi_snapshot_preview1" "fd_close" (func $wasi_snapshot_preview1.fd_close (type $t1)))
(import "wasi_snapshot_preview1" "fd_seek" (func $wasi_snapshot_preview1.fd_seek (type $t12)))
This is the list of imports required by the simple hello world:
- proc_exit, called to terminate the program with its exit code
- fd_write, to write to stdout
- fd_close, to finish using stdout
- fd_seek, to seek to the beginning of stdout
When calling instantiate, the JavaScript code has to provide this table of imports as an object keyed first by module name (here wasi_snapshot_preview1) and then by function name.
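The same mechanism in miniature, as a sketch: a hand-assembled module (IMPORT_WASM and instantiateWithImports are hypothetical names) imports a single function env.log and calls it with 42 from its exported main. WASI imports such as wasi_snapshot_preview1.fd_write are supplied exactly the same way. WebAssembly.Module.imports lists what the host must provide, and omitting a function import makes instantiation fail with a LinkError.

```javascript
// Hand-assembled module; WAT equivalent:
//   (module
//     (import "env" "log" (func $log (param i32)))
//     (func (export "main") i32.const 42 call $log))
const IMPORT_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32)->(), ()->()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import env.log = func 0
  0x03, 0x02, 0x01, 0x01, // function section: func 1 has type 1
  0x07, 0x08, 0x01, 0x04, 0x6d, 0x61, 0x69, 0x6e, 0x00, 0x01, // export "main" -> func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0
]);

// Compile, report the imports the module expects, then instantiate with the
// caller-supplied import object, looked up as imports[moduleName][fieldName].
function instantiateWithImports(imports) {
  const compiled = new WebAssembly.Module(IMPORT_WASM);
  for (const imp of WebAssembly.Module.imports(compiled)) {
    console.log(`needs ${imp.module}.${imp.name} (${imp.kind})`); // needs env.log (function)
  }
  return new WebAssembly.Instance(compiled, imports);
}

const instance = instantiateWithImports({
  env: { log: (value) => console.log("wasm says", value) },
});
instance.exports.main(); // prints: wasm says 42

// Leaving the function out of the import table makes instantiation fail.
try {
  instantiateWithImports({ env: {} });
} catch (err) {
  console.log(err instanceof WebAssembly.LinkError); // true
}
```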
Here's a complete example that prints 'Hello World' to the console:
// Views onto the module's linear memory; assigned after instantiation.
var heapu32;
var heapu8;
var stdout = console.log.bind(console);
var stderr = console.warn.bind(console);
// One line buffer per file descriptor: 0 = stdin, 1 = stdout, 2 = stderr.
var streams = ['', '', ''];
// Buffer one byte and flush the line on NUL (0) or newline (10).
function printChar(stream, curr) {
  var dest = stream === 1 ? stdout : stderr;
  if (curr === 0 || curr === 10) {
    dest(streams[stream]);
    streams[stream] = '';
  } else {
    streams[stream] += String.fromCharCode(curr);
  }
}
// WASI fd_write, equivalent to POSIX writev: iov points to iovcnt
// { u32 buf; u32 buf_len } records; the total byte count is stored at pnum.
function _fd_write(fd, iov, iovcnt, pnum) {
  var num = 0;
  for (var i = 0; i < iovcnt; i++) {
    var ptr = heapu32[iov >> 2];
    var len = heapu32[(iov + 4) >> 2];
    iov += 8;
    for (var j = 0; j < len; j++) {
      printChar(fd, heapu8[ptr + j]);
    }
    num += len;
  }
  heapu32[pnum >> 2] = num;
  return 0; // errno 0 = success
}
// The remaining imports are stubs that just report success.
function _fd_close(fd) {
  return 0;
}
function _fd_fdstat_get(fd, statPtr) {
  return 0;
}
// offset is a 64-bit value, so it arrives as a BigInt.
function _fd_seek(fd, offset, whence, newOffsetPtr) {
  return 0;
}
// proc_exit is not expected to return (wasi-libc declares it noreturn);
// this stub simply ignores the exit code.
function _proc_exit(code) {
}
const imports = {
  wasi_snapshot_preview1: {
    fd_write: _fd_write,
    fd_close: _fd_close,
    fd_fdstat_get: _fd_fdstat_get,
    fd_seek: _fd_seek,
    proc_exit: _proc_exit,
  },
};
// instantiateStreaming requires the server to send Content-Type: application/wasm.
fetch("hello_world_wasi.wasm")
  .then(resp => WebAssembly.instantiateStreaming(resp, imports))
  .then(result => {
    console.log(`Starting wasm`);
    var memory = result.instance.exports.memory;
    heapu32 = new Uint32Array(memory.buffer);
    heapu8 = new Uint8Array(memory.buffer);
    result.instance.exports._start(); // WASI command entry point, which runs main()
  })
  .catch(err => console.error(err));
There are some interesting things to note here:
- fd_write has to treat its arguments as pointers into memory and read the data one byte at a time. No string is passed through; it receives the raw bytes written to stdout. In effect it implements POSIX writev.
- The arguments to fd_write are just pointers, meaning addresses (offsets) into the C program's heap, not the actual buffers.
- The memory export allows the JavaScript code to read the heap of the C code.
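The pointer handling in fd_write can be exercised without compiling any C at all. In the sketch below, makeFdWrite is a hypothetical helper, and the test data is written into a WebAssembly.Memory by hand to stand in for what the C side would place there. It uses a DataView with explicit little-endian reads (Wasm memory is always little-endian) and re-creates its views on every call, because growing a memory detaches the old ArrayBuffer, which would leave views cached once at startup (like heapu32 and heapu8 above) unusable.

```javascript
// Hypothetical helper: builds a WASI-style fd_write import that reads its
// iovec array directly out of a WebAssembly.Memory.
function makeFdWrite(memory, sink) {
  return function fd_write(fd, iov, iovcnt, pnum) {
    // Fresh views on every call: memory.grow() detaches the previous buffer.
    const view = new DataView(memory.buffer);
    const bytes = new Uint8Array(memory.buffer);
    let written = 0;
    for (let i = 0; i < iovcnt; i++) {
      const base = iov + i * 8; // each iovec is { u32 buf; u32 buf_len }
      const ptr = view.getUint32(base, true); // little-endian u32
      const len = view.getUint32(base + 4, true);
      sink(fd, bytes.subarray(ptr, ptr + len)); // raw bytes, not a string
      written += len;
    }
    view.setUint32(pnum, written, true); // store the total via the out-pointer
    return 0; // errno 0 means success
  };
}

// Simulate what the C side would have placed in linear memory.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const heap = new Uint8Array(memory.buffer);
const dv = new DataView(memory.buffer);
const encoder = new TextEncoder();
heap.set(encoder.encode("Hello "), 100); // first buffer at address 100
heap.set(encoder.encode("World\n"), 200); // second buffer at address 200
dv.setUint32(16, 100, true); // iovec[0].buf
dv.setUint32(20, 6, true); // iovec[0].buf_len
dv.setUint32(24, 200, true); // iovec[1].buf
dv.setUint32(28, 6, true); // iovec[1].buf_len

let stdoutText = "";
const decoder = new TextDecoder();
const fdWrite = makeFdWrite(memory, (fd, chunk) => {
  // stream: true keeps a UTF-8 sequence split across two chunks intact
  stdoutText += decoder.decode(chunk, { stream: true });
});

const errno = fdWrite(1, 16, 2, 8); // fd 1 = stdout, 2 iovecs at 16, count stored at 8
console.log(JSON.stringify(stdoutText)); // "Hello World\n"
console.log(dv.getUint32(8, true)); // 12
```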
Whether this glue code has to be written by hand depends on the toolchain. A number of tools pregenerate the JavaScript glue code that binds the Wasm module to something usable from JavaScript:
Tool | Description | Threads | Memory allocation | Dynamic linking | Built-in file I/O | Sockets | Easy to override imports |
---|---|---|---|---|---|---|---|
Emscripten | Custom compiler and linker for C/C++/Rust code that can auto-generate JavaScript and/or HTML output. | ✔️ | ✔️ | ✔️ (sort of, broken right now) | ✔️ | ✔️ (client only) | ❌ |
wasm-pack | Compiler add-on for Rust code that generates JavaScript glue code | ❌ | ✔️ | ❌ | ✔️ (sort of, using custom async API) | ❌ | ✔️ |
wasi-sdk | Custom version of clang for C/C++/Rust. Only really supports memory management; everything else must be passed in through the import table | ❌ | ✔️ | ❌ | ❌ | ❌ | ✔️ |
CPython uses sockets, threads, custom memory allocation, and dynamic linking, which is essentially every feature listed in the table above. At the time of writing, the Emscripten build of CPython partially works with sockets, dynamic linking, and threads.
Could we get away with a subset just for debugpy?
Not really. Debugpy requires:
- Sockets - at minimum for connecting to the debuggee, which uses sockets to send and receive debugger messages
- Dynamic linking - modules imported by pydevd require dynamic linking
- File I/O - debugpy needs to load its imports (usually from disk)
- Threads - debugpy handles the socket communication on worker threads
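Threads are the hardest of these requirements to satisfy in a browser. As far as we understand it, Emscripten's pthreads support runs each thread in a Web Worker, with all workers sharing one WebAssembly.Memory backed by a SharedArrayBuffer; browsers only expose SharedArrayBuffer on cross-origin isolated pages (COOP/COEP headers). The sketch below shows just that building block: a shared memory that could be posted to workers, updated with Atomics. bumpCounter is a hypothetical helper, and no workers are actually started here.

```javascript
// Shared linear memory: the building block for Wasm threads.
// A shared memory must declare a maximum size up front.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 16, shared: true });
console.log(memory.buffer instanceof SharedArrayBuffer); // true

// Every Worker that receives `memory` via postMessage sees the same bytes,
// so concurrent updates have to go through Atomics to avoid data races.
const counters = new Int32Array(memory.buffer);

// Hypothetical helper: atomically increment cell 0 and return the new value.
function bumpCounter(cells) {
  return Atomics.add(cells, 0, 1) + 1; // Atomics.add returns the previous value
}

bumpCounter(counters);
bumpCounter(counters);
console.log(Atomics.load(counters, 0)); // 2
```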
Table of options