WASM needs better debugging

This is a rant; you have been warned.

I was recently messing around with Binaryen as a possible alternative to LLVM. I always find that the best way to learn the true capabilities of something is to quickly try to use every feature I need at once. For me, this meant setting up a test project where I would use Binaryen to generate a simple adder function, generate debug info for it, run it with wasmtime, and trigger at least one breakpoint. After that, I was going to try something with multiple imported memories, but I didn’t get that far.

It took me far too long to figure out, but Binaryen, the WebAssembly project’s own toolkit that so many wasm pipelines rely on, doesn’t have a way to generate the DWARF debugging information I needed. It does support both source maps and DWARF, but for DWARF it only preserves and updates information that another compiler already produced; it can’t be the tool that creates it.

It looks like this came about because almost every other language toolchain already emits DWARF debug info, and when compiling to wasm instead of a native target, it simply carries that information over. So the use case of “do some optimizations and transformations on this preexisting module” is far more common and requested than “I would like to use your toolkit to generate WASM instructions, and emit debug info as well”.

Why this annoys me

I’m annoyed because I want to make a sandboxed scripting language that uses the unique features of the WASM paradigm, mainly multi-memory, for a game engine I can be confident throwing untrusted code into. On paper this is a perfect use case, but every time I’ve run an experiment with some portion of the ecosystem, it’s let me down. I understand that I’m trying to use new tech outside of its intended use case (a single statically linked, fully self-contained module with no external interactions), but it’s still making me want to drop it and go back to building my own runtime.

Why does wasmer, the best runtime for speed, have ZERO debug info support? Why does wasmtime, which does have DWARF support, not also support source maps? Why, when debugging with wasmtime, do none of the local variables display correctly?

I do know the answers to most of these: each one of these things takes a ton of effort. But they’re also some of the best and most valuable things a language can have. I practically learned C++ by just clicking around with breakpoints. Now that I mostly code in C# and Rust, I almost never have to debug weird memory issues, but I still miss how good gdb and lldb support is for raw data structures. In Rust, you can barely tell which variant an enum currently holds, and you can completely forget about calling pure member functions to get more info.

I’m slightly more pissed off because I think I got debugging information working for a JIT-compiled toy language of mine in around two weeks. I understand that’s completely different from a huge effort like WASM, but couldn’t this have been part of the original spec? Yes, there are spec and tool-convention documents that amount to “just use DWARF and put it in the custom sections,” but how am I supposed to generate that? I can’t put it in a .wat file. I need to either use LLVM or write a compiler from scratch, because Binaryen doesn’t have a way to generate all that.
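For a sense of where the difficulty actually lies: the container is trivial. In the wasm binary format, a custom section is section id 0, followed by a LEB128-encoded byte length, a length-prefixed name, and an opaque payload. The minimal plain-Rust sketch below (no crates; the helper names are hypothetical) wraps a placeholder payload in a `.debug_info` section. Producing real DWARF bytes for that payload, with line tables and variable locations that match the generated code, is the part it deliberately leaves out, and that is the part a code-generation toolkit would need to provide.

```rust
/// Append `value` to `out` as unsigned LEB128, the variable-length
/// integer encoding the wasm binary format uses for sizes and counts.
fn write_uleb128(out: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

/// Build a custom section (section id 0) with the given name and payload.
fn custom_section(name: &str, payload: &[u8]) -> Vec<u8> {
    // Section contents: LEB128 name length, name bytes, then the raw payload.
    let mut contents = Vec::new();
    write_uleb128(&mut contents, name.len() as u32);
    contents.extend_from_slice(name.as_bytes());
    contents.extend_from_slice(payload);

    let mut section = vec![0x00]; // id 0 = custom section
    write_uleb128(&mut section, contents.len() as u32);
    section.extend_from_slice(&contents);
    section
}

/// An otherwise empty module: magic number, version 1, one custom section.
fn module_with_custom_section(name: &str, payload: &[u8]) -> Vec<u8> {
    let mut module = b"\0asm".to_vec();
    module.extend_from_slice(&1u32.to_le_bytes());
    module.extend_from_slice(&custom_section(name, payload));
    module
}

fn main() {
    // Placeholder payload: real DWARF .debug_info bytes would go here.
    let fake_dwarf: [u8; 4] = [0xde, 0xad, 0xbe, 0xef];
    let wasm = module_with_custom_section(".debug_info", &fake_dwarf);
    println!("{} bytes: {:02x?}", wasm.len(), wasm);
}
```

A custom section named `sourceMappingURL` is the usual way a module points at its source map, so the same container work applies there; again, the hard part is the mapping data, not the section.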

Why not just use LLVM, then? Because it doesn’t fully support all the features I need. My core requirements are: 1. security, 2. speed. Game engines have a lot of shared data, data that by design is not accessible to WebAssembly modules out of the box, so I need something like multi-memory to share those large, but still sandboxed, regions of memory between scripts that, by design, can’t be compiled into one module.

As far as I can tell, LLVM does support different address spaces, and every language you can compile with the shared memory flag is probably using that, but I want to be able to import 3+ memories into every module, one memory for each different use, so to speak. I have no clue how wasm-ld will react to all that, or whether it even can. I still need to run a test there to be sure before falling back to plan B: building a runtime myself.
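To pin down what multi-memory buys here, the plain-Rust model below sketches the intended semantics rather than any runtime’s implementation: each script holds its memories by index (like a module importing memory 0, memory 1, and so on), every load and store names a memory and is bounds-checked against that memory alone, and one memory object can be shared by several independently compiled scripts. `Script`, `Memory`, `Trap`, and the world/scratch layout are hypothetical, and `Rc<RefCell<Vec<u8>>>` stands in for whatever sharing mechanism a real host would use.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a linear memory that several scripts can import.
type Memory = Rc<RefCell<Vec<u8>>>;

#[derive(Debug, PartialEq)]
enum Trap {
    NoSuchMemory,
    OutOfBounds,
}

/// A script instance that refers to its imported memories by index,
/// the way a multi-memory module refers to memory 0, memory 1, and so on.
struct Script {
    memories: Vec<Memory>,
}

impl Script {
    fn memory(&self, index: usize) -> Result<&Memory, Trap> {
        self.memories.get(index).ok_or(Trap::NoSuchMemory)
    }

    /// Load a little-endian u32, bounds-checked against the named memory only.
    fn load_u32(&self, index: usize, addr: u32) -> Result<u32, Trap> {
        let memory = self.memory(index)?.borrow();
        let start = addr as usize;
        let end = start.checked_add(4).ok_or(Trap::OutOfBounds)?;
        let b = memory.get(start..end).ok_or(Trap::OutOfBounds)?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    /// Store a little-endian u32, bounds-checked against the named memory only.
    fn store_u32(&self, index: usize, addr: u32, value: u32) -> Result<(), Trap> {
        let mut memory = self.memory(index)?.borrow_mut();
        let start = addr as usize;
        let end = start.checked_add(4).ok_or(Trap::OutOfBounds)?;
        let slot = memory.get_mut(start..end).ok_or(Trap::OutOfBounds)?;
        slot.copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}

fn main() {
    // Hypothetical layout: memory 0 = shared world state, memory 1 = private scratch.
    let world: Memory = Rc::new(RefCell::new(vec![0; 64 * 1024]));
    let script_a = Script { memories: vec![Rc::clone(&world), Rc::new(RefCell::new(vec![0; 1024]))] };
    let script_b = Script { memories: vec![Rc::clone(&world), Rc::new(RefCell::new(vec![0; 1024]))] };

    script_a.store_u32(0, 16, 42).unwrap(); // A writes to the shared memory...
    assert_eq!(script_b.load_u32(0, 16), Ok(42)); // ...and B sees it,
    assert_eq!(script_b.load_u32(1, 16), Ok(0)); // but B's scratch memory is its own,
    assert_eq!(script_b.load_u32(1, 1022), Err(Trap::OutOfBounds)); // and bounded.
    assert_eq!(script_b.load_u32(2, 0), Err(Trap::NoSuchMemory));
    println!("ok");
}
```

The design point is that isolation comes from per-memory bounds, not from keeping everything in one module: scripts can share the world memory while their scratch memories stay unreachable from each other.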

What’s the solution?

Time

The only solution here is time and pull requests. I stand at a fork in the road. I could try to submit PRs to every stage of the WASM toolchain to get my two features working, spending months just to reach the point where I can start on a prerequisite of the project I want to build, a tool whose whole purpose is building more projects. OR I could build the entire runtime for my scripting system from scratch. I already have a toy POC working with a full JIT, debugging, and even a simple LSP. The part I’m concerned about is security without giving up speed, and that’s exactly what wasm offers.

Taking ALL of this into account, I’m inclined to copy the security parts of the wasm spec, create my own version (but with good debugging), spend five months on that, and have an MVP at the end, in a much shorter time frame than the PR route would take. I’m confident that along the way I’ll find ways to bring its security closer to where wasm is.

Plus, I’ve been researching how wasm runtimes handle memory safety, which is my biggest concern, and it looks like most of the safety actually comes from the fact that all pointers are relative indexes into a linear memory? So an attacker would have to guess where the store is, along with where the memory they want to access is. And since it’s only 32-bit offsets for now, you can just reserve 4 GB of address space and voilà, you know they can’t access anything outside of that. Of course, that goes away the instant you turn on the 64-bit proposal. (I’m not kidding, wasmtime apparently reserves another 2 GB of address space just to mark it as a guard region in case someone tries to access a negative index -_- even though wasm addresses are unsigned in the first place.)
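The 4 GB trick can be written down as plain arithmetic. The sketch below is a hedged model in Rust, assuming a 64-bit host, a 4 GiB reservation, and a 2 GiB guard region placed after it (the sizes are assumptions for illustration, not a claim about any runtime’s defaults, which are configurable). It models only the address math, not how a runtime actually maps pages, and the function names are hypothetical. One detail it makes visible: a wasm32 load’s effective address is the unsigned i32 operand plus the instruction’s constant unsigned offset, so it can exceed 4 GiB, and a guard region after the memory is what absorbs those accesses.

```rust
/// Sizes are in bytes; 1 GiB = 2^30.
const GIB: u64 = 1 << 30;

/// Effective address of a wasm32 load/store: the dynamic i32 operand,
/// read as unsigned, plus the instruction's constant u32 `offset`.
/// Done in 64-bit arithmetic, so the sum cannot wrap around.
fn effective_address(addr: u32, offset: u32) -> u64 {
    addr as u64 + offset as u64
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Lands in accessible memory.
    InBounds,
    /// Lands in reserved but unmapped pages: the hardware faults and the runtime traps.
    Faults,
    /// Could land past everything reserved, so an explicit check is required.
    Escapes,
}

/// Assumed layout: `memory_len` accessible bytes from the base,
/// inaccessible reserved space up to `reservation`, then a `guard` region.
fn classify(addr: u32, offset: u32, size: u64, memory_len: u64, reservation: u64, guard: u64) -> Outcome {
    let end = effective_address(addr, offset) + size;
    if end <= memory_len {
        Outcome::InBounds
    } else if end <= reservation + guard {
        Outcome::Faults
    } else {
        Outcome::Escapes
    }
}

/// Per-instruction decision: with at least 4 GiB reserved, every u32 address
/// is below the end of the reservation, so if the constant offset plus the
/// access size fits in the guard region, no address can escape and the
/// explicit bounds check can be skipped.
fn can_elide_bounds_check(offset: u32, size: u64, reservation: u64, guard: u64) -> bool {
    reservation >= 4 * GIB && offset as u64 + size <= guard
}

fn main() {
    let (mem, res, guard) = (64 * 1024, 4 * GIB, 2 * GIB);
    println!("{:?}", classify(100, 0, 4, mem, res, guard)); // InBounds
    println!("{:?}", classify(u32::MAX, 0, 4, mem, res, guard)); // Faults
    println!("{:?}", classify(u32::MAX, u32::MAX, 4, mem, res, guard)); // Escapes
    println!("{}", can_elide_bounds_check(16, 4, res, guard)); // true
    println!("{}", can_elide_bounds_check(u32::MAX, 4, res, guard)); // false
}
```

Under the 64-bit memory proposal the address operand is itself 64 bits wide, so no practical reservation covers every possible address, which is why the trick goes away there and explicit bounds checks come back.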

Liked it? Take a second to support WireWhiz on Patreon!

2 comments

  1. It’s a limitation of some languages and compilers, not WebAssembly. In Rust, the debugging experience is very poor.

    But in programs written in Zig, error traces are printed the same way as on other targets, with correct source files and line numbers. The Chrome debugger and the Visual Studio Code WebAssembly DWARF Debugging extension work perfectly.

    Emscripten has the -g flag, and EMCC_DEBUG enables a useful debug mode.

    1. I would agree that it’s a language-specific problem and that editor tools help, but I am specifically attempting to create a new language with WASM-exclusive features, and I’m finding that particularly difficult, even more so than regular language development.
