diff --git a/DESIGN.md b/DESIGN.md new file mode 100644 index 00000000..f89fab5f --- /dev/null +++ b/DESIGN.md @@ -0,0 +1,723 @@ +# Design of `wasm-bindgen` + +This is intended to be a bit of a deep-dive into how `wasm-bindgen` works today, +specifically for Rust. If you're reading this far in the future it may no longer +be up to date, but feel free to ping me and I can try to answer questions and/or +update this! + +## Foundation: ES Modules + +The first thing to know about `wasm-bindgen` is that it's fundamentally built on +the idea of ES Modules. In other words this tool takes an opinionated stance +that wasm files *should be viewed as ES6 modules*. This means that you can +`import` from a wasm file, use its `export`-ed functionality, etc, from normal +JS files. + +Now unfortunately at the time of this writing the interface of wasm interop +isn't very rich. Wasm modules can only call functions or export functions that +deal exclusively with `u32`, `u64`, `f32`, and `f64`. Bummer! + +That's where this project comes in. The goal of `wasm-bindgen` is to enhance the +"ABI" of wasm modules with richer types like classes, JS objects, Rust structs, +strings, etc. Keep in mind, though, that everything is based on ES Modules! This +means that the compiler is actually producing a "broken" wasm file of sorts. The +wasm file emitted by rustc, for example, does not have the interface we would +like to have. Instead it requires the `wasm-bindgen` tool to postprocess the +file, generating a `foo.js` and `foo_wasm.wasm` file. The `foo.js` file is the +desired interface expressed in JS (classes, types, strings, etc) and the +`foo_wasm.wasm` module is simply used as an implementation detail (it was +lightly modified from the original `foo.wasm` file). + +## Polyfill for "JS objects in wasm" + +One of the main goals of `wasm-bindgen` is to allow working with and passing +around JS objects in wasm. But wait, that's not allowed today! While indeed +true, that's where the polyfill comes in! + +The question here is how we shoehorn JS objects into a `u32` for wasm to use. +The current strategy for this approach is to maintain two module-local variables +in the generated `foo.js` file: a stack and a heap. + +### Temporary JS objects on the stack + +The stack in `foo.js` is, well, a stack. JS objects are pushed on the top of the +stack, and their index in the stack is the identifier that's passed to wasm. JS +objects are then only removed from the top of the stack as well. This data +structure is mainly useful for efficiently passing a JS object into wasm without +a sort of "heap allocation". The downside of this, however, is that it only +works for when wasm doesn't hold onto a JS object (aka it only gets a +"reference" in Rust parlance). + +Let's take a look at an example. + +```rust +// foo.rs +wasm_bindgen! { + pub fn foo(a: &JsObject) { + // ... + } +} +``` + +Here we're using the special `JsObject` type from the `wasm-bindgen` library +itself. Our exported function, `foo`, takes a *reference* to an object. This +notably means that it can't persist the object past the lifetime of this +function call. + +Now what we actually want to generate is a JS module that looks like (in +Typescript parlance) + +```ts +// foo.d.ts +export function foo(a: any); +``` + +and what we actually generate looks something like: + +```js +// foo.js +import * as wasm from './foo_wasm'; + +let stack = []; + +function addBorrowedObject(obj) { + stack.push(obj); + return stack.length - 1; +} + +export function foo(arg0) { + const idx0 = addBorrowedObject(arg0); + try { + wasm.foo(idx0); + } finally { + stack.pop(); + } +} +``` + +Here we can see a few notable points of action: + +* The wasm file was renamed to `foo_wasm.wasm`, and we can see how the JS module + generated here is importing from the wasm file. +* Next we can see our `stack` module variable which is used to push/pop items + from the stack. +* Our exported function `foo`, takes an arbitrary argument, `arg0`, which is + converted to an index with the `addBorrowedObject` object function. The index + is the passed to wasm so wasm can operate with it. +* Finally, we have a `finally` which frees the stack slow as it's no longer in + used, issuing a `pop` for what was pushed at the start of the function. + +It's also helpful to dig into the Rust side of things to see what's going on +there! Let's take a look at the code that `wasm_bindgen!` generates in Rust: + +```rust +// what the user wrote +pub fn foo(a: &JsObject) { +} + +#[export_name = "foo"] +pub extern fn __wasm_bindgen_generated_foo(arg0: u32) { + let arg0 = unsafe { + ManuallyDrop::new(JsObject::__from_idx(arg0)) + }; + let arg0 = &*arg0; + foo(arg0); +} +``` + +And as with the JS, the notable points here are: + +* The original function, `foo`, is unmodified in the output +* A generated function here (with a unique name) is the one that's actually + exported from the wasm module +* Our generated function takes an integer argument (our index) and then wraps it + in a `JsObject`. There's some trickery here that's not worth going into just + yet, but we'll see in a bit what's happening under the hood. + +### Long-lived JS objects in a slab + +The above strategy is useful when JS objects are only temporarily used in Rust, +for example only during one function call. Sometimes, though, objects may have a +dynamic lifetime or otherwise need to be stored on Rust's heap. To cope with +this there's a second half of management of JS objects, a slab. + +JS Objects passed to wasm that are not references are assumed to have a dynamic +lifetime inside of the wasm module. As a result the strict push/pop of the stack +won't work and we need more permanent storage for the JS objects. To cope with +this we build our own "slab allocator" of sorts. + +A picture (or code) is worth a thousand words so let's show what happens with an +example. + +```rust +// foo.rs +wasm_bindgen! { + pub fn foo(a: JsObject) { + // ... + } +} +``` + +Note that the `&` is missing in front of the `JsObject` we had before, and in +Rust parlance this means it's taking ownership of the JS value. The exported ES +module interface is the same as before, but the ownership mechanics are slightly +different. Let's see the generated JS's slab in action: + +```js +import * as wasm from './foo_wasm'; // imports from wasm file + +let slab = []; +let slab_next = 0; + +function addHeapObject(obj) { + if (slab_next == slab.length) + slab.push(slab.length + 1); + const idx = slab_next; + const next = slab[idx]; + slab_next = next; + slab[idx] = { obj, cnt: 1 }; + return idx; +} + +export function foo(arg0) { + const idx0 = addHeapObject(arg0); + wasm.foo(idx0); +} + +export function __wbindgen_object_drop_ref(idx) { + let obj = slab[idx]; + obj.cnt -= 1; + if (obj.cnt > 0) + return; + // If we hit 0 then free up our space in the slab + slab[idx] = slab_next; + slab_next = idx; +} +``` + +Unlike before we're now calling `addHeapObject` on the argument to `foo` rather +than `addBorrowedObject`. This function will use `slab` and `slab_next` as a +slab allocator to acquire a slot to store the object, placing a structure there +once its found. + +Note here that a reference count is used in addition to storing the object. +That's so we can create multiple references to the JS object in Rust without +using `Rc`, but it's overall not too important to worry about here. + +Another curious aspect of this generated module is the +`__wbindgen_object_drop_ref` function. This is one that's actually imported from +wasm rather than used in this module! This function is used to signal the end of +the lifetime of a `JsObject` in Rust, or in other words when it goes out of +scope. Otherwise though this function is largely just a general "slab free" +implementation. + +And finally, let's take a look at the Rust generated again too: + +```rust +// what the user wrote +pub fn foo(a: JsObject) { +} + +#[export_name = "foo"] +pub extern fn __wasm_bindgen_generated_foo(arg0: u32) { + let arg0 = unsafe { + JsObject::__from_idx(arg0) + }; + foo(arg0); +} +``` + +Ah that looks much more familiar! Not much interesting is happening here, so +let's move on to... + +### Anatomy of `JsObject` + +Currently the `JsObject` struct is actually quite simple in Rust, it's: + +```rust +pub struct JsObject { + idx: u32, +} + +// "private" constructors + +impl Drop for JsObject { + fn drop(&mut self) { + unsafe { + __wbindgen_object_drop_ref(self.idx); + } + } +} +``` + +Or in other words it's a newtype wrapper around a `u32`, the index that we're +passed from wasm. The destructor here is where the `__wbindgen_object_drop_ref` +function is called to relinquish our reference count of the JS object, freeing +up our slot in the `slab` that we saw above. + +If you'll recall as well, when we took `&JsObject` above we generated a wrapper +of `ManuallyDrop` around the local binding, and that's because we wanted to +avoid invoking this destructor when the object comes from the stack. + +### Indexing both a slab and the stack + +You might be thinking at this point that this system may not work! There's +indexes into both the slab and the stack mixed up, but how do we differentiate? +It turns out that the examples above have been simplified a bit, but otherwise +the lowest bit is currently used as an indicator of whether you're a slab or a +stack index. + +## Exporting a function to JS + +Alright now that we've got a good grasp on JS objects and how they're working, +let's take a look at another feature of `wasm-bindgen`: exporting functionality +with types that are richer than just numbers. + +The basic idea around exporting functionality with more flavorful types is that +the wasm exports won't actually be called directly. Instead the generated +`foo.js` module will have shims for all exported functions in the wasm module. + +The most interesting conversion here happens with strings so let's take a look +at that. + +```rust +wasm_bindgen! { + pub fn greet(a: &str) -> String { + format!("Hello, {}!", a) + } +} +``` + +Here we'd like to define an ES module that looks like + +```ts +// foo.d.ts +export function greet(a: string): string; +``` + +To see what's going on, let's take a look at the generated shim + +``` +import * as wasm from './foo_wasm'; + +function passStringToWasm(arg) { + const buf = new TextEncoder('utf-8').encode(arg); + const len = buf.length; + const ptr = wasm.__wbindgen_malloc(len); + let array = new Uint8Array(wasm.memory.buffer); + array.set(buf, ptr); + return [ptr, len]; +} + +function getStringFromWasm(ptr, len) { + const mem = new Uint8Array(wasm.memory.buffer); + const slice = mem.slice(ptr, ptr + len); + const ret = new TextDecoder('utf-8').decode(slice); + return ret; +} + +export function greet(arg0) { + const [ptr0, len0] = passStringToWasm(arg0); + try { + const ret = wasm.greet(ptr0, len0); + const ptr = wasm.__wbindgen_boxed_str_ptr(ret); + const len = wasm.__wbindgen_boxed_str_len(ret); + const realRet = getStringFromWasm(ptr, len); + wasm.__wbindgen_boxed_str_free(ret); + return realRet; + } finally { + wasm.__wbindgen_free(ptr0, len0); + } +} +``` + +Phew, that's quite a lot! We can sort of see though if we look closely what's +happening: + +* Strings are passed to wasm via two arguments, a pointer and a length. Right + now we have to copy the string onto the wasm heap which means we'll be using + `TextEncoder` to actually do the encoding. Once this is done we use an + internal function in `wasm-bindgen` to allocate space for the string to go, + and then we'll pass that ptr/length to wasm later on. + +* Returning strings from wasm is a little tricky as we need to return a ptr/len + pair, but wasm only supports one return value. To work around this we're + actually returning a pointer to a ptr/len pair, and then using functions to + access the various fields. + +* Some cleanup ends up happening in wasm. The `__wbindgen_boxed_str_free` + function is used to free the return value of `greet` after it's been decoded + onto the JS heap (using `TextDecoder`). The `__wbindgen_free` is then used to + free the space we allocated to pass the string argument once the function call + is done. + +At this point it may be preditable, but let's take a look at the Rust side of +things as well + +```rust +pub fn greet(a: &str) -> String { + format!("Hello, {}!", a) +} + +#[export_name = "greet"] +pub extern fn __wasm_bindgen_generated_greet( + arg0_ptr: *const u8, + arg0_len: usize, +) -> *mut String { + let arg0 = unsafe { + let slice = ::std::slice::from_raw_parts(arg0_ptr, arg0_len); + ::std::str::from_utf8_unchecked(slice) + }; + let _ret = greet(arg0); + Box::into_raw(Box::new(_ret)) +} +``` + +Here we can see again that our `greet` function is unmodified and has a wrapper +to call it. This wrapper will take the ptr/len argument and convert it to a +string slice, while the return value is boxed up into just a pointer and is +then returned up to was for reading via the `__wbindgen_boxed_str_*` functions. + +So in general exporting a function involves a shim both in JS and in Rust with +each side translating to or from wasm arguments to the native types of each +language. The `wasm-bindgen` tool manages hooking up all these shims while the +`wasm_bindgen!` macro takes care of the Rust shim as well. + +Most arguments have a relatively clear way to convert them, bit if you've got +any questions just let me know! + +## Importing a function from JS + +Now that we've exported some rich functionality to JS it's also time to import +some! The goal here is to basically implement JS `import` statements in Rust, +with fancy types and all. + +First up, let's say we invert the function above and instead want to generate +greetings in JS but call it from Rust. We might have, for example: + +```rust +wasm_bindgen! { + #[wasm_module = "./greet"] + extern "JS" { + fn greet(a: &str) -> String; + } +} + +fn other_code() { + let greeting = greet("foo"); + // ... +} +``` + +The basic idea of exports is the same in that we'll have shims in both JS and +Rust doing the necessary translation. Let's first see the JS shim in action: + +```js +import * as wasm from './foo_wasm'; + +import { greet } from './greet'; + +// ... + +export function __wbg_f_greet(ptr0, len0, wasmretptr) { + const [retptr, retlen] = passStringToWasm(greet(getStringFromWasm(ptr0, len0))); + (new Uint32Array(wasm.memory.buffer))[wasmretptr / 4] = retlen; + return retptr; +} +``` + +The `getStringFromWasm` and `passStringToWasm` are the same as we saw before, +and like with `__wbindgen_object_drop_ref` far above we've got this weird export +from our module now! The `__wbg_f_greet` function is what's generated by +`wasm-bindgen` to actually get imported in the `foo.wasm` module. + +The generated `foo.js` we see imports from the `./greet` module with the `greet` +name (was the function import in Rust said) and then the `__wbg_f_greet` +function is shimming that import. + +There's some tricky ABI business going on here so let's take a look at the +generated Rust as well: + +```rust +fn greet(a: &str) -> String { + extern { + fn __wbg_f_greet(a_ptr: *const u8, a_len: usize, ret_len: *mut usize) -> *mut u8; + } + unsafe { + let a_ptr = a.as_ptr(); + let a_len = a.len(); + let mut __ret_strlen = 0; + let mut __ret_strlen_ptr = &mut __ret_strlen as *mut usize; + let _ret = __wbg_f_greet(a_ptr, a_len, __ret_strlen_ptr); + String::from_utf8_unchecked( + Vec::from_raw_parts(_ret, __ret_strlen, __ret_strlen) + ) + } +} +``` + +Here we can see that the `greet` function was generated but it's largely just a +shim around the `__wbg_f_greet` function that we're calling. The ptr/len pair +for the argument is passed as two arguments and for the return value we're +receiving one value (the length) indirectly while directly receiving the +returned pointer. + +## Exporting a struct to JS + +So far we've covered JS objects, importing functions, and exporting functions. +This has given us quite a rich based to build on so far, and that's great! We +sometimes, though, want to go even further and define a JS `class` in Rust. Or +in other words, we want to expose an object with methods from Rust to JS rather +than just importing/exporting free functions. + +The `wasm_bindgen!` macro currently allows an `impl` block in Rust which does +just this: + +```rust +wasm_bindgen! { + pub struct Foo { + internal: i32, + } + + impl Foo { + pub fn new(val: i32) -> Foo { + Foo { internal: val } + } + + pub fn get(&self) -> i32 { + self.internal + } + + pub fn set(&mut self, val: i32) { + self.internal = val; + } + } +} +``` + +This is a typical Rust `struct` definition for a type with a constructor and a +few methods. By encasing it in the `wasm_bindgen!` macro we're ensuring that the +types and methods are also available to JS. If we take a look at the generated +JS code for this we'll see: + +```js +import * as wasm from './foo_wasm'; + +export class Foo { + constructor(ptr) { + this.ptr = ptr; + } + + free() { + const ptr = this.ptr; + this.ptr = 0; + wasm.__wbindgen_foo_free(ptr); + } + + static new(arg0) { + const ret = wasm.foo_new(arg0); + return new Foo(ret); + } + + get() { + return wasm.foo_get(this.ptr); + } + + set(arg0) { + wasm.foo_set(this.ptr, arg0); + } +} +``` + +That's actually not much! We can see here though how we've translated from Rust +to JS: + +* Associated functions in Rust (those without `self`) turn into `static` + functions in JS. +* Methods in Rust turn into methods in wasm. +* Manual memory management is exposed in JS as well. The `free` function is + required to be invoked to deallocate resources on the Rust side of things. + +It's intended that `new Foo()` is never used in JS. When `wasm-bindgen` is run +with `--debug` it'll actually emit assertions to this effect to ensure that +instances of `Foo` are only constructed with the functions like `Foo.new` in JS. + +One important aspect to note here, though, is that once `free` is called the JS +object is "neutered" in that its internal pointer is nulled out. This means that +future usage of this object should trigger a panic in Rust. + +The real trickery with these bindings ends up happening in Rust, however, so +let's take a look at that. + +```rust +// original input to `wasm_bindgen!` omitted ... + +#[export_name = "foo_new"] +pub extern fn __wasm_bindgen_generated_Foo_new(arg0: i32) -> u32 + let ret = Foo::new(arg0); + Box::into_raw(Box::new(WasmRefCell::new(ret))) as u32 +} + +#[export_name = "foo_get"] +pub extern fn __wasm_bindgen_generated_Foo_get(me: u32) -> i32 { + let me = me as *mut WasmRefCell; + wasm_bindgen::__rt::assert_not_null(me); + let me = unsafe { &*me }; + return me.borrow().get(); +} + +#[export_name = "foo_set"] +pub extern fn __wasm_bindgen_generated_Foo_set(me: u32, arg1: i32) { + let me = me as *mut WasmRefCell; + ::wasm_bindgen::__rt::assert_not_null(me); + let me = unsafe { &*me }; + me.borrow_mut().set(arg1); +} + +#[no_mangle] +pub unsafe extern fn __wbindgen_foo_free(me: u32) { + let me = me as *mut WasmRefCell; + wasm_bindgen::__rt::assert_not_null(me); + (*me).borrow_mut(); // ensure no active borrows + drop(Box::from_raw(me)); +} +``` + +As with before this is cleaned up from the actual output but it's the same idea +as to what's going on! Here we can see a shim for each function as well as a +shim for deallocating an instance of `Foo`. Recall that the only valid wasm +types today are numbers, so we're required to shoehorn all of `Foo` into a +`u32`, which is currently done via `Box` (like `std::unique_ptr` in C++). +Note, though, that there's an extra layer here, `WasmRefCell`. This type is the +same as [`RefCell`] and can be mostly glossed over. + +The purpose for this type, if you're interested though, is to uphold Rust's +guarantees about aliasing in a world where aliasing is rampant (JS). +Specifically the `&Foo` type means that there can be as much alaising as you'd +like, but crucially `&mut Foo` means that it is the sole pointer to the data (no +other `&Foo` to the same instance exists). The [`RefCell`] type in libstd is a +way of dynamically enforcing this at runtime (as opposed to compile time where +it usually happens). Baking in `WasmRefCell` is the same idea here, adding +runtime checks for aliasing which are typically happening at compile time. This +is currently a Rust-specific feature which isn't actually in the `wasm-bindgen` +tool itself, it's just in the Rust-generated code (aka the `wasm_bindgen!` +macro). + +[`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html + +## Importing a class from JS + +Just like with functions after we've started exporting we'll also want to +import! Now that we've exported a `class` to JS we'll want to also be able to +import classes in Rust as well to invoke methods and such. Since JS classes are +in general just JS objects the bindings here will look pretty similar to the JS +object bindings describe above. + +As usual though, let's dive into an example! + +```rust +wasm_bindgen! { + #[wasm_module = "./bar"] + extern struct Bar { + fn new() -> Bar; + fn get(&self) -> i32; + fn set(&self, val: i32); + } +} + +fn run() { + let bar = Bar::new(); + let x = bar.get(); + bar.set(x + 3); +} +``` + +Here we're going to do the opposite of the above example and instead import our +class and use it from Rust. First up, let's look at the JS: + +```rust +import * as wasm from './foo_wasm'; + +import { Bar } from './bar'; + +// other support functions omitted... + +export function __wbg_s_Bar_new() { + return addHeapObject(Bar.new()); +} + +export function __wbg_s_Bar_get(ptr) { + return Bar.prototype.get.call(getObject(ptr)); +} + +export function __wbg_s_Bar_set(ptr, arg0) { + Bar.prototype.set.call(getObject(ptr), arg0) +} +``` + +Like when importing functions from JS we can see a bunch of shims are generated +for all the relevant functions. The `new` static function is translated directly +to `Bar.new` (where `Bar` is imported at the top), and then when returning we're +sure to call `addHeapObject` as we're passing ownership to Rust (which just +declares `-> Bar`, no sigils). + +The `get` and `set` functions are methods so they go through `Bar.prototype`, +and otherwise their first argument is implicitly the JS object itself which is +loaded through `getObject` like we saw earlier. + +Some real meat starts to show up though on the Rust side of things, so let's +take a look: + +```rust +pub struct Bar { + obj: JsObject, +} + +impl Bar { + fn new() -> Bar { + extern { + fn __wbg_s_Bar_new() -> u32; + } + unsafe { + let ret = __wbg_s_Bar_new(); + Bar { obj: JsObject::__from_idx(ret) } + } + } + + fn get(&self) -> i32 { + extern { + fn __wbg_s_Bar_get(ptr: u32) -> i32; + } + unsafe { + let ptr = self.obj.__get_idx(); + let ret = __wbg_s_Bar_get(ptr); + return ret + } + } + + fn set(&self, val: i32) { + extern { + fn __wbg_s_Bar_set(ptr: u32, val: i32); + } + unsafe { + let ptr = self.obj.__get_idx(); + __wbg_s_Bar_set(ptr, val); + } + } +} +``` + +In Rust we're seeing that a new type, `Bar`, is generated for this import of a +class. The type `Bar` internally contains a `JsObject` as an instance of `Bar` +is meant to represent a JS object stored in our module's stack/slab. This then +works mostly the same way that we saw JS objects work in the beginning. + +When calling `Bar::new` we'll get an index back which is wrapped up in `Bar` +(which is itself just a `u32` in memory when stripped down). Each function then +passes the index as the first argument and otherwise forwards everything along +in Rust. + +## Wrapping up + +That's currently at least what `wasm-bindgen` has to offer! If you've got more +questions though please don't hesitate to ask or open an issue!