Hello! This is a follow-up to WebAssembly/interface-types#13.
That is a long thread exploring string encodings in the upcoming interface types proposal, where UTF-8 is currently the only supported string encoding (in the MVP).
However, AssemblyScript uses UTF-16 to stay in line with the Web APIs, and interface types could require some double encoding for UTF-16 languages (if I understand correctly).
It was suggested that AssemblyScript could use UTF-8 instead, but we see a few issues with that on the AS side. This is not the full list, and I'll let @dcodeIO give implementation details where necessary:
- AssemblyScript tries to stay as close as possible to the Web APIs, and mimics the JS String API. Functions like `.substring` and `.charCodeAt` are implemented in a way that would be difficult to re-implement on top of UTF-8, and changing them could break libraries that depend on specific JS behavior (if they were ported to AS).
- If AssemblyScript were to support two string representations, having a "UTF8String" class would be unintuitive and cause lots of headaches. Supporting both UTF-16 and UTF-8 could also greatly increase module size (which would be a huge downside for the browser case).
- Most important to me personally: another notable use case for Wasm is C#/.NET in the Blazor project, which could greatly benefit from interface types but uses UTF-16 strings. The same goes for other popular languages that could target Wasm and use UTF-16 as their string representation, such as Java and Kotlin.
- This list is incomplete; @dcodeIO would know more of the implementation details that make this difficult.
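
To illustrate the first point, here is a rough sketch in TypeScript (not actual AssemblyScript internals) of why index-based APIs such as `.charCodeAt` do not map cleanly onto UTF-8 storage. The helper `utf8OffsetOfCodeUnit` is hypothetical and only for illustration; it assumes well-formed UTF-16 input and an index that falls on a code point boundary.

```typescript
// Hypothetical helper (not part of AssemblyScript or the interface types
// proposal): maps a UTF-16 code unit index to the byte offset that the same
// position would have in a UTF-8 encoding of the string.
// Assumes well-formed UTF-16 and an index on a code point boundary.
function utf8OffsetOfCodeUnit(s: string, index: number): number {
  let bytes = 0;
  for (let i = 0; i < index && i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII: 1 byte
    } else if (unit < 0x800) {
      bytes += 2; // 2-byte UTF-8 sequence
    } else if (unit >= 0xd800 && unit <= 0xdbff) {
      bytes += 4; // surrogate pair: one code point, 4 UTF-8 bytes
      i++; // skip the low surrogate
    } else {
      bytes += 3; // rest of the BMP: 3 bytes
    }
  }
  return bytes;
}

// "héllo 😄!" written with escapes to avoid source-encoding issues.
const sample = "h\u00e9llo \ud83d\ude04!";

// With UTF-16 storage, charCodeAt(i) is a direct lookup of code unit i.
const bangIndex = sample.length - 1; // index of "!" in UTF-16 code units
const bangCode = sample.charCodeAt(bangIndex);

// With UTF-8 storage, finding the same position needs a scan from the start.
const bangByteOffset = utf8OffsetOfCodeUnit(sample, bangIndex);
```

In this sketch the `!` sits at code unit index 8 but at byte offset 11, so a UTF-8-backed string would either need a linear scan per lookup or some extra index structure to keep JS-compatible indexing.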
Would be interested to hear everyone's thoughts. Looking forward to a respectful, thoughtful discussion here, and finding a good solution 😄 Thanks everyone! 👍