AssemblyScript & Interface Types - UTF-16 String Support / Consider UTF-8 Strings #1263
Dart also inherits many aspects from JavaScript and likewise uses UTF-16. It is currently not heading in the WebAssembly direction (though it does have AOT compilation), but that could change in the future. |
Oh wow, this is probably silly of me, but I didn't realize that Java and C# also use UTF-16. Yes, with Blazor and Kotlin targeting wasm, I can definitely see this being a common need :) I'll go comment about this back in WebAssembly/interface-types#13. I did just want to clarify one thing: I was only suggesting that AS change the representation of strings in linear memory, not the programmer-visible semantics. I think this is still a good idea, even if Interface Types has UTF-16 lifting/lowering. More specifically, even assuming interface types has both UTF-8 and UTF-16 lifting/lowering, I think a nice impl for AS strings could be:
With such a scheme, I think only a small % of strings will get inflated, and, when that happens, the one-time inflation cost will be amortized. I think the code-size impact from the extra code to handle the two cases should be small, since it will be factored out into runtime functions. And, in exchange, string memory use drops by ~50%. WDYT? |
Java has had a similar approach since version 9 (its second attempt, by the way), but it only switches between ASCII/Latin1 and UTF-16. It's turned off by default and is called Compact Strings. And even with a JIT runtime like Java's, turning on Compact Strings can sometimes slow down string manipulation. For us it also means significantly increasing the code size of our runtime, which we try to keep as small as possible. I know JavaScript engines internally represent strings in different formats, and even use rope structures when strings are concatenated frequently, but for us that means significant bloating of the runtime, unfortunately. |
I am not involved in the discussion, and maybe my question is off-topic: does this solve the performance issue of string conversion between Wasm and JavaScript? |
If I read correctly, Compact Strings are enabled by default (with a flag to disable) which demonstrates a widespread win in practice. Perhaps you're thinking of the previous Compressed Strings feature which it sounds like had more of a perf impact and was never enabled by default.
Do you have any data to show the code-size increase is significant? I'm assuming that, in general, the AS compiler would want to balance code-size and runtime-perf and thus the concrete magnitudes matter. |
Yes, it seems it's on by default now. But there is a flag to disable it "in the unexpected event where a performance regression is observed in migrating from Java SE 8 to Java SE 9 and an analysis shows that Compact Strings introduces the regression."
I expect this change introduces more cache misses, because most string methods split into 2 or 4 branches depending on the method's arity (String#concat, String#replace, comparisons, etc.).
It also increases the code size of String and related functions roughly 2-3x. Maybe I'm missing something, though? |
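To make that branching concrete, here is a hedged sketch (not Java's actual implementation) of why even a simple two-argument operation like concatenation multiplies into per-representation cases under compact strings:

```typescript
// With compact strings, each argument can be Latin1 or UTF-16, so a
// two-argument method needs specialized bodies for the combinations;
// duplicating these bodies across the String methods is where the
// code-size cost comes from.
type Repr = Uint8Array | Uint16Array; // Latin1 vs UTF-16 backing store

function concat(a: Repr, b: Repr): Repr {
  if (a instanceof Uint8Array && b instanceof Uint8Array) {
    // Latin1 x Latin1: the only case where the result stays compact.
    const out = new Uint8Array(a.length + b.length);
    out.set(a);
    out.set(b, a.length);
    return out;
  }
  // Any mixed or UTF-16 combination inflates the result to UTF-16.
  const out = new Uint16Array(a.length + b.length);
  out.set(a);
  out.set(b, a.length);
  return out;
}
```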
We can definitely get the advantage (twice the speed and half the memory footprint) if most strings contain only Latin1 code points. But how often do we use only English (without emojis, ligatures, and special symbols like á / Á or ©) nowadays? For example, the first and second most widespread languages in the world, Chinese and Spanish, already require UTF-8/UTF-16. English is only third by number of native speakers, though first on the web. |
Some of us care about compile times too. People testing their software locally will care about untouched builds, and anyone using strings will see a very large increase in test-module size just by upgrading to a new compiler. Even if this adds only a single second to compile time (per run), it adds up over long periods of time and gets very frustrating. |
I'm just a casual commentator here, so I won't press the matter any further, but I'd encourage measurement before ruling the option out. Cheers! |
@lukewagner Luke, thank you so much for the reasonable suggestion about "compacting strings". I really appreciate it, and it totally makes sense for runtimes that are not distributed via the web, like the JVM or DartVM. By the way, I had already thought about this here. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I have changed my thoughts on this issue. We need some easy way to generate static UTF-8 strings. For instance, when traversing WASI file descriptors, we need to loop over preopened folders to build the fs.readFileSync functions. These use UTF-8 strings. |
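For context on this boundary cost: WASI hands back preopened directory names as raw UTF-8 bytes in linear memory, so a UTF-16-string language has to decode them (and re-encode when passing paths back). A small host-side sketch, where decodePreopenName and encodePath are hypothetical helpers, not part of any WASI binding:

```typescript
// WASI returns path names as raw UTF-8 bytes; a UTF-16 guest must
// decode them before comparing or storing them as native strings.
function decodePreopenName(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// Going the other way (e.g. to open a file by path) requires
// encoding the UTF-16 string back into UTF-8 bytes.
function encodePath(path: string): Uint8Array {
  return new TextEncoder().encode(path);
}
```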
Note that ASCII is still extremely common in many areas: HTML tags/attributes, CSS, DOM APIs (e.g. …). My educated guess is that most strings are not created by humans, but instead created by other programs. I think it's a mistake to assume that human-created strings are dominant. All JS engines have a fast path for Latin1, because ASCII strings are so common. A similar situation happened with UTF-8 vs UTF-16: for certain Asian languages UTF-16 uses 2 bytes where UTF-8 uses 3, so people assumed UTF-16 would be better for those languages, but in practice UTF-8 is actually more efficient. This is why it's important to measure based on real data, and not based on assumptions. |
Yes, but for UTF-8 we still need special fast paths when all symbols are Latin1, because in that case we can skip UTF-8 decoding and optimize (vectorize) Latin1-only sequences much better. In that scenario it doesn't matter much whether it's UTF-16 (with a compact mode) or UTF-8 (with a specialized mode); the only remaining benefits are transfer size and memory consumption. |
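A scalar sketch of the fast-path check under discussion; real engines test many bytes at a time with SIMD, but the idea is simply "no byte has the high bit set, so the UTF-8 buffer is plain ASCII and decoding can be skipped":

```typescript
// OR all bytes together; if the high bit never appears, the buffer
// is pure ASCII and every byte is already a valid code unit.
function isAscii(bytes: Uint8Array): boolean {
  let acc = 0;
  for (let i = 0; i < bytes.length; i++) acc |= bytes[i];
  return acc < 0x80;
}
```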
@MaxGraey That depends on whether AssemblyScript uses pure UTF-8 or a hybrid. If it uses pure UTF-8 I don't think it would need a fast path, since UTF-8 is a superset of ASCII, and you only need to encode/decode at the boundaries of Wasm. So the choice is really between pure WTF-16 (a la JS), pure UTF-8 (a la Rust), or hybrid ASCII/WTF-16 (a la Python/Java/JS engines). |
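The "superset of ASCII" point can be checked directly: ASCII text encodes to identical byte values under UTF-8, so a pure-UTF-8 representation pays nothing for ASCII strings and only converts at the boundary. A small demonstration:

```typescript
// ASCII code points (< 0x80) map 1:1 to UTF-8 bytes; only higher
// code points expand into multi-byte sequences.
const ascii = "interface-types";
const bytes = new TextEncoder().encode(ascii);
const identical = [...ascii].every((ch, i) => ch.charCodeAt(0) === bytes[i]);

// A non-ASCII Latin1 character already needs two bytes in UTF-8.
const eAcute = new TextEncoder().encode("\u00e9"); // é → 0xC3 0xA9
```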
Hello! So this is a follow up from: WebAssembly/interface-types#13
That is a long thread exploring string encodings in the upcoming Interface Types proposal, where currently the only supported string encoding (in the MVP) is UTF-8.
However, AssemblyScript uses UTF-16 to stay parallel with the Web APIs, so interface types could require some double encoding for UTF-16 languages (if I understand correctly).
Switching to UTF-8 was suggested, but there are a few issues (not the full list) that we see on the AS side (I'll let @dcodeIO give implementation details where necessary):
.substring and charCodeAt are implemented in a way that would be difficult to re-implement in UTF-8, and changing them could also break libraries that depend on specific JS behavior (if they were to be ported to AS).
Would be interested to hear everyone's thoughts. Looking forward to a respectful, thoughtful discussion here, and finding a good solution 😄 Thanks everyone! 👍
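To illustrate the JS-compatible behavior at stake: both substring and charCodeAt operate on 16-bit code units, so a supplementary character occupies two indices and can even be split mid-surrogate. Reproducing these exact semantics on a UTF-8 backing store would mean mapping code-unit indices to byte offsets on every call:

```typescript
// U+1F600 (😀) is stored as the surrogate pair 0xD83D 0xDE00 in
// UTF-16, so string indices count code units, not characters.
const s = "a\u{1F600}b";

const units = s.length;       // 4, not 3
const high = s.charCodeAt(1); // 0xD83D (high surrogate)
const low = s.charCodeAt(2);  // 0xDE00 (low surrogate)

// substring indexes by code unit and can split the pair,
// producing a string that ends in a lone surrogate.
const half = s.substring(0, 2);
```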
cc @lukewagner @dcodeIO @MaxGraey