Describe the bug
It seems lsp-mode computes positions incorrectly for characters that take two UTF-16 code units (surrogate pairs).
To Reproduce
Create a file containing only "🍋", a single lemon emoji. Place the cursor after the lemon and type a character ("l", say, for lemon). Most emojis, including this one, are represented by two UTF-16 code units (a surrogate pair, four bytes), and LSP specifies character offsets in UTF-16 code units, so the insertion happens at column 2.
But lsp-mode sends column 1:
{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"file:///home/w/utf16.lean","version":1},"contentChanges":[{"range":{"start":{"line":0,"character":1},"end":{"line":0,"character":1}},"rangeLength":0,"text":"l"}]}}
Expected behavior
Compare with e.g. the VSCode sample client, which sends 2 as it should:
{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"file:///home/w/utf16.lean","version":56},"contentChanges":[{"range":{"start":{"line":0,"character":2},"end":{"line":0,"character":2}},"rangeLength":0,"text":"l"}]}}
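To make the expected arithmetic concrete, here is a minimal Python sketch of how a column counted in Unicode code points (which appears to be what lsp-mode sends) maps to the UTF-16 offset LSP expects. The helper name `utf16_column` is made up for illustration and is not part of lsp-mode or any LSP library.

```python
def utf16_column(line: str, codepoint_column: int) -> int:
    """Convert a column counted in code points to UTF-16 code units.

    Hypothetical helper for illustration only; not lsp-mode's code.
    """
    prefix = line[:codepoint_column]
    # UTF-16 uses 2 bytes per code unit; characters outside the Basic
    # Multilingual Plane (most emojis) encode as two code units.
    return len(prefix.encode("utf-16-le")) // 2


lemon_line = "\U0001F34B"  # the lemon emoji from the reproduction
print(utf16_column(lemon_line, 1))  # cursor right after the lemon
```

For the reproduction file this prints 2, matching what the VSCode sample client sends. For pure-ASCII lines the two ways of counting agree, which would explain why the problem only shows up with such characters.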
Which Language Server did you use
Custom one added via the tutorial. lsp-mode version 7.0.1.
OS
Linux