Skip to content

Incorrect handling of multibyte UTF-16 encodings #2080

Open
@Vtec234

Description

@Vtec234

Describe the bug
It seems the handling of multibyte UTF-16 encodings is incorrect in lsp-mode.

To Reproduce
Create a file with just these contents: "🍋" - a single lemon emoji. Place the cursor after the lemon and type a character ("l", say, for lemon). Most emojis including this one are represented by two UTF-16 bytes, so since LSP specifies offsets as in a UTF-16 string representation, this is at column 2.

But lsp-mode sends column 1:

{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"file:///home/w/utf16.lean","version":1},"contentChanges":[{"range":{"start":{"line":0,"character":1},"end":{"line":0,"character":1}},"rangeLength":0,"text":"l"}]}}

Expected behavior
Compare with e.g. the VSCode sample client, which sends 2 as it should:

{"jsonrpc":"2.0","method":"textDocument/didChange","params":{"textDocument":{"uri":"file:///home/w/utf16.lean","version":56},"contentChanges":[{"range":{"start":{"line":0,"character":2},"end":{"line":0,"character":2}},"rangeLength":0,"text":"l"}]}}

Which Language Server did you use
Custom one added via the tutorial. lsp-mode version 7.0.1.

OS
Linux

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions