Skip to content

Calculate wrong end position after kill a line #114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kongds opened this issue Sep 14, 2017 · 18 comments
Closed

Calculate wrong end position after kill a line #114

kongds opened this issue Sep 14, 2017 · 18 comments
Assignees

Comments

@kongds
Copy link

kongds commented Sep 14, 2017

The function lsp--point-to-position just go to point and calculate pos of this point.
But when go to end point after kill a line, this point is the new pos to old end point.
For example one buffer has the three lines like this

this is the line will be killed
this is second line
this is third line

use kill-line at point 0
so the start pos is (0,0)
but the end pos will be (2,7) should be (0, length of the first line)

@astahlman
Copy link
Contributor

I've run into this problem, as well. Unfortunately, I think it's actually not possible for Emacs to correctly implement the protocol - at least not using text change hooks. I'm reposting some analysis from microsoft/language-server-protocol#9 (comment) inline below.

Consider the case of a single-line deletion. Here are the buffer contents at time 1.

1234
6789

And here are the buffer contents at time 2, after line 1 is deleted. (Note that a newline character is at position 5).

6789

From the Emacs documentation on Change Hooks:

Three arguments are passed to each function: the positions of
the beginning and end of the range of changed text,
and the length in chars of the pre-change text replaced by that range.
(For an insertion, the pre-change length is zero;
for a deletion, that length is the number of chars deleted,
and the post-change beginning and end are at the same place.)

So after the line deletion, our after-change-function is invoked with the following arguments:

start = 1
end = 1
length = 5

The desired payload of the TextDocumentContentChangeEvent for this edit looks like this:

{
    'range': {
        'start': {'line': 0, 'character': 0},
        'end': {'line': 1, 'character': 0}
    },
  'rangeLength': 5,
  'text': ''
}

range.start can be calculated given the value of start and the current buffer contents. rangeLength is equal to the value of length. The problem is in calculating range.end; there is no way to reconstruct the pre-edit state of the buffer given the information available.

As proof, instead of

1234
6789

consider the following, alternate, prior state of the buffer:

1
34
6789

Deleting the first 2 lines would produce the same buffer state as before:

6789

And the arguments to our after-change-function would be the same:

start: 1
end: 1
length: 5

But the correct payload for the TextDocumentContentChangeEvent corresponding to this change is:

{
    'range': {
        'start': {'line': 0, 'character': 0},
        'end': {'line': 2, 'character': 0}
    },
  'rangeLength': 5,
  'text': ''
}

(Note the difference in the range.end field - line: 2 instead of line: 1.)

Thus, the best we can do using Emacs Text Hooks is to correctly specify the range.start and rangeLength and populate range.end with some garbage value.

Given a range.start, knowledge of either rangeLength or range.end should be sufficient for the server to update its representation of the document. As a proof of concept, I have a local branch of this repository which populates a garbage end position for the TextDocumentContentChangeEvent, like this:

{
    'range': {
        'start': {'line': 0, 'character': 0},
        'end': {'line': -1, 'character': -1}
    },
  'rangeLength': 5,
  'text': ''
}

And I've implemented logic in a local branch of https://github.com/palantir/python-language-server to calculate the end position based on range.start and the rangeLength if range.end is invalid. This works very well and addresses the problem raised in this issue.

So my question to the maintainers is: would you be interested in a patch in this repository to set an invalid end position? The obvious drawback is that each language server implementation will need to implement the logic to calculate the range.end if it's missing or invalid. However, I think making this failure mode explicit is preferable to the current state, in which the server's representation of the document silently diverges from the true state in the editor and produces bizarre behavior.

Also, please do chime in on microsoft/language-server-protocol#9 and advocate for amending the protocol to formalize this behavior by allowing clients like Emacs to omit either range.end or rangeLength in the TextDocumentContentChangeEvent.

@vibhavp
Copy link
Member

vibhavp commented Sep 25, 2017

Thanks for explaining this. IMHO, setting range.end to be a garbage value would add an edge case to only Emacs which will be hard for every language server to implement. I have thought of 2 options we can use to go around this:

  • Disable incremental changes entirely for now, until we figure out a viable solution; or

  • Store a copy of the buffer's old contents (using (buffer-string)), and make our after-change-function construct range.end from this string, updating it with the latest contents afterwards. Doing this on every change might eat up memory and CPU time.

One possible option might also be to add a new hook to Emacs itself, which sends the line and column positions instead of the numerical points for start and end. This would require changes to Emacs' codebase itself, and I'm not sure if doing this for a single package is worth it (and whether the maintainers would allow such a change in the first place).

(@alanz, thoughts?)

@alanz
Copy link
Contributor

alanz commented Sep 25, 2017

This is a bit extreme

Disable incremental changes entirely for now, until we figure out a viable solution

The current implementation sends a change message per after-change event, which is usable, but can lead to extra work for a naive server, if it tries to process every single change.

The second option of calculating a diff in emacs would be unworkable in practice, I suspect, since this is a very high frequency event.

Given that emacs is one of the major clients, I think it would be worth asking the protocol to support a change option that matches what is in emacs, so that we end up with the natural behaviour.

Alternatively we should discuss server behaviour around this, in the sense of buffering change requests there before launching an expensive diagnostic process. This is essentially identical to accumulating the changes in emacs, but the accumulation is in the server.

@astahlman
Copy link
Contributor

IMHO, setting range.end to be a garbage value would add an edge case to only Emacs which will be hard for every language server to implement.

Agreed that this definitely is more work for the server, so unless the protocol is amended this would be very fragile. But FWIW, it's relatively straightforward to implement this (unless of course my implementation is bugged, which is entirely possible). Here's how I've done it in the python-language-server:

    def _calculate_end_position(self, start_line, start_col, range_length):
        '''Calculate line and column-based position for the end of range'''
        col_offset = start_col
        i = start_line

        def range_spans_this_line(i, range_length, col_offset):
            return i < len(self.lines) and \
                range_length > len(self.lines[i]) - col_offset

        while range_spans_this_line(i, range_length, col_offset):
            range_length -= len(self.lines[i]) - col_offset
            col_offset = 0
            i += 1

        assert i < len(self.lines), "End of range exceeds end of document - no more lines!"
        assert range_length <= len(self.lines[i]) - col_offset, "End of range exceeds end of document - no more columns!"

        # the range ends on this line, just move the offset
        end_line = i
        end_col = col_offset + range_length

        return end_line, end_col

I agree with @alanz's assessment - the best option would be to formalize this in the protocol by allowing clients to set exactly one of range.end or rangeLength, since either is redundant given the other. Shall we open an issue for this in https://github.com/Microsoft/language-server-protocol? So far all the discussion has piggybacked off of microsoft/language-server-protocol#9

In the meantime, capturing the state of the buffer during the change hook is a clever idea. Perhaps we should prototype it and see if it adds any noticeable latency?

@kongds
Copy link
Author

kongds commented Sep 27, 2017

Can we look up buffer-undo-list when something was deleted?
For example

(defun delete-change (start end length)
  (if (and (> length 0) (listp buffer-undo-list))
      ;;lenght > 0 is deleting something
      ;;So the first entry of buffer-undo-list is (TEXT . END)
      ;;We can count how many \n are deleted to caculate end-point :line
      (let ((d (split-string  (substring-no-properties
                               (car (car buffer-undo-list))) "\n"))
            (start-p (lsp-point-to-position start))
            (end-p '(:line 0 :character 0)))
        (plist-put end-p :line (+ (plist-get start-p :line) (length d) -1))
        (plist-put end-p :character (if (> (length d) 1)
                                        (length (car (last d)))
                                      (+ (length (car (last d))) (plist-get start-p :character))))
        (message (prin1-to-string d))
        (message (prin1-to-string start-p))
        (message (prin1-to-string end-p)))))

@astahlman
Copy link
Contributor

Hmm, that's an interesting idea. I may try out using buffer-undo-list in the next couple of days and report back.

@astahlman
Copy link
Contributor

Here's a proof-of-concept for the idea suggested by @kongds. Seems to work pretty well - the downside is that it's incompatible with undo-tree-mode.

@alanz alanz self-assigned this Oct 3, 2017
@alanz
Copy link
Contributor

alanz commented Oct 3, 2017

@astahlman Now that I am actually looking at this, I notice that in the haskell-lsp implementation we already ignore the end position, and work from the range only.

So I am in favour of making the end position optional when deleting.

@alanz
Copy link
Contributor

alanz commented Oct 4, 2017

I am considering using the before-change-functions hook (See https://www.gnu.org/software/emacs/manual/html_node/elisp/Change-Hooks.html#Change-Hooks) as well as the existing after-change-functions hook.

The idea is to store the values (and converted positions as line/col) from the before change call, and possibly use it in the after change section.

This is what I am seeing between in the calls

  ;; Adding text:
  ;;   lsp-before-change:(start,end   )=(33,33)
  ;;   lsp-on-change:(start,end,length)=(33,34,0)
  ;;
  ;; Deleting text:
  ;;   lsp-before-change:(start,end)=   (19,27)
  ;;   lsp-on-change:(start,end,length)=(19,19,8)

The elisp manual warns that you cannot expect these to be neatly bracketed.

So I am thinking that we only need to care for the deleting case, and we can do a sanity check that the two start values are the same, and that the end - start from the before call is the same as the length in the after call.

If they do not match we can potentially send the change through as the whole buffer, i.e. a full sync for that particular change.

@astahlman comments?

@alanz
Copy link
Contributor

alanz commented Oct 4, 2017

My WIP is here: c989b19

@astahlman
Copy link
Contributor

@alanz That's an interesting idea. How would it work in the case of substitutions?

The warnings you alluded to are concerning, especially this part:

These hooks are provided on the assumption that Lisp programs will use either before- or the after-change hooks, but not both, and the boundaries of the region where the changes happen might include more than just the actual changed text, or even lump together several changes done piecemeal.

Given that we have no guarantees about when these hooks are invoked, I'm not sure whether the fact that before.end - before.start == after.length is sufficient to guarantee that we've captured a single delete change.

But I suspect that, in practice, the probability of receiving a non-bracketed change where the start positions and range lengths perfectly align would be low enough that this approach may be "good enough". It would be nice if there were support at the protocol level for the client to periodically ask the server, "What's the SHA-1 of your representation of document X?" so that we could periodically verify that the two representations of the document haven't diverged.

@vibhavp
Copy link
Member

vibhavp commented Oct 5, 2017

But I suspect that, in practice, the probability of receiving a non-bracketed change where the start positions and range lengths perfectly align would be low enough that this approach may be "good enough".

We would still need a fallback method for when two calls to before and after change hooks aren't balanced.

@alanz
Copy link
Contributor

alanz commented Oct 5, 2017

My proposed fallback is to send a change being the entire buffer. I.e. a full sync.

I just need to find time to get back to this, things are running away right now.

@alanz
Copy link
Contributor

alanz commented Oct 6, 2017

I have updated my branch, seems to work

@alanz
Copy link
Contributor

alanz commented Oct 6, 2017

See #124

@alanz
Copy link
Contributor

alanz commented Oct 6, 2017

It might make sense to make this check optional, if you know you are using a server that does not use the end position. Like haskell-ide-engine.

@alanz
Copy link
Contributor

alanz commented Oct 11, 2017

I think this issue can be closed?

@vibhavp
Copy link
Member

vibhavp commented Oct 12, 2017

Yep, thanks!

@vibhavp vibhavp closed this as completed Oct 12, 2017
wkirschbaum pushed a commit to wkirschbaum/lsp-mode that referenced this issue Jun 1, 2021
While this lowers compatability somewhat, any system that a developer is
actively using is expected to have bash installed. This change is required
because asdf does not currently support sh, only bash and other more advanced
shells.

Fixes emacs-lsp#114
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants