Support text-2.0 #392

Bodigrim · 2022-01-14T01:00:47Z

Closes #391.

This is a draft, need to handle fromJust more gracefully (and publish text-lines on Hackage).

I tried adapting rope-utf16-splay to text-2.0, but did not succeed: the changes required are too radical to its API. Hence a switch to text-lines.

jneira · 2022-01-14T06:45:59Z

Thanks for the pr. For things like this i miss some sort of benchmark here. We could use the hls benchmark making ghcide use this pr though.

michaelpj · 2022-01-14T14:20:32Z

Nice!

Will text-lines be compatible with text both pre- and post-2? That would be ideal, so we don't force a text-2 dependency if we can help it.

Bodigrim · 2022-01-15T01:07:06Z

For things like this i miss some sort of benchmark here. We could use the hls benchmark making ghcide use this pr though.

Could you point me to a specific benchmark please?

Will text-lines be compatible with text both pre- and post-2?

Of course, it is compatible with both already.

jneira · 2022-01-15T21:38:36Z

For things like this i miss some sort of benchmark here. We could use the hls benchmark making ghcide use this pr though.

Could you point me to a specific benchmark please?

Sure, it is the benchmark we use for ghcide: https://github.com/haskell/haskell-language-server/blob/9c2bc32875ae0bb6871d7cc06547da129cf4ff5f/ghcide/ghcide.cabal#L449 in our ci

Bodigrim · 2022-01-16T01:31:20Z

lsp-types/src/Language/LSP/VFS.hs

-    (before, after) = Rope.splitAt start str
-    after' = Rope.drop len after
+    (before, after) = fromJust $ Rope.splitAtPosition finish str
+    (before', _) = fromJust $ Rope.splitAtPosition start before


What's the best way to handle malformed input here? Since LSP counts by UTF-16 code units (not code points), it is possible that split happens in the middle of a code point, in which case text-lines returns Nothing.
(rope-utf16-splay in such situation silently extends split region until the end of a code point)

@alanz ? do you remember what's up here?

Nope. To be honest I have never really paid attention to issues of splits in the middle of code points, assuming the client will send meaningful input, since they are managing a cursor over the text and sending its positions.

We could throw. Maybe blowing up with an informative error message is the most sensible thing to do here.

I would be happy with either:

Throwing, based on judging this to be very unlikely.

Returning Nothing, and propagating that up to the exported functions. In changeFromClientVFS we can drop the change and log, as is done in the existing Nothing branch.

Thoughts @alanz ?

Alternatively, we can merge this with the fromJust and experiment with removing it with @Bodigrim out of the loop :)

Looking more at the code, I think we could push these failures right the way up to the LSP message response, which is probably the right thing to do, but I think we should do it separately. So my vote goes for "leave fromJust for now and remove subsequently", which I volunteer to look into.

I think this is unlikely to happen in practice, so leaving as is would likely be safe, as is pushing it up to an LSP response. For the latter it at least makes us aware it is happening, and hopefully that will show up in initial developer dogfooding, or not at all.

Bodigrim · 2022-01-16T01:32:10Z

lsp-types/src/Language/LSP/VFS.hs

-                             $ fst $ Rope.splitAtLine 1 $ snd $ Rope.splitAtLine (fromIntegral l) ropetext
-        let beforePos = T.take (fromIntegral c) curLine
+        let curRope = fst $ Rope.splitAtLine 1 $ snd $ Rope.splitAtLine (fromIntegral l) ropetext
+        beforePos <- Rope.toText . fst <$> Rope.splitAt (fromIntegral c) curRope


BTW this fixes an existing bug: T.take counts code points, while LSP requires counting UTF-16 code units.

Bodigrim · 2022-01-16T01:38:26Z

Benchmarks adapted from https://github.com/ollef/rope-bench/blob/main/bench/Bench.hs show that text-lines is at least twice faster than rope-utf16-splay for splitAt. Plus splitAtLine takes logarithmic time instead of linear.

All
  inputlen 1000, 1000 iterations
    linear mods
      mods sized 10
        SplayTree: OK (1.35s)
          1.30 ms ± 120 μs
        TextLines: OK (1.29s)
          627  μs ±  54 μs
    random mods
      mods sized 10
        SplayTree: OK (3.03s)
          1.48 ms ±  97 μs
        TextLines: OK (2.60s)
          644  μs ±  52 μs
    toText . fromText
      SplayTree:   OK (3.56s)
        846  ns ±  37 ns
      TextLines:   OK (4.43s)
        133  ns ± 5.0 ns
  inputlen 10000, 1000 iterations
    linear mods
      mods sized 10
        SplayTree: OK (18.24s)
          2.12 ms ± 211 μs
        TextLines: OK (2.35s)
          575  μs ±  30 μs
    random mods
      mods sized 10
        SplayTree: OK (2.37s)
          2.28 ms ± 218 μs
        TextLines: OK (1.22s)
          1.19 ms ± 114 μs
    toText . fromText
      SplayTree:   OK (2.64s)
        10.1 μs ± 512 ns
      TextLines:   OK (1.58s)
        374  ns ±  31 ns
  inputlen 100000, 1000 iterations
    linear mods
      mods sized 10
        SplayTree: OK (2.31s)
          2.24 ms ± 119 μs
        TextLines: OK (5.38s)
          659  μs ±  60 μs
    random mods
      mods sized 10
        SplayTree: OK (2.45s)
          4.80 ms ± 383 μs
        TextLines: OK (1.89s)
          1.84 ms ± 114 μs
    toText . fromText
      SplayTree:   OK (3.56s)
        106  μs ± 5.5 μs
      TextLines:   OK (1.65s)
        3.12 μs ± 248 ns

michaelpj · 2022-01-17T19:49:01Z

@Bodigrim do you feel motivated to maintain text-lines indefinitely? Performance is great, but I am also wary of adding another thing to your list. Since the existing library got fixed we could just use that.

Bodigrim · 2022-01-17T21:47:49Z

I think I'm beyond the point of no return :) and will release and maintain it anyway.

More benchmarks, now with a realistic payload mimicking lsp:

text-1.2:
  TextLines: 
    2.15 ms ±  53 μs
  RopeSplay: 
    6.06 ms ± 181 μs

text-2.0:
  TextLines: 
    3.71 ms ± 125 μs
  RopeSplay: 
    11.3 ms ± 371 μs

ollef · 2022-01-18T13:21:22Z

Hard to argue with those performance numbers. 👍

michaelpj

LGTM, so long as we resolve the question of what to do with malformed input.

michaelpj · 2022-01-21T12:46:36Z

lsp-types/src/Language/LSP/VFS.hs

@@ -357,6 +348,7 @@ getCompletionPrefix pos@(J.Position l c) (VirtualFile _ _ ropetext) =
            let modParts = dropWhile (not . isUpper . T.head)
                                $ reverse $ filter (not .T.null) xs
                modName = T.intercalate "." modParts
+            curLine <- headMaybe $ T.lines $ Rope.toText curRope


Why not use Rope.lines? Won't that be faster?

curRope is already a single line, but it likely includes an enclosing \n. The only purpose of headMaybe . T.lines . Rope.toText is to strip this \n. I've changed it to a more explicit version.

michaelpj · 2022-01-21T12:49:17Z

lsp/test/VspSpec.hs

@@ -65,7 +64,7 @@ vspSpec = do
          ]
        new = applyChange (fromString orig)
                $ J.TextDocumentContentChangeEvent (Just $ J.mkRange 2 1 2 5) (Just 4) ""
-      lines (Rope.toString new) `shouldBe`
+      lines (T.unpack $ Rope.toText new) `shouldBe`


I think these could all be

Suggested change

lines (T.unpack $ Rope.toText new) `shouldBe`

Rope.lines new `shouldBe`

Excellent idea!

michaelpj · 2022-01-21T12:52:52Z

lsp-types/src/Language/LSP/VFS.hs

-    (before, after) = Rope.splitAt start str
-    after' = Rope.drop len after
+    (before, after) = fromJust $ Rope.splitAtPosition finish str
+    (before', _) = fromJust $ Rope.splitAtPosition start before


I would be happy with either:

Throwing, based on judging this to be very unlikely.

Returning Nothing, and propagating that up to the exported functions. In changeFromClientVFS we can drop the change and log, as is done in the existing Nothing branch.

Thoughts @alanz ?

pepeiborra mentioned this pull request Jan 14, 2022

Migration to text 2.0 #391

Closed

Bodigrim commented Jan 16, 2022

View reviewed changes

Bodigrim marked this pull request as ready for review January 19, 2022 22:57

michaelpj approved these changes Jan 21, 2022

View reviewed changes

Bodigrim and others added 2 commits January 22, 2022 00:24

Switch from rope-utf16-splay to text-rope

65c83ed

Merge branch 'master' into master

82bc411

michaelpj merged commit 28232e6 into haskell:master Jan 22, 2022

Bodigrim mentioned this pull request Jan 28, 2022

Unicode symbols >= 𐀀 are broken haskell/haskell-language-server#2646

Open

	lines (T.unpack $ Rope.toText new) `shouldBe`
	Rope.lines new `shouldBe`

Support text-2.0 #392

Support text-2.0 #392

Uh oh!

Conversation

Bodigrim commented Jan 14, 2022

Uh oh!

jneira commented Jan 14, 2022

Uh oh!

michaelpj commented Jan 14, 2022

Uh oh!

Bodigrim commented Jan 15, 2022

Uh oh!

jneira commented Jan 15, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Bodigrim commented Jan 16, 2022

Uh oh!

michaelpj commented Jan 17, 2022

Uh oh!

Bodigrim commented Jan 17, 2022

Uh oh!

ollef commented Jan 18, 2022

Uh oh!

michaelpj left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!