Skip to content

Support text-2.0 #392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 22, 2022
Merged

Support text-2.0 #392

merged 2 commits into from
Jan 22, 2022

Conversation

Bodigrim
Copy link
Contributor

Closes #391.

This is a draft, need to handle fromJust more gracefully (and publish text-lines on Hackage).

I tried adapting rope-utf16-splay to text-2.0, but did not succeed: the changes required are too radical to its API. Hence a switch to text-lines.

@jneira
Copy link
Member

jneira commented Jan 14, 2022

Thanks for the pr. For things like this i miss some sort of benchmark here. We could use the hls benchmark making ghcide use this pr though.

@pepeiborra pepeiborra mentioned this pull request Jan 14, 2022
@michaelpj
Copy link
Collaborator

Nice!

Will text-lines be compatible with text both pre- and post-2? That would be ideal, so we don't force a text-2 dependency if we can help it.

@Bodigrim
Copy link
Contributor Author

For things like this i miss some sort of benchmark here. We could use the hls benchmark making ghcide use this pr though.

Could you point me to a specific benchmark please?

Will text-lines be compatible with text both pre- and post-2?

Of course, it is compatible with both already.

@jneira
Copy link
Member

jneira commented Jan 15, 2022

For things like this i miss some sort of benchmark here. We could use the hls benchmark making ghcide use this pr though.

Could you point me to a specific benchmark please?

Sure, it is the benchmark we use for ghcide: https://github.com/haskell/haskell-language-server/blob/9c2bc32875ae0bb6871d7cc06547da129cf4ff5f/ghcide/ghcide.cabal#L449 in our ci

(before, after) = Rope.splitAt start str
after' = Rope.drop len after
(before, after) = fromJust $ Rope.splitAtPosition finish str
(before', _) = fromJust $ Rope.splitAtPosition start before
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the best way to handle malformed input here? Since LSP counts by UTF-16 code units (not code points), it is possible that split happens in the middle of a code point, in which case text-lines returns Nothing.
(rope-utf16-splay in such situation silently extends split region until the end of a code point)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alanz ? do you remember what's up here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. To be honest I have never really paid attention to issues of splits in the middle of code points, assuming the client will send meaningful input, since they are managing a cursor over the text and sending its positions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could throw. Maybe blowing up with an informative error message is the most sensible thing to do here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy with either:

  • Throwing, based on judging this to be very unlikely.
  • Returning Nothing, and propagating that up to the exported functions. In changeFromClientVFS we can drop the change and log, as is done in the existing Nothing branch.

Thoughts @alanz ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we can merge this with the fromJust and experiment with removing it with @Bodigrim out of the loop :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking more at the code, I think we could push these failures right the way up to the LSP message response, which is probably the right thing to do, but I think we should do it separately. So my vote goes for "leave fromJust for now and remove subsequently", which I volunteer to look into.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unlikely to happen in practice, so leaving as is would likely be safe, as is pushing it up to an LSP response. For the latter it at least makes us aware it is happening, and hopefully that will show up in initial developer dogfooding, or not at all.

$ fst $ Rope.splitAtLine 1 $ snd $ Rope.splitAtLine (fromIntegral l) ropetext
let beforePos = T.take (fromIntegral c) curLine
let curRope = fst $ Rope.splitAtLine 1 $ snd $ Rope.splitAtLine (fromIntegral l) ropetext
beforePos <- Rope.toText . fst <$> Rope.splitAt (fromIntegral c) curRope
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this fixes an existing bug: T.take counts code points, while LSP requires counting UTF-16 code units.

@Bodigrim
Copy link
Contributor Author

Benchmarks adapted from https://github.com/ollef/rope-bench/blob/main/bench/Bench.hs show that text-lines is at least twice faster than rope-utf16-splay for splitAt. Plus splitAtLine takes logarithmic time instead of linear.

All
  inputlen 1000, 1000 iterations
    linear mods
      mods sized 10
        SplayTree: OK (1.35s)
          1.30 ms ± 120 μs
        TextLines: OK (1.29s)
          627  μs ±  54 μs
    random mods
      mods sized 10
        SplayTree: OK (3.03s)
          1.48 ms ±  97 μs
        TextLines: OK (2.60s)
          644  μs ±  52 μs
    toText . fromText
      SplayTree:   OK (3.56s)
        846  ns ±  37 ns
      TextLines:   OK (4.43s)
        133  ns ± 5.0 ns
  inputlen 10000, 1000 iterations
    linear mods
      mods sized 10
        SplayTree: OK (18.24s)
          2.12 ms ± 211 μs
        TextLines: OK (2.35s)
          575  μs ±  30 μs
    random mods
      mods sized 10
        SplayTree: OK (2.37s)
          2.28 ms ± 218 μs
        TextLines: OK (1.22s)
          1.19 ms ± 114 μs
    toText . fromText
      SplayTree:   OK (2.64s)
        10.1 μs ± 512 ns
      TextLines:   OK (1.58s)
        374  ns ±  31 ns
  inputlen 100000, 1000 iterations
    linear mods
      mods sized 10
        SplayTree: OK (2.31s)
          2.24 ms ± 119 μs
        TextLines: OK (5.38s)
          659  μs ±  60 μs
    random mods
      mods sized 10
        SplayTree: OK (2.45s)
          4.80 ms ± 383 μs
        TextLines: OK (1.89s)
          1.84 ms ± 114 μs
    toText . fromText
      SplayTree:   OK (3.56s)
        106  μs ± 5.5 μs
      TextLines:   OK (1.65s)
        3.12 μs ± 248 ns

@michaelpj
Copy link
Collaborator

@Bodigrim do you feel motivated to maintain text-lines indefinitely? Performance is great, but I am also wary of adding another thing to your list. Since the existing library got fixed we could just use that.

@Bodigrim
Copy link
Contributor Author

I think I'm beyond the point of no return :) and will release and maintain it anyway.

More benchmarks, now with a realistic payload mimicking lsp:

text-1.2:
  TextLines: 
    2.15 ms ±  53 μs
  RopeSplay: 
    6.06 ms ± 181 μs

text-2.0:
  TextLines: 
    3.71 ms ± 125 μs
  RopeSplay: 
    11.3 ms ± 371 μs

@ollef
Copy link
Contributor

ollef commented Jan 18, 2022

Hard to argue with those performance numbers. 👍

@Bodigrim Bodigrim marked this pull request as ready for review January 19, 2022 22:57
Copy link
Collaborator

@michaelpj michaelpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, so long as we resolve the question of what to do with malformed input.

@@ -357,6 +348,7 @@ getCompletionPrefix pos@(J.Position l c) (VirtualFile _ _ ropetext) =
let modParts = dropWhile (not . isUpper . T.head)
$ reverse $ filter (not .T.null) xs
modName = T.intercalate "." modParts
curLine <- headMaybe $ T.lines $ Rope.toText curRope
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use Rope.lines? Won't that be faster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curRope is already a single line, but it likely includes an enclosing \n. The only purpose of headMaybe . T.lines . Rope.toText is to strip this \n. I've changed it to a more explicit version.

@@ -65,7 +64,7 @@ vspSpec = do
]
new = applyChange (fromString orig)
$ J.TextDocumentContentChangeEvent (Just $ J.mkRange 2 1 2 5) (Just 4) ""
lines (Rope.toString new) `shouldBe`
lines (T.unpack $ Rope.toText new) `shouldBe`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these could all be

Suggested change
lines (T.unpack $ Rope.toText new) `shouldBe`
Rope.lines new `shouldBe`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent idea!

(before, after) = Rope.splitAt start str
after' = Rope.drop len after
(before, after) = fromJust $ Rope.splitAtPosition finish str
(before', _) = fromJust $ Rope.splitAtPosition start before
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be happy with either:

  • Throwing, based on judging this to be very unlikely.
  • Returning Nothing, and propagating that up to the exported functions. In changeFromClientVFS we can drop the change and log, as is done in the existing Nothing branch.

Thoughts @alanz ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migration to text 2.0
5 participants