Update ctags regexes #8

dreamtigers · 2019-03-03T13:18:45Z

Hello.
I'm currently working on the Elixir parser for universal-ctags and used your project as the base for my changes. I made the regexes more exhaustive (though not completely so) and wanted to share the changes upstream.

In the commit messages you'll find detailed information about each change.
Feel free to ask or discuss any change.

Reason: that's the way Exuberant Ctags defines its kinds. See http://ctags.sourceforge.net/EXTENDING.html

* Records can be private (defrecord*p*).
* Both records and test definitions can use parentheses instead of spaces.

`defcallback/1` is deprecated, Elixir now uses the `@callback` module attribute to provide the spec for a behaviour callback.
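A minimal sketch of what matching the attribute form could look like, again using Python's `re` as a stand-in for the ctags engine. The pattern `CALLBACK` is hypothetical and is not the regex from the commit; accepting `@macrocallback` as well is an assumption added here for illustration.

```python
import re

# Hypothetical pattern for illustration only (not the one in the commit).
# Group 2 is the callback name; the optional "macro" prefix is an assumption.
CALLBACK = re.compile(r'^[ \t]*@(macro)?callback[ \t]+([a-z_][A-Za-z0-9_?!]*)')


def callback_name(line):
    """Return the callback name declared on this line, or None."""
    m = CALLBACK.match(line)
    return m.group(2) if m else None


if __name__ == "__main__":
    print(callback_name("  @callback init(args :: term) :: {:ok, term}"))
    print(callback_name("  defcallback init(args)"))
```

The deprecated `defcallback` form is deliberately left unmatched in this sketch.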

Macros do not necessarily have to be followed by a `(` parenthesis, but they must *not* be followed by any of the characters inside the backticks: `|^/&<>~.=!*+-`. If they are, they are no longer normal macros but _operators_.

From https://hexdocs.pm/elixir/operators.html:

> To define an operator, you can use the usual def* constructs (def,
> defp, defmacro, and so on) but with a syntax similar to how the
> operator is used

So what applies to `defmacro` applies equally to `def`; that's why we added the bracket expression to the function regex.
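The idea of rejecting operator definitions with a negated bracket expression, rather than with lookahead, can be sketched as follows. This uses Python's `re` as a stand-in for the ctags engine; the pattern `FUNC` is hypothetical and is not the regex from the commit. It accepts a name only when what follows is the end of the line, a non-operator character, or whitespace and then a non-operator character.

```python
import re

# Hypothetical pattern for illustration only (not the one in the commit).
# No lookahead is used: the character after the name is consumed instead.
FUNC = re.compile(
    r"^[ \t]*def(p|macro|macrop)?[ \t]+"
    r"([a-z_][A-Za-z0-9_?!]*)"             # group 2: function or macro name
    r"([ \t]*$"                            # name ends the line, or
    r"|[^A-Za-z0-9_?! \t|^/&<>~.=*+-]"     # a non-operator char such as "(" or ","
    r"|[ \t]+[^ \t|^/&<>~.=!*+-])"         # or whitespace, then a non-operator char
)


def function_name(line):
    """Return the defined function or macro name, or None for operators."""
    m = FUNC.match(line)
    return m.group(2) if m else None


if __name__ == "__main__":
    print(function_name("  def greet(name) do"))
    print(function_name("  defmacro my_if condition, do: block"))
    print(function_name("  def left <~> right do"))
```

Identifier characters are excluded in the second alternative so that the engine cannot backtrack to a shorter name (for example `lef` out of `left`) and match anyway. Definitions written with a word operator, such as `def a and b`, would still be picked up by this sketch.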

From https://hexdocs.pm/elixir/operators.html:

> Elixir is capable of parsing a predefined set of operators; this means
> that it's not possible to define new operators (like one could do in
> Haskell, for example). However, not all operators that Elixir can
> parse are *used* by Elixir: for example, `+` and `||` are used by
> Elixir for addition and boolean *or*, but `<~>` is not used (but
> valid).

The following is a list of all the operators that Elixir is capable of parsing, but that are not used by default (separated by a comma):

|, |||, &&&, <<<, >>>, <<~, ~>>, <~, ~>, <~>, <|>, ^^^, ~~~

They could be expressed in a regex atom `(&&&|<~|<~>|etc...)`, but I think it can be reduced (I don't know the difference in efficiency) to:

`[\|\^&<>~]{1,3}`

Adding to that bracket expression the rest of the characters and words used for the defined operators (which can be overridden):

`([\|\^\/&<>~.=!*+-]{1,3}|and|or|not|in|not in)`

But since the regex engine used in Exuberant Ctags can't do lookahead, I'm not going to try to make the operator regex exhaustive. It's also bad practice to override the default operators.
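The trade-off between the explicit alternation and the compact bracket expression can be checked with a small sketch. Python's `re` is used here as a stand-in for the ctags engine, and the names `UNUSED`, `EXACT`, `COMPACT`, and `covers_all` are hypothetical. The operator list is the one quoted above.

```python
import re

# The unused-but-parseable operators listed in the commit message.
UNUSED = ["|", "|||", "&&&", "<<<", ">>>", "<<~", "~>>",
          "<~", "~>", "<~>", "<|>", "^^^", "~~~"]

# Explicit alternation: matches exactly the listed operators.
EXACT = re.compile("(" + "|".join(re.escape(op) for op in UNUSED) + ")")
# Compact form from the commit message: one to three operator characters.
COMPACT = re.compile(r"[|^&<>~]{1,3}")


def covers_all(pattern, operators):
    """True if the pattern matches every operator in full."""
    return all(pattern.fullmatch(op) is not None for op in operators)


if __name__ == "__main__":
    print(covers_all(EXACT, UNUSED), covers_all(COMPACT, UNUSED))
    # The compact form is looser: it also accepts combinations that are
    # not in the list, such as "><".
    print(COMPACT.fullmatch("><") is not None, EXACT.fullmatch("><") is not None)
```

Both forms cover the whole list; the compact one simply trades precision for brevity, which is acceptable when the goal is only to recognise that a definition is an operator.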

masatake · 2019-03-03T13:37:11Z

@dreamtigers, thank you for working in this area.
I want to write some comments as an expert in Universal-ctags.

If you (we) want to use this .ctags with Exuberant-ctags, you cannot use '#' as a comment starter.
That notation was introduced in Universal-ctags.

If you want to use this .ctags only with Universal-ctags, I strongly recommend that you use the

--kinddef-Elixir=n,name,description

option. See https://github.com/universal-ctags/ctags/blob/master/man/ctags-optlib.7.rst.in.
The change 2a98125 doesn't fit the --kinddef-<LANG> option. It implies that you are trying to capture different language objects with one kind. I recommend using different kinds for different language objects.

dreamtigers · 2019-03-03T15:14:58Z

Thank you for your time. I hadn't read this before doing the pull request in the universal-ctags repo. I will make the necessary changes in both repos and submit again. Thanks for the review.


I already made a pull request with *detailed commits* to the upstream of the elixir.ctags parser. That pull request and its commits can be found in mmorearty/elixir-ctags#8.

This parser is missing basically two things, which are annotated in the input.ex test with a TODO keyword plus an explanation of _why_ they're missing. The missing features are listed below:

* exceptions: I don't know how to identify or differentiate exceptions from one another because they do not have a name. Instead, they resemble a struct. For example, if I have:

```
defmodule MyAppError1 do
  defexception [:message]
  # code
end

defmodule MyAppError2 do
  defexception [:message]
  # code
end
```

How would/should ctags differentiate between these two exceptions?

* word-defined logical operators (`and, or, not`): The Elixir parser in this commit already has the regex atom including these logical operators, but since the regex engine used by universal-ctags does not have lookahead, I couldn't think of a way to add them or negate them. The test for them is commented out until someone is brave enough to add them.

This is a more exhaustive set of ctags patterns borrowed from mmorearty/elixir-ctags#8.

dreamtigers added 6 commits March 3, 2019 07:59

Change kind names to singular form

bc3b25f

Reason: that's the way Exuberant Ctags defines its kinds. See http://ctags.sourceforge.net/EXTENDING.html

Update Record.defrecord and test regexes

5bd9c3b

* Records can be private (defrecord*p*).
* Both records and test definitions can use parentheses instead of spaces.

Update callback regex to use the new callback

17e690f

`defcallback/1` is deprecated, Elixir now uses the `@callback` module attribute to provide the spec for a behaviour callback.

Add regex for guard definition

b6260af

dreamtigers mentioned this pull request Mar 3, 2019

Improved Elixir optlib parser universal-ctags/ctags#2024

Closed

Reduce the regex of operators kind to just one regex

3637529

jparise added a commit to jparise/dotfiles that referenced this pull request Mar 24, 2019

ctags: improved elixir patterns

0d1cf53

This is a more exhaustive set of ctags patterns borrowed from mmorearty/elixir-ctags#8.
