Update ctags regexes #8

Open: wants to merge 7 commits into base: master

dreamtigers

Hello.
I'm currently working on the Elixir parser for universal-ctags and used your project as the base for my changes. I made the regexes more exhaustive (though not completely so) and wanted to share my changes upstream.

In the commit messages you'll find detailed information about each change.
Feel free to ask or discuss any change.

Reason: that's the way Exuberant Ctags defines its kinds.  See
http://ctags.sourceforge.net/EXTENDING.html
* Records can be private (defrecord*p*)
* Both Records and test definitions can use parentheses instead of
spaces.
`defcallback/1` is deprecated, Elixir now uses the `@callback` module
attribute to provide the spec for a behaviour callback.
Macros do not necessarily have to be followed by a `(` parenthesis, but
they must *not* be followed by any of the characters inside the
backticks: `|^/&<>~.=!*+-`. If they are, they're no longer normal
macros, but _operators_.
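
A minimal sketch of that rule, written in Python only so it can be run and tested; the pattern and the `macro_name` helper are hypothetical and are not the regex from the commits. Python's `re` stands in for the ctags regex engine, and no lookahead is used. The name is accepted only when it is followed by `(`, `,`, end of line, or whitespace plus a character outside the operator set.

```python
import re

# Hypothetical pattern (an assumption, not the regex from this pull request).
# After the macro name we require "(", ",", end of line, or whitespace
# followed by a character that is not one of |^/&<>~.=!*+- (nor whitespace).
MACRO = re.compile(
    r"^\s*defmacrop?\s+([a-z_][A-Za-z0-9_?!]*)"
    r"(?:\(|,|\s*$|\s+[^|^/&<>~.=!*+\-\s])"
)


def macro_name(line):
    """Return the macro name defined on `line`, or None for operators."""
    match = MACRO.match(line)
    return match.group(1) if match else None
```

With this sketch, `defmacro my_macro do` yields `my_macro`, while `defmacro left <~> right do` yields nothing. Word operators such as `and` are not excluded by this check, so `defmacro left and right do` is still tagged as `left`.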

From https://hexdocs.pm/elixir/operators.html

> To define an operator, you can use the usual def* constructs (def,
> defp, defmacro, and so on) but with a syntax similar to how the
> operator is used

So what applies to `defmacro` applies equally to `def`; that's why we
added the bracket expression to the function regex.

From https://hexdocs.pm/elixir/operators.html:

> Elixir is capable of parsing a predefined set of operators; this means
> that it's not possible to define new operators (like one could do in
> Haskell, for example). However, not all operators that Elixir can
> parse are *used* by Elixir: for example, `+` and `||` are used by
> Elixir for addition and boolean *or*, but `<~>` is not used (but
> valid).

The following is a list of all the operators that Elixir is capable of
parsing, but that are not used by default (separated by a comma):

|, |||, &&&, <<<, >>>, <<~, ~>>, <~, ~>, <~>, <|>, ^^^, ~~~

They could be expressed in a regex atom `(&&&|<~|<~>|etc...)` but I
think it can be reduced (I don't know the difference in efficiency) to:

`[\|\^&<>~]{1,3}`

Adding to that bracket expression the rest of the characters and words
used for the defined operators (which can be overridden):

`([\|\^\/&<>~.=!*+-]{1,3}|and|or|not|in|not in)`

But since the regex engine used in Exuberant Ctags can't do lookahead,
I'm not going to try to make the operator regex exhaustive. Besides,
overriding the default operators is bad practice.
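
The two bracket expressions above can be checked against the operator list with a short script. This is a sketch in Python, whose `re` module is only a stand-in for the ctags regex engine; the names `UNUSED`, `NARROW`, `BROAD` and `covers` are made up for the illustration.

```python
import re

# The operators listed above as parsed by Elixir but unused by default.
UNUSED = ["|", "|||", "&&&", "<<<", ">>>", "<<~", "~>>",
          "<~", "~>", "<~>", "<|>", "^^^", "~~~"]

# The reduced bracket expression and the extended one, copied from the text.
NARROW = re.compile(r"[\|\^&<>~]{1,3}")
BROAD = re.compile(r"([\|\^\/&<>~.=!*+-]{1,3}|and|or|not|in|not in)")


def covers(pattern, tokens):
    """Return True if `pattern` matches every token in full."""
    return all(pattern.fullmatch(token) for token in tokens)
```

`covers(NARROW, UNUSED)` holds, so the reduction loses none of the listed operators. It is a superset, though: a sequence such as `><`, which is not in the list, matches as well.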
@masatake

masatake commented Mar 3, 2019

@dreamtigers, thank you for working in this area.
I want to write some comments as an expert in Universal-ctags.

If you (we) want to use this .ctags with Exuberant-ctags, you cannot use '#' as a comment starter.
That notation was introduced in Universal-ctags.

If you want to use this .ctags only with Universal-ctags, I strongly recommend using the

--kinddef-Elixir=n,name,description

option. See https://github.com/universal-ctags/ctags/blob/master/man/ctags-optlib.7.rst.in.
The change 2a98125 doesn't fit the --kinddef-<LANG> option. It implies you are trying to capture different language objects with one kind. I recommend using a different kind for each language object.

@dreamtigers
Author

dreamtigers commented Mar 3, 2019 via email

masatake pushed a commit to masatake/ctags that referenced this pull request Mar 7, 2019
I already made a pull request with *detailed commits* to the upstream of
the elixir.ctags parser. Said pull request and commits can be found in
mmorearty/elixir-ctags#8

This parser is missing two things, both annotated in the input.ex test
with a TODO keyword plus an explanation of _why_ they're missing. The
missing features are listed below:

* exceptions: I don't know how to identify or differentiate
exceptions from one another because they do not have a name. Instead,
they resemble a struct. For example, if I have:

```
defmodule MyAppError1 do
  defexception [:message]
  # code
end

defmodule MyAppError2 do
  defexception [:message]
  # code
end
```

How would/should ctags differentiate between these two exceptions?

* word-defined logical operators (`and, or, not`): The elixir parser in
this commit already has the regex atom including these logical
operators, but since the regex engine used by universal-ctags does not
have lookahead, I couldn't think of a way to add them or negate them.

The test for them is commented out until someone is brave enough
to add them.
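
For the exceptions item above, one possible approach is to name each exception after its enclosing module, since `defexception` defines the exception struct for the module it appears in. The sketch below is in Python rather than ctags options and only illustrates the idea: `exception_tags` is a hypothetical helper, it scans line by line, and it assumes modules are not nested.

```python
import re

# Assumption: one `defmodule Name do` per module, no nesting.
MODULE = re.compile(r"^\s*defmodule\s+([A-Z][A-Za-z0-9_.]*)\s+do\b")
EXCEPTION = re.compile(r"^\s*defexception\b")


def exception_tags(source):
    """Yield (module_name, line_number) for each defexception in `source`."""
    current = None
    for number, line in enumerate(source.splitlines(), start=1):
        match = MODULE.match(line)
        if match:
            # Remember the most recently opened module.
            current = match.group(1)
        elif EXCEPTION.match(line) and current is not None:
            # Tag the exception with the name of that module.
            yield current, number
```

Run over the two modules in the example, this tags the first `defexception` as `MyAppError1` and the second as `MyAppError2`. A single-line ctags regex cannot carry that state by itself, so this shows the desired result rather than a drop-in pattern.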
jparise added a commit to jparise/dotfiles that referenced this pull request Mar 24, 2019
This is a more exhaustive set of ctags patterns borrowed
from mmorearty/elixir-ctags#8.
2 participants