Skip to content

Disable GL01 check in numpydoc by default. #241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
r-build opened this issue Nov 2, 2019 · 13 comments
Open

Disable GL01 check in numpydoc by default. #241

r-build opened this issue Nov 2, 2019 · 13 comments

Comments

@r-build
Copy link

r-build commented Nov 2, 2019

Could the GL01 check in numpydoc be dsiabled by default, please?

It contradicts the examples at the numpydoc docstring guide, and also PEP-257
where it says "The summary line may be on the same line as the opening quotes or on the next line.".

The code seems to be in numpydoc/validate.py :
"GL01": "Docstring text (summary) should start in the line immediately " "after the opening quotes (not in the same line, or leaving a " "blank line in between)",
and
if doc.start_blank_lines != 1 and "\n" in doc.raw_doc: errs.append(error("GL01"))

@rth
Copy link
Contributor

rth commented Nov 3, 2019

Thanks for investigating this @r-build !

@larsoner
Copy link
Collaborator

larsoner commented Nov 3, 2019

@datapythonista can you weigh in?

FWIW I also disabled this when I migrated the MNE-Python checks.

@datapythonista
Copy link
Contributor

Can't look at the links now, but if the standard is having the text in the same line of the quotes, my preference is to make the rule validate that this is the case.

We'll have to change hundreds of them in pandas, but better in the long term than doing tricky things.

For disabling things, I wouldn't have that in validate. I think it should report everything we implement, and the function that calls that should ignore whatever you don't want to validate. I think this will make the code much simpler.

@jorisvandenbossche
Copy link
Contributor

To be correct, IMO we didn't make a "wrong" choice in pandas, we just made "a" choice, since the standards are (explicitly) ambiguous. PEP257 says: "The summary line may be on the same line as the opening quotes or on the next line", and the numpydoc standard does not say it explicitly, but has conflicting examples (the main example explaining that section has it on the same line, the example for a class docstring on the next line).

My preference would be that we don't need to change it all again in pandas. But that probably means leaving this as an option for the users (using projects), which will complicate the check in numpydoc?

@rth
Copy link
Contributor

rth commented Nov 4, 2019

Indeed, looks like both choices are possible according to the PEP257. This was initially raised because this rule was producing a large number of cosmetic changes in scikit-learn (scikit-learn/scikit-learn#15444 (comment)), but I understand that changing it the other way around in pandas would be annoying as well.

Maybe we could for instance,

  • either disable it or enforce the the other choice without the newline in the numpydoc validator (so that it is consistent with the numpydoc examples)
  • have some private way to keep the current behavior for pandas, so you don't have to change it all over again.

cc @jnothman

@jnothman
Copy link
Member

jnothman commented Nov 4, 2019 via email

@larsoner
Copy link
Collaborator

larsoner commented Nov 4, 2019

For unambiguous checks, I agree with @datapythonista that "I want to ignore this" can and should be implemented by the end user:

For disabling things, I wouldn't have that in validate. I think it should report everything we implement, and the function that calls that should ignore whatever you don't want to validate. I think this will make the code much simpler.

But for ambiguous checks where there are variants, it might make sense to add complications to the validator function to say which standard you want to follow. So for example we might want GL01='new' | 'same'. In MNE and sklearn we would do 'same' and in pandas you would do 'new'. And if you really don't care either way you cull the GL01 ones from the list.

Currently in MNE we are just ignoring it, but generally adhere to the "same" policy, so it would be great if we could remove it from our ignore list and pass GL01='same' to the call to get stricter about it.

The only other option I can think of that would work would be to allow conflicting standards, e.g. a GL00 that says "Title must start on the same line as triple quotes" along side the GL01 saying it must start on the following line -- then sklearn/MNE can ignore GL01 and pandas ignores GL00. But I don't like that quite as much as it forces a user to always have to ignore something (no docstring can ever pass without some explicit ignore).

@larsoner
Copy link
Collaborator

larsoner commented Nov 4, 2019

... actually thinking about it more and looking around a bit, I'm thinking the conflicting GL00/GL01 (or whatever numbers) isn't such a bad option. For example it exists in flake8:

W503 line break before binary operator
W504 line break after binary operator

In numpydoc end users can then decide (by culling lists or via --ignore option in the command line) which, if either, they want to adhere to.

In the command-line case we can have a default --ignore=GL00 or something that makes it so that at least some docstring will pass all checks. flake8 for example has a default ignore list, too.

@datapythonista
Copy link
Contributor

I think we can probably agree in one of the conventions for all projects. I guess there are just historical reasons to have two conventions, and I think in the long run we'll benefit from just having one. The convention docs will be simpler, and people contributing in several projects will save some neurons on not having to remember which format is used in each project.

In pandas we use:

def foo():
    """
    Not in the line of the quotes.
    """

That was after some discussion on what people find more readable I think. And I'd say we also wanted to have 3 extra characters for the one liner summary. Is there any reason for starting with the quotes, besides being the convention currently followed?

My preference is the pandas convention, since it was our preference and we spent significant resources changing those. But if everybody else prefer the opposite, I prefer to have one convention and change again the pandas docstrings.

@r-build
Copy link
Author

r-build commented Nov 4, 2019

Why not allow both options, as specified deliberately by PEP-257?

(and the google python style guide)

To me, restricting to one or the other seems like a choice that doesn't need to be made.

Regarding triple quotes for one-liners, PEP 257 says "Triple quotes are used even though the string fits on one line. This makes it easy to later expand it."

@larsoner
Copy link
Collaborator

larsoner commented Nov 4, 2019

I think we can probably agree in one of the conventions for all projects. I guess there are just historical reasons to have two conventions, and I think in the long run we'll benefit from just having one. The convention docs will be simpler, and people contributing in several projects will save some neurons on not having to remember which format is used in each project.

Depends probably on who you mean by "we" here. I'm happy to speak for one or two projects, but even choosing one for e.g. SciPy would be a pain, let alone trying to dictate it for multiple projects. For SciPy, being able to check "it fits GL00 or GL01" is better than only "it fits GL01" or "it fits GL02" because it would be a pain to transition everything, and might not ever actually get done.

I agree one standard would be nice, but I think it's going to be very difficult to unify, and get people to transform their code. Different repos already implement different flavors and strictness via flake8 and pydocstyle, etc. so it's not too much to add one more requirement. If we only support one of the two options, large repos that don't follow whichever we choose are likely to just ignore the option rather than transition.

To me the simplest thing is just to implement both, push one (by enabling it by default and disabling the other by default in the command line thing), but allow people to do either.

If others disagree and think we should force one standard (or even if we go with the "allow both" option, we need to choose the default), one reason for "start on the line with the quotes" would be for consistency with pydocstyle D201: One-line docstring should fit on one line with quotes. It doesn't make too much sense to have some start on the same line, and others on a separate one, so to be consistent they should all start from the same place, probably.

@datapythonista
Copy link
Contributor

I still feel that having two conventions is like having some countries driving on the right, and other driving on the left. It's an arbitrary decision, and while for few weeks we'll have a bit more work standardizing, deciding a convention now, will make our lives easier in the future.

If projects disagree one option is to have both checks as you say. But having to always ignore a rule is not great. Or having to specify a --quotes=in/out is making things tricky for a very small thing.

Not sure if I prefer to simply get rid of that check, and I'll move it to pandas to the custom checks.

@larsoner
Copy link
Collaborator

larsoner commented Nov 5, 2019

I still feel that having two conventions is like having some countries driving on the right, and other driving on the left. It's an arbitrary decision, and while for few weeks we'll have a bit more work standardizing, deciding a convention now, will make our lives easier in the future.

I think the burden for making this shift community-wide is pretty big compared to our cost of maintaining two GL0X's in numpydoc (and people needing to know, for a given project, which standard to follow).

Under the analogy, there are already countries where people drive on the left, some on the right, and some on both sides. Forcing a particular standard is like telling England they can either start driving on the right, or we will not supply them with standardized traffic direction designations / checks because we have decided drive-on-the-right is the way to go.

Or having to specify a --quotes=in/out is making things tricky for a very small thing.

I agree, I've come around to thinking that having two conflicting checks that are easily filtered (since most repos will do filtering via a --ignore sorlt of thing anyway) is cleaner than changing how a given check is executed with an option.

Not sure if I prefer to simply get rid of that check, and I'll move it to pandas to the custom checks.

This to me seems worse than enabling both checks. Instead of having each repo (or really the half that doesn't ascribe to what we decide should be the standard --ignore option) set a --ignore, you'll have every numpydoc validator user who wants a "where does the first line start check" coding their own check.

There is precedence for having conflicting checks (e.g., W503, W504) and giving people a way around it (choosing a default --ignore but allowing it to be overridden). Maybe we should see why the flake8 folks went that route, and see if they regret doing it this way.

Also @jnothman @rgommers if you have opinions here it might help us figure out the way to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants