Disable GL01 check in numpydoc by default. #241

r-build · 2019-11-02T23:48:26Z

Could the GL01 check in numpydoc be dsiabled by default, please?

It contradicts the examples at the numpydoc docstring guide, and also PEP-257
where it says "The summary line may be on the same line as the opening quotes or on the next line.".

The code seems to be in numpydoc/validate.py :
"GL01": "Docstring text (summary) should start in the line immediately " "after the opening quotes (not in the same line, or leaving a " "blank line in between)",
and
if doc.start_blank_lines != 1 and "\n" in doc.raw_doc: errs.append(error("GL01"))

The text was updated successfully, but these errors were encountered:

rth · 2019-11-03T21:42:53Z

Thanks for investigating this @r-build !

larsoner · 2019-11-03T23:56:37Z

@datapythonista can you weigh in?

FWIW I also disabled this when I migrated the MNE-Python checks.

datapythonista · 2019-11-04T00:39:36Z

Can't look at the links now, but if the standard is having the text in the same line of the quotes, my preference is to make the rule validate that this is the case.

We'll have to change hundreds of them in pandas, but better in the long term than doing tricky things.

For disabling things, I wouldn't have that in validate. I think it should report everything we implement, and the function that calls that should ignore whatever you don't want to validate. I think this will make the code much simpler.

jorisvandenbossche · 2019-11-04T07:50:38Z

To be correct, IMO we didn't make a "wrong" choice in pandas, we just made "a" choice, since the standards are (explicitly) ambiguous. PEP257 says: "The summary line may be on the same line as the opening quotes or on the next line", and the numpydoc standard does not say it explicitly, but has conflicting examples (the main example explaining that section has it on the same line, the example for a class docstring on the next line).

My preference would be that we don't need to change it all again in pandas. But that probably means leaving this as an option for the users (using projects), which will complicate the check in numpydoc?

rth · 2019-11-04T08:55:31Z

Indeed, looks like both choices are possible according to the PEP257. This was initially raised because this rule was producing a large number of cosmetic changes in scikit-learn (scikit-learn/scikit-learn#15444 (comment)), but I understand that changing it the other way around in pandas would be annoying as well.

Maybe we could for instance,

either disable it or enforce the the other choice without the newline in the numpydoc validator (so that it is consistent with the numpydoc examples)
have some private way to keep the current behavior for pandas, so you don't have to change it all over again.

cc @jnothman

jnothman · 2019-11-04T09:04:32Z

I agree the user should opt. There are other stylistic variations within the numpydoc community which I don't think are checked by this validator, like the use of a blank line between parameter definitions.

larsoner · 2019-11-04T11:57:13Z

For unambiguous checks, I agree with @datapythonista that "I want to ignore this" can and should be implemented by the end user:

For disabling things, I wouldn't have that in validate. I think it should report everything we implement, and the function that calls that should ignore whatever you don't want to validate. I think this will make the code much simpler.

But for ambiguous checks where there are variants, it might make sense to add complications to the validator function to say which standard you want to follow. So for example we might want GL01='new' | 'same'. In MNE and sklearn we would do 'same' and in pandas you would do 'new'. And if you really don't care either way you cull the GL01 ones from the list.

Currently in MNE we are just ignoring it, but generally adhere to the "same" policy, so it would be great if we could remove it from our ignore list and pass GL01='same' to the call to get stricter about it.

The only other option I can think of that would work would be to allow conflicting standards, e.g. a GL00 that says "Title must start on the same line as triple quotes" along side the GL01 saying it must start on the following line -- then sklearn/MNE can ignore GL01 and pandas ignores GL00. But I don't like that quite as much as it forces a user to always have to ignore something (no docstring can ever pass without some explicit ignore).

larsoner · 2019-11-04T12:41:57Z

... actually thinking about it more and looking around a bit, I'm thinking the conflicting GL00/GL01 (or whatever numbers) isn't such a bad option. For example it exists in flake8:

W503 line break before binary operator
W504 line break after binary operator

In numpydoc end users can then decide (by culling lists or via --ignore option in the command line) which, if either, they want to adhere to.

In the command-line case we can have a default --ignore=GL00 or something that makes it so that at least some docstring will pass all checks. flake8 for example has a default ignore list, too.

datapythonista · 2019-11-04T20:56:21Z

I think we can probably agree in one of the conventions for all projects. I guess there are just historical reasons to have two conventions, and I think in the long run we'll benefit from just having one. The convention docs will be simpler, and people contributing in several projects will save some neurons on not having to remember which format is used in each project.

In pandas we use:

def foo():
    """
    Not in the line of the quotes.
    """

That was after some discussion on what people find more readable I think. And I'd say we also wanted to have 3 extra characters for the one liner summary. Is there any reason for starting with the quotes, besides being the convention currently followed?

My preference is the pandas convention, since it was our preference and we spent significant resources changing those. But if everybody else prefer the opposite, I prefer to have one convention and change again the pandas docstrings.

r-build · 2019-11-04T21:31:20Z

Why not allow both options, as specified deliberately by PEP-257?

(and the google python style guide)

To me, restricting to one or the other seems like a choice that doesn't need to be made.

Regarding triple quotes for one-liners, PEP 257 says "Triple quotes are used even though the string fits on one line. This makes it easy to later expand it."

larsoner · 2019-11-04T21:36:00Z

I think we can probably agree in one of the conventions for all projects. I guess there are just historical reasons to have two conventions, and I think in the long run we'll benefit from just having one. The convention docs will be simpler, and people contributing in several projects will save some neurons on not having to remember which format is used in each project.

Depends probably on who you mean by "we" here. I'm happy to speak for one or two projects, but even choosing one for e.g. SciPy would be a pain, let alone trying to dictate it for multiple projects. For SciPy, being able to check "it fits GL00 or GL01" is better than only "it fits GL01" or "it fits GL02" because it would be a pain to transition everything, and might not ever actually get done.

I agree one standard would be nice, but I think it's going to be very difficult to unify, and get people to transform their code. Different repos already implement different flavors and strictness via flake8 and pydocstyle, etc. so it's not too much to add one more requirement. If we only support one of the two options, large repos that don't follow whichever we choose are likely to just ignore the option rather than transition.

To me the simplest thing is just to implement both, push one (by enabling it by default and disabling the other by default in the command line thing), but allow people to do either.

If others disagree and think we should force one standard (or even if we go with the "allow both" option, we need to choose the default), one reason for "start on the line with the quotes" would be for consistency with pydocstyle D201: One-line docstring should fit on one line with quotes. It doesn't make too much sense to have some start on the same line, and others on a separate one, so to be consistent they should all start from the same place, probably.

datapythonista · 2019-11-04T22:16:59Z

I still feel that having two conventions is like having some countries driving on the right, and other driving on the left. It's an arbitrary decision, and while for few weeks we'll have a bit more work standardizing, deciding a convention now, will make our lives easier in the future.

If projects disagree one option is to have both checks as you say. But having to always ignore a rule is not great. Or having to specify a --quotes=in/out is making things tricky for a very small thing.

Not sure if I prefer to simply get rid of that check, and I'll move it to pandas to the custom checks.

larsoner · 2019-11-05T00:23:22Z

I still feel that having two conventions is like having some countries driving on the right, and other driving on the left. It's an arbitrary decision, and while for few weeks we'll have a bit more work standardizing, deciding a convention now, will make our lives easier in the future.

I think the burden for making this shift community-wide is pretty big compared to our cost of maintaining two GL0X's in numpydoc (and people needing to know, for a given project, which standard to follow).

Under the analogy, there are already countries where people drive on the left, some on the right, and some on both sides. Forcing a particular standard is like telling England they can either start driving on the right, or we will not supply them with standardized traffic direction designations / checks because we have decided drive-on-the-right is the way to go.

Or having to specify a --quotes=in/out is making things tricky for a very small thing.

I agree, I've come around to thinking that having two conflicting checks that are easily filtered (since most repos will do filtering via a --ignore sorlt of thing anyway) is cleaner than changing how a given check is executed with an option.

Not sure if I prefer to simply get rid of that check, and I'll move it to pandas to the custom checks.

This to me seems worse than enabling both checks. Instead of having each repo (or really the half that doesn't ascribe to what we decide should be the standard --ignore option) set a --ignore, you'll have every numpydoc validator user who wants a "where does the first line start check" coding their own check.

There is precedence for having conflicting checks (e.g., W503, W504) and giving people a way around it (choosing a default --ignore but allowing it to be overridden). Maybe we should see why the flake8 folks went that route, and see if they regret doing it this way.

Also @jnothman @rgommers if you have opinions here it might help us figure out the way to go.

This was referenced Nov 3, 2019

numpydoc.validate fails on the reference docstring example #242

Open

DOC Docstring validation in Pipeline estimator scikit-learn/scikit-learn#15444

Merged

DOC Applying numpydoc validation to decomposition.PCA scikit-learn/scikit-learn#15467

Merged

rth mentioned this issue Nov 4, 2019

DOC Birch numpydoc docstring validation scikit-learn/scikit-learn#15511

Merged

jorisvandenbossche mentioned this issue Feb 19, 2020

ARROW-7898: [Python] Reduce the number docstring violations using numpydoc apache/arrow#6444

Closed

rossbar mentioned this issue Jul 3, 2020

Why should docstrings end with a blank line? #274

Open

brandondavid mentioned this issue Jul 31, 2020

[WIP] DOC: docstyle enforcement for physics.optics.polarization sympy/sympy#19862

Draft

akaszynski mentioned this issue Aug 27, 2021

Use numpydoc pyvista/pyvista#1612

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Disable GL01 check in numpydoc by default. #241

Disable GL01 check in numpydoc by default. #241

r-build commented Nov 2, 2019

rth commented Nov 3, 2019

Uh oh!

larsoner commented Nov 3, 2019

Uh oh!

datapythonista commented Nov 4, 2019

Uh oh!

jorisvandenbossche commented Nov 4, 2019

Uh oh!

rth commented Nov 4, 2019

Uh oh!

jnothman commented Nov 4, 2019 via email

Uh oh!

larsoner commented Nov 4, 2019

Uh oh!

larsoner commented Nov 4, 2019

Uh oh!

datapythonista commented Nov 4, 2019

Uh oh!

r-build commented Nov 4, 2019

Uh oh!

larsoner commented Nov 4, 2019

Uh oh!

datapythonista commented Nov 4, 2019

Uh oh!

larsoner commented Nov 5, 2019

Uh oh!

Uh oh!

Disable GL01 check in numpydoc by default. #241

Disable GL01 check in numpydoc by default. #241

Comments

r-build commented Nov 2, 2019

rth commented Nov 3, 2019

Uh oh!

larsoner commented Nov 3, 2019

Uh oh!

datapythonista commented Nov 4, 2019

Uh oh!

jorisvandenbossche commented Nov 4, 2019

Uh oh!

rth commented Nov 4, 2019

Uh oh!

jnothman commented Nov 4, 2019 via email

Uh oh!

larsoner commented Nov 4, 2019

Uh oh!

larsoner commented Nov 4, 2019

Uh oh!

datapythonista commented Nov 4, 2019

Uh oh!

r-build commented Nov 4, 2019

Uh oh!

larsoner commented Nov 4, 2019

Uh oh!

datapythonista commented Nov 4, 2019

Uh oh!

larsoner commented Nov 5, 2019

Uh oh!