Automatically generate docstring for each existing and newly written processes. #67
Wrapping these expressions in
I guess that makes sense and shouldn't be an issue, right? However, using try and except probably isn't the best way to deal with the failed pytest since these are pre-defined keys. I really cannot say if this happens because I hardcoded the keys or it has something to do with the internal logic of pytest or something completely different. I'd be thankful for ideas.
Okay, in the spirit of encapsulation and OOP, I should start to rely on setters and getters.

    intent = value.metadata.get('intent').value
    description = value.metadata.get('description')

Running
I'm curious now, how intent (and equally description) can turn out to be of type
Great! Thanks @rlange2 for submitting this!
I have some inline comments.
I think we should also allow more control on where to include the generated parameter section in the docstrings if it already exists, e.g.,

    @xsimlab.process(autodoc=True)
    class A:
        """Process A

        {{parameters}}

        Notes
        ------
        Some notes about this process.
        """
        ...

I also think we should include other information such as the dimensions, perhaps under the variable description as a bullet list (and maybe move the intent there), e.g.,

    a_var: object
        Variable description
        - dimensions: scalar or 'x' or ('x', 'y')
        - intent: 'in'
    another_var: object
        ...

See here for an example.
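
The placeholder idea above can be sketched in a few lines. This is only an illustration, not the code in the PR: the helper name `fill_placeholder` and the sample strings are made up here, and the default placeholder is the `{{parameters}}` keyword from the example above.

```python
def fill_placeholder(doc, section, placeholder="{{parameters}}"):
    """Return `doc` with the placeholder replaced by `section`.

    If the placeholder is absent (or there is no docstring at all),
    the section is appended instead. Hypothetical helper for illustration.
    """
    doc = doc or ""
    if placeholder in doc:
        return doc.replace(placeholder, section)
    return f"{doc}\n{section}\n"


doc = "Process A\n\n{{parameters}}\n\nNotes\n-----\nSome notes about this process.\n"
section = (
    "Parameters\n"
    "----------\n"
    "a_var: object\n"
    "    Variable description\n"
    "    - intent: 'in'\n"
)

result = fill_placeholder(doc, section)
print(result)
```

With the placeholder present, the generated section ends up before the Notes section; without it, the section is simply appended at the end.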
On a more general note, I try to more or less strictly follow the PEP8 conventions for formatting the code. There are tools like flake8 (linters) that can help you enforce this format. I will eventually use black so that we won't need to worry about code style.
Dear @benbovy, many thanks for your input. Most of it seems pretty clear to me. On some others, I'm afraid, I'll need some clarification. But let's go through that step by step:
Yes, having a keyword that later on gets replaced is a good idea.
That looks better from an organisational perspective. I was wondering what other information is useful to add to the bullet list. The metadata sure offer a lot of them and it's probably a good idea to check beforehand which items are actually set, and not include the others if they are
That is a fair point. I just started using flake8 and incorporated their feedback. Will soon take a deeper look into black.
Indeed that was the case. Maybe at some point, we can talk about why that happens.
Okay, I understand. Then it was probably a logic error from my side. From my understanding
Here, I run into problems and did so in the past. When I try:
The updated docstring doesn't show up in
When using instead:
I get:

    AttributeError: 'dict' object attribute '__doc__' is read-only

That's why I started overwriting the base class. I only overwrite, when
Thanks for the help.
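
For reference, an error message of this form can be reproduced with plain Python. The snippet below is a sketch under the assumption that the failing statement assigned `__doc__` as an attribute of a dict instance; the original statement is not shown in this thread, and the names `Base` and `attrs` are invented for the example.

```python
class Base:
    """Original docstring."""


# Assigning to __doc__ on an ordinary class works.
Base.__doc__ = "Updated docstring."

# Assigning __doc__ as an *attribute* of a dict instance fails, because
# instances of built-in types do not allow overriding that attribute.
attrs = {}
try:
    attrs.__doc__ = "Updated docstring."
except AttributeError as err:
    message = str(err)
    print(message)

# Storing the text under the "__doc__" *key* is a different operation and works.
attrs["__doc__"] = "Updated docstring."
```

The distinction between attribute assignment and item assignment is the relevant point here.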
Good point. I think it's enough to show the class and variable to which a foreign variable refers. You could look at formatting.var_details(), which is used to format the docstring of the generated class properties for the variables. Actually, maybe you could just reuse it here?
It will likely return
Hence https://github.com/benbovy/xarray-simlab/pull/67#discussion_r353609467
Oh yes I see, actually the docstrings should be updated for both the base class (stand-alone) and the subclass (the class returned by |
Reusing
Here, a nested loop is introduced to format the output on the fly:

    def render_docstrings(self):
        attributes_keyword = "{{attributes}}"
        data_type = "object"  # placeholder until issue #34 is solved
        docstring = "\nAttributes\n----------\n"
        indent = " "
        for key, value in variables_dict(self._base_cls).items():
            temp_string = ''
            var_attributes = var_details(value).split("\n")
            for line in range(len(var_attributes)):
                if line == len(var_attributes) - 1:
                    temp_string += indent + var_attributes[line]
                else:
                    temp_string += indent + var_attributes[line] + "\n"
            var_attributes = temp_string
            docstring += f"{key}: {data_type}\n{var_attributes}\n"
        if self._base_cls.__doc__ is not None:
            if attributes_keyword in self._base_cls.__doc__:
                self._base_cls.__doc__ = self._base_cls.__doc__.replace(attributes_keyword,
                                                                        docstring)
            else:
                self._base_cls.__doc__ += f"\n{docstring}"
        else:
            self._base_cls.__doc__ = docstring

    def var_details(var, docstring=False):
        max_line_length = 70
        var_metadata = var.metadata.copy()
        description = textwrap.fill(var_metadata.pop('description').capitalize(),
                                    width=max_line_length, subsequent_indent=' ')
        detail_items = [('type', var_metadata.pop('var_type').value),
                        ('intent', var_metadata.pop('intent').value)]
        detail_items += list(var_metadata.items())
        if docstring==True:
            details = "\n".join([" - {} : {}".format(k, v) for k, v in detail_items])
        else:
            details = "\n".join(["- {} : {}".format(k, v) for k, v in detail_items])
        return description + "\n\n" + details + '\n'

This change lets us get rid of the second loop in

    def render_docstrings(self):
        attributes_keyword = "{{attributes}}"
        data_type = "object"  # placeholder until issue #34 is solved
        docstring = "\nAttributes\n----------\n"
        indent = " "
        for key, value in variables_dict(self._base_cls).items():
            var_attributes = var_details(value, docstring=True)
            docstring += f"{key}: {data_type}\n{indent}{var_attributes}\n"
        if self._base_cls.__doc__ is not None:
            if attributes_keyword in self._base_cls.__doc__:
                self._base_cls.__doc__ = self._base_cls.__doc__.replace(attributes_keyword,
                                                                        docstring)
            else:
                self._base_cls.__doc__ += docstring
        else:
            self._base_cls.__doc__ = docstring

There's no need anymore to import
A few things I noticed: Finally, when using a non xsimlab-variable via |
I prefer the 1st approach. It's better to customize things at a higher level and not propagate complexity at lower levels (e.g., I think you could get rid of the nested loop by using

    def render_docstrings(self):
        attributes_keyword = "{{attributes}}"
        data_type = "object"  # placeholder until issue #34 is solved
        fmt_vars = []
        for vname, var in variables_dict(self._base_cls).items():
            var_header = f"{vname} : {data_type}"
            var_content = textwrap.indent(var_details(var), " " * 4)
            fmt_vars.append(f"{var_header}\n{var_content}")
        fmt_section = textwrap.indent("Attributes\n"
                                      "----------\n"
                                      "\n".join(fmt_vars),
                                      " " * 4)
        current_doc = self._base_cls.__doc__ or ""
        if attributes_keyword in current_doc:
            new_doc = current_doc.replace(attributes_keyword,
                                          fmt_section[4:])
        else:
            new_doc = f"\n\n{fmt_section}\n"
        self._base_cls.__doc__ = new_doc

The code here above should also properly handle all line returns and indentation.
You can ignore this for now. Actually, I think that it will be better if the description of a foreign variable corresponds to the description of the variable it refers to (we can do this in another PR).
That is a really clean and thorough approach! I slightly changed your code to the following:

    def render_docstrings(self):
        attributes_keyword = "{{attributes}}"
        data_type = "object"  # placeholder until issue #34 is solved
        fmt_vars = []
        for vname, var in variables_dict(self._base_cls).items():
            var_header = f"{vname} : {data_type}"
            var_content = textwrap.indent(var_details(var), " " * 4)
            fmt_vars.append(f"{var_header}\n{var_content}")
        fmt_section = textwrap.indent("Attributes\n"
                                      "----------\n"
                                      + "\n".join(fmt_vars),
                                      " " * 4)
        current_doc = self._base_cls.__doc__ or ""
        if attributes_keyword in current_doc:
            new_doc = current_doc.replace(attributes_keyword,
                                          fmt_section[4:])
        else:
            new_doc = f"{current_doc}\n{fmt_section}\n"
        self._base_cls.__doc__ = new_doc

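
One of the differences between the two versions of `render_docstrings` above is the `+` in front of `"\n".join(fmt_vars)`. The following stand-alone sketch (with made-up entries in `fmt_vars`) shows why it matters: adjacent string literals are merged before the method call, so without the `+` the section header becomes the separator passed to `join()`.

```python
fmt_vars = ["a : object", "b : object"]  # made-up entries for illustration

# Without "+": the three literals are concatenated first, and the resulting
# string is used as the separator between the items of fmt_vars.
without_plus = ("Attributes\n"
                "----------\n"
                "\n".join(fmt_vars))

# With "+": the header is prepended to the newline-joined items.
with_plus = ("Attributes\n"
             "----------\n"
             + "\n".join(fmt_vars))

print(repr(without_plus))
print(repr(with_plus))
```

The second difference, `f"{current_doc}\n{fmt_section}\n"` instead of `f"\n\n{fmt_section}\n"`, keeps the existing docstring when no placeholder is present.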
I would like to know what you think about setting
True. I'm not sure how to link a variable in a process to its original process. Will think about it. Edit: Maybe |
For better overall readability, I'm wondering if we shouldn't move the formatting logic currently in

    from .formatting import format_attribute_section

    def render_docstrings(self):
        new_doc = format_attribute_section(self._base_cls)
        self._base_cls.__doc__ = new_doc
        self._p_cls_dict['__doc__'] = new_doc

The last line is needed (I think) to be able to get the docstrings from process instances attached to a model (i.e.,
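
Why the subclass may need its own `__doc__` entry can be illustrated without xarray-simlab. The sketch below assumes that `_p_cls_dict` is the namespace dictionary later passed to `type()` to build the process subclass; that is an assumption about the internals, and the class names are invented.

```python
class Base:
    """Base docstring."""


# A subclass created with type() and no "__doc__" entry gets no docstring
# of its own: __doc__ is not inherited from the base class.
Plain = type("Plain", (Base,), {})

# Passing "__doc__" in the namespace dict gives the subclass its own docstring.
Documented = type("Documented", (Base,), {"__doc__": "Generated docstring."})

print(Plain.__doc__)
print(Documented.__doc__)
```

Under that assumption, updating only the base class would leave the generated subclass without the new text when `__doc__` is read directly.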
We should also take the indentation into account when wrapping the variable description as a text block in the section. This would need exposing the max line length as a parameter, i.e., |
Nit: |
I pushed the update since I felt that it's good to have a more recent version to talk about. Not sure why the checks have failed since I will look into the |
The errors on travis are unrelated. I'll remove python 3.5 from the test matrix and fix the issue with the latest xarray release in another PR. That said, it would be good to add a test for this "autodoc" feature.
It's a good default IMO. More descriptive docstrings is good.
I like adding |
xsimlab/process.py
Outdated

    @@ -3,11 +3,12 @@
    import inspect
    import sys
    import warnings
    import textwrap

You can remove this unused import.
Right, thank you. Might have jumped the gun here.
This is now fixed in the master branch. You can merge it here.
…en process, if autodoc=True in the process decorator.
The formatting of the docstring has been outsourced to a separate function in formatting.add_attribute_section(). This way, the render_docstrings() only needs to call that function and pass its return to the __doc__ attribute of the base class (and subclasses). There are more suggestions in the PR but I had the impression that it's beneficial to update the recent code.
I'm not sure about this, e.g.

    from fastscape.processes import basic_model
    basic_model.terrain

It returns the information with the previous formatting, i.e.
The same is the case for a custom process, see DippingDyke example from the first post:

    from fastscape.models import basic_model
    basic_model = basic_model.update_processes({'Dyke':DippingDyke})
    basic_model.Dyke

returns
That is the case, whether I include
For the current format, it seems like there are mostly two places where text length might be a concern. That would be the i) variable description and ii) the bullet point
Feel free to share your thoughts and add to the list.
That sounds good. I'm not yet sure how to properly customise your own build but I hope, we can discuss it at one point.
Great!
The

    description = textwrap.fill(var_metadata.pop('description').capitalize(),
                                width=max_line_length) or (
        "(no description given)")

Your suggestion inspired me to use
I tried to introduce horizontal rulers to structure my posts a bit more. If I should make separate posts, please don't hesitate to mention it.
I hope, I did the right thing :) The |
Yeah I think it's enough for now to wrap only the description (with 70 - 8 char width in this case) and let the bullet list untouched.
The test would be very similar to those already implemented in https://github.com/benbovy/xarray-simlab/blob/master/xsimlab/tests/test_formatting.py.
Actually, I prefer a
Sometimes it's good to push your last changes/commits first and then add yourself inline comments, so that we can discuss specific details just next to the relevant lines of code.
Same advice: don't hesitate to push your commits early. This way I can directly see the output in the CI logs.
For readability, variable description in var_details() now assigned with a separate conditional expression. To account for indents in the docstring, variable description is now wrapped after 62 characters.
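
The "70 - 8" arithmetic mentioned above can be checked with a short stand-alone sketch. The 8-character indent (4 for the docstring body plus 4 under the variable header) is an assumption taken from this discussion, and the sample description is made up.

```python
import textwrap

MAX_LINE_LENGTH = 70
INDENT = " " * 8  # assumed: 4 (docstring body) + 4 (under the variable header)

description = (
    "a fairly long variable description that needs to be wrapped over "
    "several lines once it is placed inside the attributes section"
)

# Wrap to the reduced width first, then indent, so that the indented
# lines still fit within MAX_LINE_LENGTH characters.
wrapped = textwrap.fill(description.capitalize(), width=MAX_LINE_LENGTH - len(INDENT))
block = textwrap.indent(wrapped, INDENT)

print(block)
```

Wrapping before indenting keeps the two concerns separate: the description is wrapped at 62 characters, and the section formatter decides the indentation.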
Same advice: don't hesitate to push your commits early. This way I can directly see the output in the CI logs.
Ok, the script now only fails the non-raised NotImplementedError.

    ____________________________ test_process_decorator ____________________________

        def test_process_decorator():
            with pytest.raises(NotImplementedError):
    >           @xs.process(autodoc=True)
                class Dummy:
    E           Failed: DID NOT RAISE <class 'NotImplementedError'>

    xsimlab/tests/test_process.py:210: Failed

it would be good to add a test for this "autodoc" feature [...] I'm not yet sure how to properly customise your own build
The test would be very similar to those already implemented in https://github.com/benbovy/xarray-simlab/blob/master/xsimlab/tests/test_formatting.py.
Thank you. This is what I will look into next.
xsimlab/process.py
Outdated

    @@ -453,7 +452,6 @@ def render_docstrings(self):
            new_doc = add_attribute_section(self._base_cls)

            self._base_cls.__doc__ = new_doc
            self._p_cls_dict['__doc__'] = new_doc

`self._base_cls.__doc__ = new_doc` is enough to create docstring in `help(mymodel.myprocess)`. In fact, it's pasted in there twice, at the beginning and the end.
With this commit, a test for the autodoc-feature will be implemented. For now, however, pytest returns an AssertionError because `add_attribute_section()` in `test_add_attribute_section()` returns the docstring twice. Reason unknown. Also, `@pytest.mark.xfail` was introduced to circumvent failures in `test_process_decorator()` and `test_add_attribute_section()`. This will be only temporary.
With `test_add_attribute_section()` I tried to compare the output of `add_attribute_section()` with a template. However, the output gets generated twice and therefore fails to pass the test.
In `tests.fixture_process` I found the following:

    @pytest.fixture(scope='session')
    def in_var_details():
        return dedent("""\
            Input variable

            - type : variable
            - intent : in
            - dims : (('x',), ('x', 'y'))
            - group : None
            - attrs : {}
            """)

Is that related to `test_var_details()`?

        """)

        assert add_attribute_section(Dummy) == expected

This leaves me with an `AssertionError`. Apparently, the docstring gets returned twice and therefore fails to be equal to `expected`. I tried this with a `Dummy` class as well as with `tests.fixture_process.SomeProcess`. I'm not sure, though, why that happens.
xsimlab/tests/test_formatting.py
Outdated

    @@ -37,6 +39,34 @@ def test_var_details(example_process_obj):
        assert "- dims : (('x',),)" in var_details_str


    @pytest.mark.xfail

`pytest.mark.xfail` was introduced to allow for this failure (also in `tests.test_process.test_process_decorator()`). This will be only temporary.
Yeah but then I cannot see in the CI logs why the test fails.
Nice! We're almost there.
Some minor issues remaining.
Also, the docstrings of the `process` decorator should be updated (here: https://github.com/benbovy/xarray-simlab/blob/f595fd6af3150f4f293306985c644d2949299852/xsimlab/process.py#L492), with `default: False` -> `default: True` and maybe with a bit more details.
And don't forget to update the release notes in `doc/whats_new.rst`.
xsimlab/tests/test_formatting.py
Outdated

    @@ -37,6 +39,34 @@ def test_var_details(example_process_obj):
        assert "- dims : (('x',),)" in var_details_str


    @pytest.mark.xfail

Remove `xfail` here. We expect that this test always succeeds.
xsimlab/tests/test_formatting.py
Outdated

        """This is a Dummy class
        to test `add_attribute_section()`
        """
        var = xs.variable(dims='x', description='a variable')

Maybe add another variable with no description to test the case where `(no description given)` is added to the docstrings.
It might be good to also test the case where the `{{attributes}}` placeholder is present in the original docstrings.
xsimlab/tests/test_process.py
Outdated

    @@ -205,6 +205,7 @@ def run_step(self, a, b):
        assert "Process runtime methods" in str(excinfo.value)


    @pytest.mark.xfail

Again, here this should be removed.
You should instead replace the `NotImplementedError` test below by some logic testing if the docstrings are updated if `autodoc=True` and if it stays unchanged if `autodoc=False`. No need to duplicate the test that you have written for the attribute section content, `assert "Attributes" in cls.__doc__` might be enough here.
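
The shape of such a test might look like the sketch below. To keep it runnable on its own, it uses a hypothetical stand-in decorator named `process` rather than the real `xs.process`; only the two assertions reflect what is suggested above.

```python
def process(maybe_cls=None, autodoc=True):
    """Hypothetical stand-in for the real decorator (illustration only).

    With autodoc=True it appends an "Attributes" header to the docstring;
    with autodoc=False it leaves the docstring untouched.
    """
    def wrap(cls):
        if autodoc:
            cls.__doc__ = (cls.__doc__ or "") + "\nAttributes\n----------\n"
        return cls

    return wrap if maybe_cls is None else wrap(maybe_cls)


def test_process_decorator_autodoc():
    @process(autodoc=True)
    class WithAutodoc:
        """Some process."""

    @process(autodoc=False)
    class WithoutAutodoc:
        """Some process."""

    # The docstring is updated only when autodoc is enabled.
    assert "Attributes" in WithAutodoc.__doc__
    assert WithoutAutodoc.__doc__ == "Some process."


test_process_decorator_autodoc()
```

In the actual test suite the stand-in would be replaced by the real decorator, and the detailed section content would stay covered by the formatting test.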
xsimlab/formatting.py
Outdated

    var_metadata = var.metadata.copy()

    description = textwrap.fill(var_metadata.pop('description').capitalize(),
                                width=max_line_length)
    if description == "":

nit: `if not description:` is slightly more idiomatic.
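
A minimal stand-alone sketch of the fallback with the more idiomatic check follows. The function name `format_description` and the default width of 62 are assumptions made for this example; the placeholder text is the one used in this PR.

```python
import textwrap


def format_description(description, width=62):
    """Wrap a variable description, falling back to a placeholder if empty.

    Hypothetical helper for illustration; not the function from the PR.
    """
    wrapped = textwrap.fill(description.capitalize(), width=width)
    # textwrap.fill() returns an empty string for empty input, and an
    # empty string is falsy, so "not wrapped" covers that case.
    if not wrapped:
        wrapped = "(no description given)"
    return wrapped


print(format_description(""))
print(format_description("a variable"))
```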
@rlange2 I started using black in #84. As a consequence, there are some conflicts between this branch and the master branch. Let me know if you have some work that you haven't committed/pushed yet. Otherwise, I'll resolve the conflicts through this interface and you'll just need to pull the changes. From now on, you can stop worrying about code formatting and use black: see here.
Fork is now synched with latest commit 1d995b4 of original repo. whats_new.rst has been updated. Placeholder and if-clause if no description has been given for individual variables. Docstring for `process.process.py` has been updated. Lastly, tests included for `add_attribute_section()` and `process_decorator()` in `test_formatting` and `test_process`, respectively.
Latest changes include:
I created a new branch and used it to overwrite the existing autodoc-branch. I hope that wasn't the worst idea. Should I highlight the changes I made?
This looks good to me! Just two minor comments left.
doc/whats_new.rst
Outdated

    @@ -47,6 +47,27 @@ Enhancements
      to :func:`xarray.Dataset.xsimlab.run`.
    - More consistent dictionary format for output variables in the xarray
      extension (:issue:`85`).
    - Existing and newly written processes will be updated automatically

Let's keep the release notes concise, e.g., with something like:
The ``autodoc`` parameter of the :func:`~xsimlab.process` decorator
now allows to automatically add an attributes section in the docstrings
of the class to which the decorator is applied, using the metadata of
each variable declared in the class.
xsimlab/process.py
Outdated

    {{attributes}} can be used as a placeholder for the updated
    metadata information.

    Docstring template:

Again here, not sure it's worth showing the whole attributes section template.
Great! Thanks @rlange2
Will update the `__doc__` attribute of each process in the framework. `autodoc=True` does not need to be passed to the xs.process decorator and instead is set globally.
I tried to follow the numpy docstring guide as closely as possible. Basically, variable name, type (if supplied), intent and description will be taken into account and represented in the format:
For example, the user defined process:
Will return: