Skip to content

DOC: Correct docstring formatting for excel related functions GH23494 #23505

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 23 commits into from

Conversation

timdef
Copy link

@timdef timdef commented Nov 5, 2018

Addresses failed validations from validate_docstrings.py in pandas.read_excel

First open source pull request! Looking to improve.

@pep8speaks
Copy link

pep8speaks commented Nov 5, 2018

Hello @timdef! Thanks for updating the PR.

Line 200:80: E501 line too long (83 > 79 characters)

Comment last updated on November 06, 2018 at 14:58 Hours UTC

@timdef
Copy link
Author

timdef commented Nov 5, 2018

Two things that the validation called out that I wasn't certain about:

  1. Sometimes the parameter description ends with a bulleted list, validation would like a '.' on the last item of that list, even if the other items don't end in periods.
  2. Deprecated parameters often don't currently have descriptions, only deprecated warnings. I had to put above warning to get validation to pass. Is this desired?

@timdef
Copy link
Author

timdef commented Nov 5, 2018

Also, still currently have these errors:

3 Errors found:
	Errors in parameters section
		Parameters {**kwds, date_parser, parse_dates} not documented
		Unknown parameters {parse_cols, sheetname, skip_footer, keep_default_na, verbose}
2 Warnings found:
	No extended summary found
	See Also section not found

Wanted to see if I should take a stab at correcting these or if this work was more about bringing things in line format-wise with the validator.

@codecov
Copy link

codecov bot commented Nov 5, 2018

Codecov Report

Merging #23505 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23505   +/-   ##
=======================================
  Coverage   92.23%   92.23%           
=======================================
  Files         161      161           
  Lines       51197    51197           
=======================================
  Hits        47220    47220           
  Misses       3977     3977
Flag Coverage Δ
#multiple 90.61% <ø> (ø) ⬆️
#single 42.27% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bb24b7...a0a3f99. Read the comment docs.

@gfyoung gfyoung added Docs IO Excel read_excel, to_excel labels Nov 5, 2018
@gfyoung
Copy link
Member

gfyoung commented Nov 5, 2018

Wanted to see if I should take a stab at correcting these or if this work was more about bringing things in line format-wise with the validator.

@timdef : First off, congrats on your first PR! I think it would be a good idea to attempt to patch these.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, added some comments, mainly about our conventions.

I think a See Also section, pointing to to_excel, read_csv... would be useful too.

data will be read in as floats: Excel stores all numbers as floats
internally
internally.

Returns
-------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next line (github doesn't let me add a comment there, as it's not modified): DataFrame or dict of DataFrame.

Then, in the examples, I'd remove the part that creates the Excel files. I'm sure the users will understand that they need to have as parameter a file that exists.

Just add after the lines with read_excel: # doctest: +SKIP so it doesn't run in the doctests (and it doesn't fail).

@@ -40,17 +40,16 @@
_writers = {}

_read_excel_doc = """
Read an Excel table into a pandas DataFrame
Read an Excel table into a pandas DataFrame.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a extended summary giving a bit more information? Questions that came to my mind users could have and would be nice to answer there are:
-Which sheet/s in the Excel file will be loaded

  • Which formats/versions of Excel do we support xls, xlsx?
  • You can mention that the file to be opened can be in the local filesystem or a url.

Anything else you thing it can be helpful for users to know about this function before reading the parameters (where these questions will be answered in many cases)

io : string, path object (pathlib.Path or py._path.local.LocalPath),
file-like object, pandas ExcelFile, or xlrd workbook.
io : str, path object (pathlib.Path or py._path.local.LocalPath), \
file-like object, pandas ExcelFile, or xlrd workbook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to use the Python file types comma separated (I'd like to be able to parse these types). This would make more sense to me:

io : str, file descriptor, pathlib.Path, ExcelFile or xlrd.Book

In the description you can add further information if needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since parameter can be a "pathlib.Path or py._path.local.LocalPath" would it make sense to make the descriptor more general ("path")? I see that the Py library is in maintenance mode, just seeking to understand going forward!

@@ -194,7 +198,7 @@
1 string2 2
2 string3 3

>>> pd.read_excel(open('tmp.xlsx','rb'))
>>> pd.read_excel(open('tmp.xlsx', 'rb'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this example, may be you can show one with sheet_name='Sheet3' instead?

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timdef can you update based on the new and previous comments please?

a b
0 1 2
1 #2 3

>>> pd.read_excel('tmp.xlsx', comment='#')
>>> pd.read_excel('tmp.xlsx', comment='#') # doctest: +SKIP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove these two last examples. It assumes that the content of tmp.xlsx is different than before, which makes things trickier and not very easy to follow. And I don't think it's a very important use case.


Returns
-------
parsed : DataFrame or Dict of DataFrames
parsed : DataFrame or dict of DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parsed : DataFrame or dict of DataFrame
DataFrame or dict of DataFrame


Returns
-------
parsed : DataFrame or Dict of DataFrames
parsed : DataFrame or dict of DataFrame
DataFrame from the passed in Excel file. See notes in sheet_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DataFrame from the passed in Excel file. See notes in sheet_name
DataFrame from the passed in Excel file. See notes in sheet_name

@jreback
Copy link
Contributor

jreback commented Nov 11, 2018

can you merge master, pandas.io.excel was refactored a bit

@datapythonista
Copy link
Member

@timdef do you have time to address the comments and fix the conflicts?

@jreback
Copy link
Contributor

jreback commented Dec 9, 2018

closing as stale. if you'd like to continue working, pls ping.

@jreback jreback closed this Dec 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Fix docstrings of Excel related functions
5 participants