Skip to content

DOC: Clarify documentation of 'ambiguous' parameter #23408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 4, 2018

Conversation

bartaelterman
Copy link
Contributor

when first reading the docs, the ambiguous parameter was not clear to me. I propose to add a line of documentation and include an example that exactly indicates the issue that the ambiguous parameter solves.

@pep8speaks
Copy link

pep8speaks commented Oct 29, 2018

Hello @bartaelterman! Thanks for updating the PR.

Comment last updated on October 31, 2018 at 23:04 Hours UTC

@codecov
Copy link

codecov bot commented Oct 29, 2018

Codecov Report

Merging #23408 into master will increase coverage by 0.03%.
The diff coverage is 96.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23408      +/-   ##
==========================================
+ Coverage   92.18%   92.21%   +0.03%     
==========================================
  Files         161      161              
  Lines       51160    51173      +13     
==========================================
+ Hits        47161    47190      +29     
+ Misses       3999     3983      -16
Flag Coverage Δ
#multiple 90.65% <96.29%> (+0.03%) ⬆️
#single 42.23% <22.22%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.43% <ø> (ø) ⬆️
pandas/core/arrays/datetimes.py 97.9% <ø> (ø) ⬆️
pandas/core/generic.py 96.81% <96.29%> (+0.02%) ⬆️
pandas/core/series.py 93.87% <0%> (-0.02%) ⬇️
pandas/core/dtypes/base.py 100% <0%> (ø) ⬆️
pandas/util/_decorators.py 91.34% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 95.46% <0%> (ø) ⬆️
pandas/core/groupby/ops.py 96.79% <0%> (ø) ⬆️
pandas/core/groupby/base.py 91.83% <0%> (ø) ⬆️
... and 33 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 184bcd3...b1026c3. Read the comment docs.

@gfyoung gfyoung added Docs Datetime Datetime data dtype labels Oct 29, 2018
@gfyoung gfyoung assigned jreback and mroeschke and unassigned jreback and mroeschke Oct 29, 2018
@gfyoung gfyoung requested review from mroeschke and jreback October 29, 2018 13:37
@mroeschke
Copy link
Member

Could you copy this explanation to the other occurrences of this argument? It needs to be added to Timestamp, DatetimeIndex and NaT

@jorisvandenbossche
Copy link
Member

@bartaelterman Thanks for the PR!

In addition, what would also be very useful is an actual code example in an Examples section (feel free to add one in this PR, but also don't feel obliged, the PR is already an improvement)

@bartaelterman
Copy link
Contributor Author

Thanks for your feedback! I will wait for your reply on my comment above after which I will copy over the explanation to the other places. I'll look into the Examples section too.

@bartaelterman
Copy link
Contributor Author

@mroeschke I've updated the doc as discussed and added it to Timestamp, DatetimeIndex and NaT.
@jorisvandenbossche I've added a couple of examples. However I'm not comfortable with the Cython stuff...
If you have more remarks, I am happy to modify it further.

@bartaelterman
Copy link
Contributor Author

So the docstring test for NaT have failed but I'm not sure what's wrong:

=================================== FAILURES ===================================
_____________________________ test_NaT_docstrings ______________________________

    def test_NaT_docstrings():
        # GH#17327
        nat_names = dir(NaT)
    
        # NaT should have *most* of the Timestamp methods, with matching
        # docstrings.  The attributes that are not expected to be present in NaT
        # are private methods plus `ts_expected` below.
        ts_names = dir(Timestamp)
        ts_missing = [x for x in ts_names if x not in nat_names and
                      not x.startswith('_')]
        ts_missing.sort()
        ts_expected = ['freqstr', 'normalize',
                       'to_julian_date',
                       'to_period', 'tz']
        assert ts_missing == ts_expected
    
        ts_overlap = [x for x in nat_names if x in ts_names and
                      not x.startswith('_') and
                      callable(getattr(Timestamp, x))]
        for name in ts_overlap:
            tsdoc = getattr(Timestamp, name).__doc__
            natdoc = getattr(NaT, name).__doc__
>           assert tsdoc == natdoc
E           AssertionError: assert '\n        Co...ne.\n        ' == '\n        Con...ne.\n        '
E             Skipping 383 identical leading characters in diff, use -v to show
E             -           When clocks moved backward due to DST, ambiguous times may arise.
E             -             For example in Central European Time (UTC+01), when going from 03:00 DST
E             -             to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC
E             -             and at 01:30:00 UTC. In such a situation, the `ambiguous` parameter
E             -             dictates how ambiguous times should be handled.
E             - 
E             -             - bool contains flags to determine if time is dst or not (note
E             ? --
E             +           - bool contains flags to determine if time is dst or not (note
E                             that this flag is only applicable for ambiguous fall dst dates)
E                           - 'NaT' will return NaT for an ambiguous time
E                           - 'raise' will raise an AmbiguousTimeError for an ambiguous time
E               
E                       nonexistent : 'shift', 'NaT', default 'raise'
E                           A nonexistent time does not exist in a particular timezone
E                           where clocks moved forward due to DST.
E               
E                           - 'shift' will shift the nonexistent time forward to the closest
E                             existing time
E                           - 'NaT' will return NaT where there are nonexistent times
E                           - 'raise' will raise an NonExistentTimeError if there are
E                             nonexistent times
E               
E                           .. versionadded:: 0.24.0
E               
E                       errors : 'raise', 'coerce', default None
E                           - 'raise' will raise a NonExistentTimeError if a timestamp is not
E                              valid in the specified timezone (e.g. due to a transition from
E                              or to DST time). Use ``nonexistent='raise'`` instead.
E                           - 'coerce' will return NaT if the timestamp can not be converted
E                             into the specified timezone. Use ``nonexistent='NaT'`` instead.
E               
E                             .. deprecated:: 0.24.0
E               
E                       Returns
E                       -------
E                       localized : Timestamp
E               
E                       Raises
E                       ------
E                       TypeError
E                           If the Timestamp is tz-aware and tz is not None.

pandas/tests/scalar/test_nat.py:185: AssertionError

@mroeschke
Copy link
Member

@bartaelterman in essence, the docstrings for NaT need to be the same as the docstrings for Timestamp, so you need to add your explanation of ambiguous in nattype.pyx for the tz_localize defined there.

@bartaelterman
Copy link
Contributor Author

Should I add the docstring also to core/arrays/datetimes.py? And in that case, shall I add the example with dt.tz_localize() there? It seems to me that would end up here which seems appropriate.

@mroeschke
Copy link
Member

Yes that'd be great @bartaelterman.

The examples look good. Just looks like there's a doctest failure and pep8 issues.

@bartaelterman
Copy link
Contributor Author

The Travis build is still failing, but I'm not sure why. All tests seem to pass and I don't see an error message, except at the very end:
The command "ci/code_checks.sh" exited with 2.
Any ideas what's going wrong? Thanks

@TomAugspurger
Copy link
Contributor

see https://travis-ci.org/pandas-dev/pandas/jobs/448947238#L2763

Some lines are too long.

non-DST time (note that this flag is only applicable for ambiguous
times, but the array must have the same length as vals)
- bool if True, treat all vals as DST. If False, treat them as non-DST
- 'NaT' will return NaT where there are ambiguous times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the string 'NaT', or the singleton pd.NaT (or either?) Can we clarify that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the allowed types are a bit inconsistent among all these functions. Can you check whether that's the docs being wrong? or are the just different?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return the pd.NaT singleton.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about whether the input should be pd.NaT or 'NaT'.

Copy link
Member

@mroeschke mroeschke Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, ambiguous accepts the input 'NaT' and should raise on pd.NaT

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

But we should check whether the ambiguos accepts different values ni each function, or it's the documentation that doesn't document all cases. But it can be in a separate PR I think.

@datapythonista datapythonista changed the title Add documentation line with example for the ambiguous parameter DOC: Clarify documentation of 'ambiguous' parameter Nov 4, 2018
@TomAugspurger
Copy link
Contributor

Agreed. This is a useful change as is.

@TomAugspurger TomAugspurger merged commit 07426af into pandas-dev:master Nov 4, 2018
@bartaelterman bartaelterman deleted the add_tz_localize_doc branch November 4, 2018 15:57
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
* Add documentation line with example for the ambiguous parameter of tz_locaclize

* Updating 'ambiguous'-param doc + update it on Timestamp, DatetimeIndex and NaT

This is following the discussion at
pandas-dev#23408 (comment)
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
* Add documentation line with example for the ambiguous parameter of tz_locaclize

* Updating 'ambiguous'-param doc + update it on Timestamp, DatetimeIndex and NaT

This is following the discussion at
pandas-dev#23408 (comment)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* Add documentation line with example for the ambiguous parameter of tz_locaclize

* Updating 'ambiguous'-param doc + update it on Timestamp, DatetimeIndex and NaT

This is following the discussion at
pandas-dev#23408 (comment)
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
* Add documentation line with example for the ambiguous parameter of tz_locaclize

* Updating 'ambiguous'-param doc + update it on Timestamp, DatetimeIndex and NaT

This is following the discussion at
pandas-dev#23408 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants