Skip to content

Update date type handling in mpas_xarray #47

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 8, 2016

Conversation

xylar
Copy link
Collaborator

@xylar xylar commented Nov 25, 2016

This merge changes the way that mpas_xarray converts MPAS data
arrays of various types to dates used for xarray data sets.
At least one (and potentially 2, e.g. xtime_start and xtime_end)
MPAS array names are passed to mpas_preprocess. mpas_xarray
will convert these arrays to dates, using the data type to determine the
type of data (date string, or days since simulation start) to be converted.

The 5 analysis scripts have been updated to use the new syntax.

Unit tests for mpas_xarray have been added to the continuous integration testing infrastructure.

@xylar
Copy link
Collaborator Author

xylar commented Nov 25, 2016

This PR is intended to address #33.

@xylar
Copy link
Collaborator Author

xylar commented Nov 25, 2016

Testing

I ran the same tests as in #46 (one for ACME alpha7 and one for alpha8). Output can be found here:
http://portal.nersc.gov/project/acme/xylar/update_mpas_xarray_date_handling/

The results are bit-for-bit with the branch in #46 (which is not bit-for-bit with develop) except for OHC and SST time-series plots. These have changed because I am now using the mean of xtime_start and xtime_end to compute the time for each data point, rather than the time-average of daysSinceStartOfSim. My understanding is that the new approach is the desired approach, and that the sea-ice analysis will support moving in this direction in future ACME versions as well.

Differences the SST and OHC plots are quite small. As an example, compare the following:
http://portal.nersc.gov/project/acme/xylar/update_mpas_xarray_date_handling/baseline/alpha7/ohc_global_20160805.A_WCYCL1850.ne30_oEC.edison.alpha7_00_B1850C5_ne30_v0.4.png
http://portal.nersc.gov/project/acme/xylar/update_mpas_xarray_date_handling/test/alpha7/ohc_global_20160805.A_WCYCL1850.ne30_oEC.edison.alpha7_00_B1850C5_ne30_v0.4.png

@xylar
Copy link
Collaborator Author

xylar commented Nov 25, 2016

@pwolfram, this should only be merged once @milenaveneziani has fulfilled #46, since I have continued off of that branch (thus the don't merge flag).

@xylar
Copy link
Collaborator Author

xylar commented Nov 25, 2016

Note: I haven't tried to make the 5 analysis scripts PEP8 compliant but I have tried to make any lines I changed PEP8 compliant (to save a bit of future work).

@pwolfram
Copy link
Contributor

pwolfram commented Dec 1, 2016

@xylar, can you please do a quick rebase of this branch when you get a chance? If I recall correctly, the PR can be blotched and not close properly if I do this in the merge. This will also help with the review. Thanks!

@xylar
Copy link
Collaborator Author

xylar commented Dec 1, 2016

@pwolfram, I don't think this needs to be rebased. I think the merge will work fine without it as long as the hashes of the commits are the same as those that got merged. Your test merge should show whether this is the case. Let me know if you really need a rebase.

@milenaveneziani
Copy link
Collaborator

@xylar: I seem to understand this code is backward compatible, i.e. it recognizes whether the time entry is a float (for daysSinceStart) or a string (xtime_start). Is that right?
This is great. Thanks for doing this.

preprocess_mpas(x, yearoffset=yr_offset,
timestr=['xtime_start', 'xtime_end'],
onlyvars=['time_avg_activeTracers_temperature'],
selvals={'nVertLevels':1}))
ds = remove_repeated_time_index(ds)
ds.rename({'time_avg_activeTracers_temperature':'mpasData'}, inplace = True)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make things backward compatible, it seems like we need a list of options for the timestr variable as well: did you want to handle this in the specific PR that will address #20?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milenaveneziani, yes that's exactly right. What I realized in the process of making this PR is that timestr will need to be a special case in the PR for addressing #20 because we want to support both the possibility of a single variable (e.g. xtime or daysSinceStartOfSim) and that of a pair of variable names like in this case. That will be a bit trickier to implement, and may require tweaking the config reader to handle a list of lists (if it doesn't already) but it should be doable. Again, this is a good case to run into now so we build in some more sophisticated functionality that will hopefully be useful for the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar: this is what I was referring to in my comments to #48. But I guess I was wrong, because xtime_start and xtime_end are already supported in alpha.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yes, this is backwards compatible at least back to alpha7.

preprocess_mpas(x,
yearoffset=yr_offset,
timestr=['xtime_start','xtime_end'],
onlyvars=varList))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice cleaning up!

`preprocess_mpas(..., timeSeriestats=False, ...)`.
2. converts MPAS "timeSinceStartOfSim" to xarray time for MPAS fields coming
from the timeSeriesStatsAM. Time dimension is assigned via
`preprocess_mpas(..., timeSeriesStats=True, ...)`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this comment should be updated?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. I'm working on this now.

@xylar xylar force-pushed the update_mpas_xarray_date_handling branch from e1fe6eb to 2d4b04f Compare December 1, 2016 19:43
@xylar
Copy link
Collaborator Author

xylar commented Dec 1, 2016

@pwolfram, I went ahead and rebased when I was editing the mpas_xarray docstring. If nothing else, that makes it clearer here on GitHub which files were actually changed for this PR and not for the previously merged one. Silly GitHub!

ds = xr.open_mfdataset(infiles, preprocess=lambda x:
preprocess_mpas(x, yearoffset=yr_offset,
timestr=['xtime_start', 'xtime_end'],
onlyvars=['time_avg_dThreshMLD']))
ds = remove_repeated_time_index(ds)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that splitting the lambda x: and process_mpas(... lines here and elsewhere is not optimal. I have not found a more elegant solution that is PEP8 compliant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find one either @xylar. The way you have done this is much clearer so this is great forward progress we all should be very happy about. Thanks for taking another look.

@milenaveneziani
Copy link
Collaborator

@pwolfram, @xylar: unless I am missing something, I don't see reasons for not merging this. We have all PR's in place to solve issues that have come up from this particular PR, right?

@xylar
Copy link
Collaborator Author

xylar commented Dec 5, 2016

@milenaveneziani, I think these changes are large enough to need some testing on @pwolfram's part before we merge. There may be cases he exposes that I haven't tested yet. But as far as I'm concerned, this is ready to merge when he's comfortable with it.

starts = get_datetimes(ds[timestr[0]], yearoffset)
ends = get_datetimes(ds[timestr[1]], yearoffset)
datetimes = [starts[i] + (ends[i] - starts[i])/2
for i in range(len(starts))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, I think this should probably be in the get_datetimes function. The purpose here is to keep preprocess_mpas as simple a function as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, I think we could have all of 184-192 inside the get_datetimes function. We don't want to make preprocess_mpas more complex.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'm fine with doing that.

@pwolfram
Copy link
Contributor

pwolfram commented Dec 5, 2016

@xylar and @milenaveneziani, part of the problem I think we are facing (or shortly will face) is that it is hard to test code when inputs are changing, especially over multiple model versions. It would be good to get some tests in place that use the testing core, because that would help increase confidence that the code changes don't break prior assumptions. This certainly would be a requirement to fix a bug report.

However, perhaps the way to go in the short term is just to use testing like @xylar did above but I also need to verify that the existing tests in mpas_xarray.py still work because this provides a minimal test of backwards capability. I'm not 100% sure how we should we should handle complete backward testing in general. Perhaps this isn't the responsibility of this PR.

Moving forward, I added a comment that needs addressed for mpas_xarray.py. Following its resolution, I will test this and get this merged.

@xylar
Copy link
Collaborator Author

xylar commented Dec 5, 2016

@pwolfram, I think the kind of testing you're describing is outside the scope of this PR, but I'm open to reconsidering. I do like the idea of adding tests for the capabilities that are added as we go. In this case, since there don't seem to be any mpas_xarray tests in the repo to begin with, I think building such tests is a separate project.

@xylar
Copy link
Collaborator Author

xylar commented Dec 6, 2016

@pwolfram, I'm now working on unit testing for mpas_xarray that I will add as a second commit to this PR.

@xylar
Copy link
Collaborator Author

xylar commented Dec 6, 2016

@pwolfram and @milenaveneziani, in the process of adding unit testing, I discovered that there's an incompatibility between xtime-type variables and daysSinceStartOfSim-type variables. The former uses the 365-day MPAS calendar without leap years. But when xarray computes times from the latter, it takes leap years into account, meaning that the time average of daysSinceStartOfSim doesn't match well with the average of xtime_start and xtime_end after several years. This is something I'm going to try to address in this PR.

@xylar
Copy link
Collaborator Author

xylar commented Dec 6, 2016

The bug I thought I was seeing related to leap years turns out instead to be a known bug in daysSinceStartOfSim (MPAS-Dev/MPAS#1096) that we haven't found the cause of yet. xarray appears to use the same no-leap calendar as MPAS by default (at least I think so).

Anyway, the only remaining piece for this PR seems to be to integrate my new testing, which I will do now.

@xylar xylar removed the in progress label Dec 6, 2016
self.assertEqual(date, pd.Timestamp('1855-01-13 12:24:14'))


# def test_selvals(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, I tried to make a unit test for using selvals but (as you can see by uncommenting this function) it didn't work as expected. For some reason, xarray doesn't seem to recognize that refBottomDepth varies along dimension nVertLevels so it tries to treat refBottomDepth itself as a coordinate (which it is not) and fails. If you find a fix, please push it to this branch!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar, the issue is that the correct dimension to do the search on is nVertLevels, e.g., if you load the whole dateset without selection

ds
<xarray.Dataset>
Dimensions:                                                      (Time: 1, nOceanRegionsTmp: 7, nVertLevels: 100)
Coordinates:
  * Time                                                         (Time) datetime64[ns] 1855-01-13T12:24:13.593599 ...
  * nOceanRegionsTmp                                             (nOceanRegionsTmp) int64 0 ...
  * nVertLevels                                                  (nVertLevels) int64 0 ...
Data variables:
    time_avg_avgValueWithinOceanLayerRegion_avgLayerTemperature  (Time, nOceanRegionsTmp, nVertLevels) float64 -0.6598 ...
    refBottomDepth                                               (nVertLevels) float64 1.51 ...

If I understand correctly, I think what we want is something more like

ds.where(ds.refBottomDepth == 8.77999997138977, drop=True)
<xarray.Dataset>
Dimensions:                                                      (Time: 1, nOceanRegionsTmp: 7, nVertLevels: 1)
Coordinates:
  * Time                                                         (Time) datetime64[ns] 1855-01-13T12:24:13.593599 ...
  * nOceanRegionsTmp                                             (nOceanRegionsTmp) int64 0 ...
  * nVertLevels                                                  (nVertLevels) int64 4 ...
Data variables:
    time_avg_avgValueWithinOceanLayerRegion_avgLayerTemperature  (Time, nOceanRegionsTmp, nVertLevels) float64 -0.6544 ...
    refBottomDepth                                               (nVertLevels) float64 8.78 ...

This segements the dataset for all cases of refBottomDepth of 8.77999997138977. Is this the desired check here? If so, I can fix this in the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should note that this also does change the usage of the api for mpas_xarray if this is the desired behavior so we may be better off in the short-term fixing this elsewhere unless it is needed for your work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwolfram, Thanks for the example. What I actually want is a test of selvals, so if you can come up with an alternative (and non-trivial) test of selvals, that should replace this test. Maybe just post a suggestion here?

@xylar
Copy link
Collaborator Author

xylar commented Dec 6, 2016

@pwolfram, the CI is failing because matplotlib is not installed. While one solution would be to take matplotlib out of mpas_xarray, we will almost certainly need it for unit testing of our analysis scripts down the line. Presumably you just need to add it to the list of packages to be installed?

@pwolfram
Copy link
Contributor

pwolfram commented Dec 6, 2016

@xylar, I'll submit a quick PR that will hopefully fix this issue. We can test it by you just temporarily cherry-picking the commit. Does this sound good?

@pwolfram
Copy link
Contributor

pwolfram commented Dec 6, 2016

The PR is #53 and the commit to cherry-pick is ce69185a2e1011e5c921576daf42d5bd9e2363e6

xylar added 2 commits December 6, 2016 17:57
This merge changes the way that mpas_xarray converts MPAS data
arrays of various types to dates used for xarray data sets.
At least one (and potentially 2) MPAS array names are passed to
mpas_preprocess.  mpas_xarray will convert these arrays to
dates, using the data type to determine the type of data
(date string, or days since simulation start) to be converted.

The 5 analysis scripts have been updated to use the new syntax.

timeseries_y1 and timeseries_yr2 have been added to the OHC script
but general values (other than 1 and 9999, respectively) are not
yet supported.
@xylar xylar force-pushed the update_mpas_xarray_date_handling branch from 60b8124 to 00ee576 Compare December 6, 2016 16:57
@xylar
Copy link
Collaborator Author

xylar commented Dec 6, 2016

@pwolfram, the cherry-pick seems like more trouble than it's worth. In fact, a separate PR seems like overkill but I went ahead and merged it. If that doesn't take care of things for some reason, let's just fix it here.

@xylar
Copy link
Collaborator Author

xylar commented Dec 6, 2016

@pwolfram, I had to add dask but now we're good. Please continue reviewing this and other open PRs in this repo at your earliest convenience.

int(x[8:10]), int(x[11:13]),
int(x[14:16]), int(x[17:19]))
for x in time]
datetimes = get_datetimes(ds, timestr, yearoffset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xylar for cleaning this up to keep preprocess_mpas simpler.

@pwolfram
Copy link
Contributor

pwolfram commented Dec 7, 2016 via email

@xylar
Copy link
Collaborator Author

xylar commented Dec 7, 2016

Okay, those changes look fine to me, and the tests clearly pass.

@pwolfram pwolfram force-pushed the update_mpas_xarray_date_handling branch 3 times, most recently from 0e5dd19 to 62cf89c Compare December 7, 2016 21:28
Ensures that selected values are contained within the dimensions of
the dataset as well as tests that selection works properly for
the `nVertLevels` dimension.
@pwolfram pwolfram force-pushed the update_mpas_xarray_date_handling branch from 62cf89c to 29e2e84 Compare December 7, 2016 21:32
@pwolfram
Copy link
Contributor

pwolfram commented Dec 7, 2016

LGTM, ready to merge following verification via CI and aggregated testing of #47, #50, #51 as discussed with @xylar.

@xylar
Copy link
Collaborator Author

xylar commented Dec 7, 2016

ready to merge following verification via CI

Are you going to do some additional CI testing beyond what happens automatically? I guess you could run pytest after merging?

@pwolfram
Copy link
Contributor

pwolfram commented Dec 7, 2016

@xylar, I've been running py.test on my own machine and just want to make sure the automated testing also checks out. There is no reason to suppose it wouldn't but I just want to make sure just in case because we have automated CI in place to ensure confidence.

@pwolfram
Copy link
Contributor

pwolfram commented Dec 7, 2016

To be clear, I'll run py.test prior to merging on my own machine.

@pwolfram pwolfram merged commit 29e2e84 into MPAS-Dev:develop Dec 8, 2016
@xylar xylar deleted the update_mpas_xarray_date_handling branch December 8, 2016 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants