-
Notifications
You must be signed in to change notification settings - Fork 52
Update date type handling in mpas_xarray #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update date type handling in mpas_xarray #47
Conversation
This PR is intended to address #33. |
TestingI ran the same tests as in #46 (one for ACME alpha7 and one for alpha8). Output can be found here: The results are bit-for-bit with the branch in #46 (which is not bit-for-bit with Differences the SST and OHC plots are quite small. As an example, compare the following: |
@pwolfram, this should only be merged once @milenaveneziani has fulfilled #46, since I have continued off of that branch (thus the |
Note: I haven't tried to make the 5 analysis scripts PEP8 compliant but I have tried to make any lines I changed PEP8 compliant (to save a bit of future work). |
@xylar, can you please do a quick rebase of this branch when you get a chance? If I recall correctly, the PR can be blotched and not close properly if I do this in the merge. This will also help with the review. Thanks! |
@pwolfram, I don't think this needs to be rebased. I think the merge will work fine without it as long as the hashes of the commits are the same as those that got merged. Your test merge should show whether this is the case. Let me know if you really need a rebase. |
@xylar: I seem to understand this code is backward compatible, i.e. it recognizes whether the time entry is a float (for daysSinceStart) or a string (xtime_start). Is that right? |
preprocess_mpas(x, yearoffset=yr_offset, | ||
timestr=['xtime_start', 'xtime_end'], | ||
onlyvars=['time_avg_activeTracers_temperature'], | ||
selvals={'nVertLevels':1})) | ||
ds = remove_repeated_time_index(ds) | ||
ds.rename({'time_avg_activeTracers_temperature':'mpasData'}, inplace = True) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make things backward compatible, it seems like we need a list of options for the timestr variable as well: did you want to handle this in the specific PR that will address #20?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@milenaveneziani, yes that's exactly right. What I realized in the process of making this PR is that timestr
will need to be a special case in the PR for addressing #20 because we want to support both the possibility of a single variable (e.g. xtime
or daysSinceStartOfSim
) and that of a pair of variable names like in this case. That will be a bit trickier to implement, and may require tweaking the config reader to handle a list of lists (if it doesn't already) but it should be doable. Again, this is a good case to run into now so we build in some more sophisticated functionality that will hopefully be useful for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Yes, this is backwards compatible at least back to alpha7.
preprocess_mpas(x, | ||
yearoffset=yr_offset, | ||
timestr=['xtime_start','xtime_end'], | ||
onlyvars=varList)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice cleaning up!
`preprocess_mpas(..., timeSeriestats=False, ...)`. | ||
2. converts MPAS "timeSinceStartOfSim" to xarray time for MPAS fields coming | ||
from the timeSeriesStatsAM. Time dimension is assigned via | ||
`preprocess_mpas(..., timeSeriesStats=True, ...)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this comment should be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. I'm working on this now.
e1fe6eb
to
2d4b04f
Compare
@pwolfram, I went ahead and rebased when I was editing the |
ds = xr.open_mfdataset(infiles, preprocess=lambda x: | ||
preprocess_mpas(x, yearoffset=yr_offset, | ||
timestr=['xtime_start', 'xtime_end'], | ||
onlyvars=['time_avg_dThreshMLD'])) | ||
ds = remove_repeated_time_index(ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that splitting the lambda x:
and process_mpas(...
lines here and elsewhere is not optimal. I have not found a more elegant solution that is PEP8 compliant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to find one either @xylar. The way you have done this is much clearer so this is great forward progress we all should be very happy about. Thanks for taking another look.
2d4b04f
to
46ce29c
Compare
360e59d
to
ad28184
Compare
@milenaveneziani, I think these changes are large enough to need some testing on @pwolfram's part before we merge. There may be cases he exposes that I haven't tested yet. But as far as I'm concerned, this is ready to merge when he's comfortable with it. |
starts = get_datetimes(ds[timestr[0]], yearoffset) | ||
ends = get_datetimes(ds[timestr[1]], yearoffset) | ||
datetimes = [starts[i] + (ends[i] - starts[i])/2 | ||
for i in range(len(starts))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, I think this should probably be in the get_datetimes
function. The purpose here is to keep preprocess_mpas
as simple a function as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, I think we could have all of 184-192 inside the get_datetimes
function. We don't want to make preprocess_mpas
more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'm fine with doing that.
@xylar and @milenaveneziani, part of the problem I think we are facing (or shortly will face) is that it is hard to test code when inputs are changing, especially over multiple model versions. It would be good to get some tests in place that use the testing core, because that would help increase confidence that the code changes don't break prior assumptions. This certainly would be a requirement to fix a bug report. However, perhaps the way to go in the short term is just to use testing like @xylar did above but I also need to verify that the existing tests in Moving forward, I added a comment that needs addressed for |
@pwolfram, I think the kind of testing you're describing is outside the scope of this PR, but I'm open to reconsidering. I do like the idea of adding tests for the capabilities that are added as we go. In this case, since there don't seem to be any mpas_xarray tests in the repo to begin with, I think building such tests is a separate project. |
@pwolfram, I'm now working on unit testing for mpas_xarray that I will add as a second commit to this PR. |
@pwolfram and @milenaveneziani, in the process of adding unit testing, I discovered that there's an incompatibility between |
The bug I thought I was seeing related to leap years turns out instead to be a known bug in Anyway, the only remaining piece for this PR seems to be to integrate my new testing, which I will do now. |
self.assertEqual(date, pd.Timestamp('1855-01-13 12:24:14')) | ||
|
||
|
||
# def test_selvals(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, I tried to make a unit test for using selvals
but (as you can see by uncommenting this function) it didn't work as expected. For some reason, xarray doesn't seem to recognize that refBottomDepth
varies along dimension nVertLevels
so it tries to treat refBottomDepth
itself as a coordinate (which it is not) and fails. If you find a fix, please push it to this branch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar, the issue is that the correct dimension to do the search on is nVertLevels
, e.g., if you load the whole dateset without selection
ds
<xarray.Dataset>
Dimensions: (Time: 1, nOceanRegionsTmp: 7, nVertLevels: 100)
Coordinates:
* Time (Time) datetime64[ns] 1855-01-13T12:24:13.593599 ...
* nOceanRegionsTmp (nOceanRegionsTmp) int64 0 ...
* nVertLevels (nVertLevels) int64 0 ...
Data variables:
time_avg_avgValueWithinOceanLayerRegion_avgLayerTemperature (Time, nOceanRegionsTmp, nVertLevels) float64 -0.6598 ...
refBottomDepth (nVertLevels) float64 1.51 ...
If I understand correctly, I think what we want is something more like
ds.where(ds.refBottomDepth == 8.77999997138977, drop=True)
<xarray.Dataset>
Dimensions: (Time: 1, nOceanRegionsTmp: 7, nVertLevels: 1)
Coordinates:
* Time (Time) datetime64[ns] 1855-01-13T12:24:13.593599 ...
* nOceanRegionsTmp (nOceanRegionsTmp) int64 0 ...
* nVertLevels (nVertLevels) int64 4 ...
Data variables:
time_avg_avgValueWithinOceanLayerRegion_avgLayerTemperature (Time, nOceanRegionsTmp, nVertLevels) float64 -0.6544 ...
refBottomDepth (nVertLevels) float64 8.78 ...
This segements the dataset for all cases of refBottomDepth
of 8.77999997138977
. Is this the desired check here? If so, I can fix this in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should note that this also does change the usage of the api for mpas_xarray if this is the desired behavior so we may be better off in the short-term fixing this elsewhere unless it is needed for your work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, Thanks for the example. What I actually want is a test of selvals, so if you can come up with an alternative (and non-trivial) test of selvals, that should replace this test. Maybe just post a suggestion here?
@pwolfram, the CI is failing because matplotlib is not installed. While one solution would be to take matplotlib out of mpas_xarray, we will almost certainly need it for unit testing of our analysis scripts down the line. Presumably you just need to add it to the list of packages to be installed? |
@xylar, I'll submit a quick PR that will hopefully fix this issue. We can test it by you just temporarily cherry-picking the commit. Does this sound good? |
The PR is #53 and the commit to cherry-pick is |
This merge changes the way that mpas_xarray converts MPAS data arrays of various types to dates used for xarray data sets. At least one (and potentially 2) MPAS array names are passed to mpas_preprocess. mpas_xarray will convert these arrays to dates, using the data type to determine the type of data (date string, or days since simulation start) to be converted. The 5 analysis scripts have been updated to use the new syntax. timeseries_y1 and timeseries_yr2 have been added to the OHC script but general values (other than 1 and 9999, respectively) are not yet supported.
60b8124
to
00ee576
Compare
@pwolfram, the cherry-pick seems like more trouble than it's worth. In fact, a separate PR seems like overkill but I went ahead and merged it. If that doesn't take care of things for some reason, let's just fix it here. |
@pwolfram, I had to add |
int(x[8:10]), int(x[11:13]), | ||
int(x[14:16]), int(x[17:19])) | ||
for x in time] | ||
datetimes = get_datetimes(ds, timestr, yearoffset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @xylar for cleaning this up to keep preprocess_mpas
simpler.
@xylar, I just pushed a more rigorous set of error testing, including testing of sel. We could make the functionality work as expected in the previous example if you would like too, although we may want to hold off on that for now.
On Dec 7, 2016, at 11:27 AM, Xylar Asay-Davis <[email protected]<mailto:[email protected]>> wrote:
@xylar commented on this pull request.
________________________________
In mpas_analysis/test/test_mpas_xarray.py<#47>:
+ 'refBottomDepth']
+
+ ds = xr.open_mfdataset(
+ fileName,
+ preprocess=lambda x: mpas_xarray.preprocess_mpas(x,
+ timestr=timestr,
+ onlyvars=varList,
+ yearoffset=1850))
+ self.assertEqual(ds.data_vars.keys(), varList)
+ date = pd.Timestamp(ds.Time.values[0])
+ # round to nearest second
+ date = pd.Timestamp(long(round(date.value, -9)))
+ self.assertEqual(date, pd.Timestamp('1855-01-13 12:24:14'))
+
+
+# def test_selvals(self):
@pwolfram<https://github.com/pwolfram>, Thanks for the example. What I actually want is a test of selvals, so if you can come up with an alternative (and non-trivial) test of selvals, that should replace this test. Maybe just post a suggestion here?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#47>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AEGMrb5-f1LMR1HzoLCO7tMpCGAxzrmRks5rFvp4gaJpZM4K8wPv>.
|
Okay, those changes look fine to me, and the tests clearly pass. |
0e5dd19
to
62cf89c
Compare
Ensures that selected values are contained within the dimensions of the dataset as well as tests that selection works properly for the `nVertLevels` dimension.
62cf89c
to
29e2e84
Compare
Are you going to do some additional CI testing beyond what happens automatically? I guess you could run pytest after merging? |
@xylar, I've been running |
To be clear, I'll run |
This merge changes the way that mpas_xarray converts MPAS data
arrays of various types to dates used for xarray data sets.
At least one (and potentially 2, e.g. xtime_start and xtime_end)
MPAS array names are passed to mpas_preprocess. mpas_xarray
will convert these arrays to dates, using the data type to determine the
type of data (date string, or days since simulation start) to be converted.
The 5 analysis scripts have been updated to use the new syntax.
Unit tests for mpas_xarray have been added to the continuous integration testing infrastructure.