-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add UO SRML data reader #594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pvlib/iotools/srml.py
Outdated
|
||
# Quality flags are all labeled 0, but occur immediately after their | ||
# associated var so we create a dict mapping them to var_flag for renaming | ||
flag_label_map = {flag: data.columns[data.columns.get_loc(flag)-1]+'_flag' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
pvlib/iotools/srml.py
Outdated
data = data.rename(columns=flag_label_map) | ||
# For data flagged bad or missing, replace the value with np.NaN | ||
for col in data.columns[::2]: | ||
data[col] = data[col].where(~(data[col+'_flag'] == 99), np.NaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
pvlib/iotools/srml.py
Outdated
except KeyError: | ||
return col | ||
try: | ||
return var_map[col[:3]]+'_'+col[3:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
pvlib/iotools/srml.py
Outdated
year=year % 100, | ||
month=month) | ||
url = "http://solardat.uoregon.edu/download/Archive/" | ||
data = read_srml(url+file_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea to include a data file for testing and demo purposes. I'm not sure about the length. Usually I like short files that are easy to visually scan. But it's a 1 minute resolution dataset, so limiting it only a day would still produce a file that is difficult to visually scan.
pvlib/iotools/srml.py
Outdated
|
||
def read_srml(filename): | ||
""" | ||
Read SRML file into pandas dataframe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_srml
is probably ok for the function name. And SRML
might be ok for the first line of the doc string. But then we definitely need to say the full name: University of Oregon Solar Radiation Measurement Laboratory. And provide a link to their website in a References section.
pvlib/iotools/srml.py
Outdated
2400 hours, and to avoid time parsing errors on leap years. Data values | ||
on a given line should now be understood to occur during the interval | ||
extending from the time of the line in which they are listed to | ||
the ending time on the next line, rather than the previous line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pvlib/iotools/srml.py
Outdated
year = tsv_data.columns[1] | ||
data = format_index(tsv_data, year) | ||
# Rename and drop datetime columns | ||
data = data[data.columns[2:]].rename(columns=map_columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two lines here will make debugging easier when it inevitably breaks.
pvlib/iotools/srml.py
Outdated
data = data[data.columns[2:]].rename(columns=map_columns) | ||
|
||
# Quality flags are all labeled 0, but occur immediately after their | ||
# associated var so we create a dict mapping them to var_flag for renaming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an example like: # ghi_1, 0, dni_1, 0 is mapped to ghi_1, ghi_1_flag, dni_1, dni_1_flag
pvlib/iotools/srml.py
Outdated
Spectral data (7xxx) uses all four digits to indicate the | ||
variable. | ||
""" | ||
var_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok but I am wondering if it would be better to define it at the top of the module. Then it's easier for people to inspect it if they are interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I must have misunderstood something the other day, I'll replace this at the top of the module.
pvlib/iotools/srml.py
Outdated
except KeyError: | ||
return col | ||
try: | ||
return var_map[col[:3]] + '_' + col[3:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommend more explicit code like
variable_name = var_map[col[:3]]
variable_number = col[3:]
return variable_name + '_' + variable_number
pvlib/iotools/srml.py
Outdated
|
||
|
||
def request_srml_data(station, year, month, filetype='PO'): | ||
"""Read a month of SRML data from solardat into a Dataframe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs the same references as the basic read function above.
pvlib/iotools/srml.py
Outdated
return df | ||
|
||
|
||
def request_srml_data(station, year, month, filetype='PO'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe read_srml_from_solardat
and import it in __init__.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this module include these kind of util functions? I wasn't sure about including this here, or a function to stitch together due to their function signatures being pretty far off from read_<data_type> functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the module should include any generally useful functions for IO related to the UOSRML. That would include downloading directly from their website.
I suppose this function might more accurately be called read_srml_month_from_solardat
I'm less sure about a function to download and stitch together multiple months because you said that the columns sometimes change. If we made it, that's the function that would be called read_srml_from_solardat
pvlib/iotools/__init__.py
Outdated
@@ -1,2 +1,4 @@ | |||
from pvlib.iotools.tmy import read_tmy2 # noqa: F401 | |||
from pvlib.iotools.tmy import read_tmy3 # noqa: F401 | |||
from pvlib.iotools.srml import read_srml # noqa: F401 | |||
from pvlib.iotools.srml import read_srml_month_from_solardat # noqa: F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E261 at least two spaces before inline comment
pvlib/iotools/srml.py
Outdated
numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_ | ||
""" | ||
variable_map = { | ||
'100': 'ghi', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E126 continuation line over-indented for hanging indent
pvlib/iotools/srml.py
Outdated
'931': 'temp_dew', | ||
'933': 'relative_humidity', | ||
'937': 'temp_cell', | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E121 continuation line under-indented for hanging indent
pvlib/test/test_srml.py
Outdated
|
||
@network | ||
def test_read_srml_month_from_solardat(): | ||
file_data = srml.read_srml('http://solardat.uoregon.edu/download/Archive/EUPO1801.txt') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E501 line too long (91 > 79 characters)
Updated in response to some of your comments:
Would it be useful if this could handle data other than 1-min resolution? |
""" | ||
df_time = df[df.columns[1]] - 1 | ||
df_doy = df[df.columns[0]] | ||
fifty_nines = df_time % 100 == 99 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be ==59
? Or are there lines in the data where the hour of day is something like hh99? A comment here explaining this line and the next would be appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These variable names might be more confusing than I had intended. 59 is the desired result here, after reducing the times by 1 all of the hours become HH99(e.g. 2400->2399), so they need to be adjusted.
This was to get around exceptions trying to parse a datetime from day 366 hour 2400 of a leap year. It is admittedly quite confusing. I will document the rationale better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, makes sense now. Because of -1, the times run from hh00, hh01, ..., hh58, hh99. Minor nit - putting df_doy =
ahead of df_time =
would help me see the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're close
pvlib/iotools/srml.py
Outdated
the fourth indicating the instrument. Spectral data (7xxx) uses all | ||
four digits to indicate the variable. See a full list of data element | ||
numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_ | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be a comment or part of the module documentation? If a comment, use # to start each line (pep8). If documentation, I think it should be combined with first block quote. We haven't set any standards about documenting module variables, so either approach is ok with me.
pvlib/iotools/srml.py
Outdated
four digits to indicate the variable. See a full list of data element | ||
numbers `here. <http://solardat.uoregon.edu/DataElementNumbers.html>`_ | ||
""" | ||
variable_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VARIABLE_MAP
?
pvlib/iotools/srml.py
Outdated
|
||
Notes | ||
----- | ||
Note that the time index is shifted back one minute to account for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete Note that
pvlib/iotools/srml.py
Outdated
----- | ||
Note that the time index is shifted back one minute to account for | ||
2400 hours, and to avoid time parsing errors on leap years. Data values | ||
on a given line should now be understood to occur during the interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace "Data values ... previous line" with "The returned data values should be understood to occur during the interval from the time of the row until the time of the next row. This is consistent with pandas' default labeling behavior."
pvlib/iotools/srml.py
Outdated
|
||
References | ||
---------- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No blank line needed here.
pvlib/test/test_srml.py
Outdated
assert data.index[0] == start | ||
assert data.index[-1] == end | ||
assert (data.index[59::60].minute == 59).all() | ||
assert year not in data.columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this assert fail? it's type sensitive...
In [3]: df = pd.DataFrame(columns=['2016'])
In [4]: '2016' in df.columns
Out[4]: True
In [5]: 2016 in df.columns
Out[5]: False
In [6]: df = pd.DataFrame(columns=[2016])
In [7]: '2016' in df.columns
Out[7]: False
In [8]: 2016 in df.columns
Out[8]: True
great, thanks @lboeman |
pvlib python pull request guidelines
Thank you for your contribution to pvlib python! You may delete all of these instructions except for the list below.
You may submit a pull request with your code at any stage of completion.
The following items must be addressed before the code can be merged. Please don't hesitate to ask for help if you're unsure of how to accomplish any of the items below:
docs/sphinx/source/api.rst
for API changes.docs/sphinx/source/whatsnew
file for all changes.Brief description of the problem and proposed solution (if not already fully described in the issue linked to above):