OpenDAP loaded Dataset has lon/lats with type 'object'. #39

Closed
akleeman opened this issue Mar 3, 2014 · 4 comments

Comments

@akleeman
Contributor

akleeman commented Mar 3, 2014

ds = xray.open_dataset('http://motherlode.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p5deg/files/GFS_Global_0p5deg_20140303_0000.grib2', decode_cf=False)
In [4]: ds['lat'].dtype
Out[4]: dtype('O')

This makes serialization fail.

@shoyer
Member

shoyer commented Mar 3, 2014

This is because coordinates are loaded as pandas.Index objects, which don't always faithfully preserve the dtype of the underlying array (see pandas-dev/pandas#6471).

I believe serialization should still work, though, thanks to a workaround I added for dtype=object. Do let me know if this is not the case. One solution to make this less awkward would be to wrap pandas.Index in something that keeps track of the dtype of the original arguments, for use in mathematical expressions.
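A minimal sketch of the wrapper idea, assuming nothing about xray's internals: the class name `DtypePreservingIndex` and its methods are hypothetical. It stores the dtype of the array it was built from next to the pandas.Index and casts back to that dtype whenever the values are used in arithmetic, so the result does not depend on which dtype pandas chose for the Index.

```python
import numpy as np
import pandas as pd


class DtypePreservingIndex:
    """Hypothetical wrapper: a pandas.Index plus the dtype of the original array."""

    def __init__(self, values):
        values = np.asarray(values)
        # Remember the caller's dtype, whatever dtype pandas picks for the Index.
        self.dtype = values.dtype
        self.index = pd.Index(values)

    def __len__(self):
        return len(self.index)

    def to_array(self):
        # Cast back to the original dtype, so math and serialization
        # see e.g. float32 rather than an upcast or dtype=object.
        return np.asarray(self.index, dtype=self.dtype)

    def __add__(self, other):
        # Arithmetic goes through the dtype-restored ndarray.
        return self.to_array() + other
```

With this design, `DtypePreservingIndex(lat_array) + 1` keeps the dtype of `lat_array`, while `.index` is still available for label-based lookups.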

@ebrevdo
Contributor

ebrevdo commented Mar 3, 2014

Indices also have an .inferred_type property. Unfortunately, it doesn't seem
to return true type names:

In [13]: pandas.Index([1,2,3]).inferred_type
Out[13]: 'integer'

In [14]: pandas.Index([1,2,3.5]).inferred_type
Out[14]: 'mixed-integer-float'

In [15]: pandas.Index(["ab","cd"]).inferred_type
Out[15]: 'string'

In [16]: pandas.Index(["ab","cd",3]).inferred_type
Out[16]: 'mixed-integer'
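To illustrate why inferred_type can't stand in for a dtype, the sketch below builds three indexes holding the same values with different dtypes. inferred_type describes the kind of values, not their storage type, so it is the same for all three. This assumes a reasonably recent pandas; the exact `dtype` pandas reports for a float32 array has changed between versions, so only `inferred_type` is relied on here.

```python
import numpy as np
import pandas as pd

values = [0.0, 0.5, 1.0]

indexes = {
    "float32": pd.Index(np.array(values, dtype=np.float32)),
    "float64": pd.Index(np.array(values, dtype=np.float64)),
    "object": pd.Index(values, dtype=object),
}

for name, idx in indexes.items():
    # inferred_type is 'floating' in every case, whatever the source dtype was.
    print(name, idx.dtype, idx.inferred_type)
```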


@akleeman
Contributor Author

akleeman commented Mar 3, 2014

@shoyer You're right: I can serialize the latitude variable directly from that OpenDAP URL, but after some manipulation I run into this:

ipdb> print fcst
dimensions:
    latitude = 31
    longitude = 46
    time = 7
variables:
    object latitude(latitude)
        units:degrees_north
        _CoordinateAxisType:Lat
    object longitude(longitude)
        units:degrees_east
        _CoordinateAxisType:Lon
    datet... time(time)
        standard_name:time
        _CoordinateAxisType:Time
        units:hours since 2014-03-03 00:0...
ipdb> fcst.dump('./test.nc')
*** TypeError: illegal primitive data type, must be one of ['i8', 'f4', 'u8', 'i1', 'U1', 'S1', 'i2', 'u1', 'i4', 'u2', 'f8', 'u4'], got object

Currently tracking down exactly what's going on here.
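One possible way to make this failure clearer is to coerce object-dtype coordinates back to a concrete dtype before they reach netCDF4. A minimal sketch, assuming the object array holds plain numbers: `coerce_object_array` and `NETCDF_PRIMITIVES` are hypothetical names, and the allowed types are copied from the TypeError above. datetime64 values are rejected here because they need CF encoding (numeric offsets plus a units string) first.

```python
import numpy as np

# Primitive types accepted by netCDF4, copied from the TypeError above.
NETCDF_PRIMITIVES = {"i1", "u1", "i2", "u2", "i4", "u4", "i8", "u8", "f4", "f8", "S1", "U1"}


def coerce_object_array(values):
    """Hypothetical helper: give an object array a concrete dtype before writing."""
    values = np.asarray(values)
    if values.dtype == object:
        # Let NumPy infer a concrete dtype from the Python objects themselves.
        values = np.array(values.tolist())
    # dtype.str looks like '<f8'; drop the byte-order character before comparing.
    if values.dtype.str[1:] not in NETCDF_PRIMITIVES:
        raise TypeError(f"cannot serialize dtype {values.dtype}")
    return values
```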

akleeman referenced this issue Mar 3, 2014
When encoding cf variables check if dtype is np.datetime64.

fixes akleeman/xray/issues/39
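The check named in this commit can be sketched as follows: a datetime64 variable is detected with `np.issubdtype` and converted into numeric offsets plus a CF `units` string before writing. This is an illustration, not the code from the commit; `encode_datetime_hours` is a hypothetical helper that always uses hours, and the caller supplies the reference time.

```python
import numpy as np


def encode_datetime_hours(values, reference):
    """Hypothetical helper: encode datetime64 values as CF-style 'hours since' offsets.

    Returns (encoded_values, units); units is None when no encoding was needed.
    """
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.datetime64):
        return values, None  # not a datetime variable; leave it untouched
    ref = np.datetime64(reference, "s")
    # Dividing a timedelta64 array by one hour yields float64 hours.
    hours = (values - ref) / np.timedelta64(1, "h")
    units = "hours since " + str(ref).replace("T", " ")
    return hours, units
```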
shoyer added a commit that referenced this issue Mar 7, 2014
This allows us to simplify our internal model for XArray (it is always cached
internally as a base ndarray) and supports some previously tricky aspects
involving pandas.Index objects. Notably:
1. The dtype of arrays stored as pandas.Index objects can now be faithfully
   saved and restored. Doing math with XArray objects always yields objects
   with the right dtype, so `ds['latitude'] + 1` has dtype=float, not
   dtype=object.
2. It's no longer necessary to load index data into memory upon creating a new
   Dataset. Instead, the index data can be loaded on demand.
3. `var.data` is always an ndarray. `var.index` is always a pandas.Index.

Related issues: #17, #39, #40.
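The model described in this commit (data cached as a plain ndarray, index built only when needed) can be sketched like this. `LazyCoordinate` and its `load` callable are hypothetical stand-ins for a backend such as an OpenDAP read; they are not xray's actual classes.

```python
import numpy as np
import pandas as pd


class LazyCoordinate:
    """Hypothetical sketch: an ndarray-backed coordinate with an on-demand pandas.Index."""

    def __init__(self, load):
        self._load = load  # callable returning the array, e.g. a remote read
        self._data = None  # cached ndarray, filled on first access

    @property
    def data(self):
        # Load once, then always hand back the same ndarray with its true dtype.
        if self._data is None:
            self._data = np.asarray(self._load())
        return self._data

    @property
    def index(self):
        # Built from the cached ndarray, so creating it never triggers a second load.
        return pd.Index(self.data)
```

Because math operates on `.data` rather than on the Index, `coord.data + 1` keeps the loaded dtype, and nothing is read until `.data` or `.index` is first touched.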
@shoyer
Member

shoyer commented Mar 24, 2014

I believe this was fixed by #54.

@shoyer shoyer closed this as completed Mar 24, 2014
keewis pushed a commit to keewis/xarray that referenced this issue Jan 17, 2024
* add initial draft of docs

* add pages

* made build work, but had to roll back docstring modification