dtype not preserved on round trip with xarray #117
Comments
Thank you for the bug report @itcarroll. I was able to reproduce the issue from your code snippet, and I am looking into a fix now.
@itcarroll I tracked down the bug, and it is actually a bug in xarray that occurs during the decode_cf_variable step. I opened xarray issue #6055, which you can follow. The reason you may be seeing this here and not with other xarray backends is that we always set a fill value for the TileDB attributes. A temporary fix is to set mask_and_scale=False when opening the dataset, as the snippet below does:

import tiledb
import xarray as xr
import numpy as np

# Create a small dense array with a single int16 attribute
index = tiledb.Dim(name='index', domain=(0, 3))
domain = tiledb.Domain(index)
var = tiledb.Attr(name='var', dtype=np.int16)
schema = tiledb.ArraySchema(domain=domain, attrs=[var], sparse=False)
tiledb.Array.create('dense_array0', schema)

# Write int16 data to every cell
with tiledb.open('dense_array0', 'w') as A:
    A[:] = np.array([5, 6, 7, 8], dtype=np.int16)

# Disable masking so xarray does not promote the attribute to float
ds = xr.open_dataset('dense_array0', mask_and_scale=False, engine='tiledb')
ds['var'].dtype
Thanks for the quick investigation and follow-up with XArray. It makes sense to me if you want to close this issue.
I think XArray's logic of promoting to float when the _FillValue attribute is set is reasonable, with setting So I have to ask why TileDB always sets the _FillValue attribute? You seem to be introducing arbitrary metadata (which has consequences, apparently!). Any documentation on this?
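The promotion discussed in this comment can be reproduced without TileDB. The following is a minimal sketch, assuming an in-memory dataset whose variable carries a _FillValue attribute (the dataset, the values, and the choice of -1 as fill are made up for illustration). It uses xarray's public decode_cf function rather than the TileDB backend, so it only approximates what open_dataset does:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a backend variable that always carries _FillValue.
raw = xr.Dataset(
    {
        "var": (
            "index",
            np.array([5, 6, 7, -1], dtype=np.int16),
            {"_FillValue": np.int16(-1)},  # illustrative fill value
        )
    }
)

# Default decoding masks cells equal to _FillValue with NaN,
# which requires a floating-point dtype.
masked = xr.decode_cf(raw)

# With masking disabled, values and dtype are left untouched.
unmasked = xr.decode_cf(raw, mask_and_scale=False)

print(masked["var"].dtype)
print(unmasked["var"].dtype)
```

The first dtype printed should be a float type and the second should remain int16, which matches the behavior reported in this issue.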
TileDB uses a fill value when writing to an array where you are not filling an entire tile. This is part of the TileDB API, unlike NetCDF (what xarray was originally designed to handle), where _FillValue is always a metadata convention. I could default to NOT adding
It keeps going deeper! I wouldn't change anything here, yet.
This issue got brought up again elsewhere. In the next release, the TileDB-xarray backend will default to not adding the
I tested it out. Good solution. BTW: Is it on purpose that if you DIDN'T write to all domain values in a tiledb attribute, then open_dataset will error on loading the data? I would expect loading to quietly include the fill.
It should quietly include the fill. That is a separate bug that will also be fixed by PR #124 (also to be included in the next release).
The data type I get when opening a TileDB Array with XArray does not match the data type in the TileDB ArraySchema. In the example below, I put in int16 and get back float32.
I am using tiledb 0.11.3, libtiledb 2.5.2, and tiledb-cf 0.5.2 on the python:latest Docker image.