ValueError: conflicting sizes for dimension with single item / asset #33

Closed
TomAugspurger opened this issue Apr 23, 2021 · 5 comments · Fixed by #35
Labels
needs-future-test Add a test for this in the future, once tests exist (#26)

Comments

@TomAugspurger
Contributor

I haven't looked too closely at this, but I'm wondering if there's some issue with the resampling / resolution handling? In this example I have a single STAC item and I select a single asset from it. stackstac (actually xarray) complains that the sizes of the data and coordinates don't match. (this example requires pip install planetary-computer and pystac, but doesn't need an API token).

import planetary_computer as pc
import stackstac
import requests
import pystac

r = requests.get(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-8-c2-l2/items/LC08_L2SP_046027_20200908_20200919_02_T1"
)

item = pystac.Item.from_dict(r.json())

sitem = pc.sign_assets(item)

stackstac.stack([sitem.to_dict()], assets=["SR_B2"])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-b99acad73e1d> in <module>
     13 sitem = pc.sign_assets(item)
     14 
---> 15 stackstac.stack([sitem.to_dict()], assets=["SR_B2"])

/srv/conda/envs/notebook/lib/python3.8/site-packages/stackstac/stack.py in stack(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds, resampling, chunksize, dtype, fill_value, rescale, sortby_date, xy_coords, properties, band_coords, gdal_env, reader)
    290     )
    291 
--> 292     return xr.DataArray(
    293         arr,
    294         *to_coords(

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, indexes, fastpath)
    407             data = _check_data_shape(data, coords, dims)
    408             data = as_compatible_data(data)
--> 409             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    410             variable = Variable(dims, data, attrs, fastpath=True)
    411             indexes = dict(

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
    155         for d, s in zip(v.dims, v.shape):
    156             if s != sizes[d]:
--> 157                 raise ValueError(
    158                     "conflicting sizes for dimension %r: "
    159                     "length %s on the data but length %s on "

ValueError: conflicting sizes for dimension 'x': length 7772 on the data but length 7773 on coordinate 'x'

The correct shape, according to rasterio, is

>>> ds = rasterio.open(sitem.assets["SR_B2"].href)
>>> ds.shape
(7891, 7771)

This could easily be user error :)

@TomAugspurger
Contributor Author

TomAugspurger commented Apr 23, 2021

Whoops, I just tried on main and see a different error, related to the recent timezone changes.

In [9]: stackstac.stack([sitem.to_dict()], assets=["SR_B2"])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-d4cc13b0c565> in <module>
----> 1 stackstac.stack([sitem.to_dict()], assets=["SR_B2"])

~/src/gjoseph92/stackstac/stackstac/stack.py in stack(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds, resampling, chunksize, dtype, fill_value, rescale, sortby_date, xy_coords, properties, band_coords, gdal_env, reader)
    295     return xr.DataArray(
    296         arr,
--> 297         *to_coords(
    298             plain_items,
    299             asset_ids,

~/src/gjoseph92/stackstac/stackstac/prepare.py in to_coords(items, asset_ids, spec, xy_coords, properties, band_coords)
    347     dims = ["time", "band", "y", "x"]
    348     coords = {
--> 349         "time": pd.to_datetime(
    350             [item["properties"]["datetime"] for item in items],
    351             infer_datetime_format=True,

~/miniconda3/envs/stackstac/lib/python3.8/site-packages/pandas/core/indexes/extension.py in method(self, *args, **kwargs)
     76
     77         def method(self, *args, **kwargs):
---> 78             result = attr(self._data, *args, **kwargs)
     79             if wrap:
     80                 if isinstance(result, type(self._data)):

~/miniconda3/envs/stackstac/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in tz_convert(self, tz)
    800         if self.tz is None:
    801             # tz naive, use tz_localize
--> 802             raise TypeError(
    803                 "Cannot convert tz-naive timestamps, use tz_localize to localize"
    804             )

TypeError: Cannot convert tz-naive timestamps, use tz_localize to localize
ipdb> pp pd.to_datetime(items[0]['properties']['datetime'])
Timestamp('2020-09-08 18:55:51.575595+0000', tz='UTC')

This seems to be an issue with infer_datetime_format=True. Commenting that out gets us past it.

gjoseph92 added a commit that referenced this issue Apr 23, 2021
With #27, the `tz_convert` would fail when we tripped over pandas-dev/pandas#41047, since the DatetimeIndex would be tz-naive. Now, we assume tz-naive datetimes are already in UTC.

Addresses #33 (comment)
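
For readers following along, the behaviour described in that commit message (treat tz-naive datetimes as already being in UTC, otherwise convert) can be sketched roughly as below. This is a minimal illustration, not stackstac's actual code; the helper name `to_utc_index` is hypothetical.

```python
import pandas as pd


def to_utc_index(datetimes):
    """Hypothetical helper: parse datetimes and return a UTC DatetimeIndex.

    Assumption (taken from the commit message): tz-naive input is already UTC.
    """
    index = pd.DatetimeIndex(pd.to_datetime(datetimes))
    if index.tz is None:
        # tz_convert would raise TypeError on tz-naive data, so localize instead.
        return index.tz_localize("UTC")
    # tz-aware data can be converted directly.
    return index.tz_convert("UTC")


naive = to_utc_index(["2020-09-08 18:55:51"])
aware = to_utc_index(["2020-09-08 11:55:51-07:00"])
print(naive[0], aware[0])
```

Both branches end up with a tz-aware index in UTC, so downstream code never has to call `tz_convert` on tz-naive data.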
gjoseph92 added a commit that referenced this issue Apr 23, 2021

I don't know why I didn't use `linspace` in the first place. With `arange` there were floating-point errors that could cause `ceil` to over/under-shoot by 1.

Closes #33
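
As a rough illustration of why `linspace` is the safer choice here: `np.arange` derives its length from the floating-point quotient `(stop - start) / step`, so rounding error can change the number of elements, whereas `np.linspace` takes the number of points as an explicit argument. The sketch below is not stackstac's implementation; the helper name `coords_linspace` is hypothetical.

```python
import numpy as np


def coords_linspace(start, stop, size):
    """Hypothetical helper: `size` evenly spaced coordinates over [start, stop).

    The length is an input, so it always matches the data's shape.
    """
    return np.linspace(start, stop, size, endpoint=False)


# With arange, the length depends on a floating-point division and can come
# out one element longer or shorter than intended.
drifting = np.arange(1.0, 1.3, 0.1)

# With linspace, the length is exactly what was asked for.
fixed = coords_linspace(1.0, 1.3, 3)

print(len(drifting), len(fixed))
```

Because the coordinate length no longer depends on a rounded quotient, it cannot disagree with the array shape computed elsewhere.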
@TomAugspurger
Contributor Author

Thanks Gabe, confirmed that this fixed it!

@gjoseph92
Owner

@TomAugspurger the conflicting sizes issue should be fixed by #35. I thought I'd fixed it in #25, but that still had some issues with floating-point error.

BTW, I think the off-by-one between stackstac's shape and the shape of the underlying dataset as reported by rasterio is actually partially due to an issue with the STAC metadata:

import planetary_computer as pc
import stackstac
import requests
import pystac
import rasterio
import numpy as np
from affine import Affine

r = requests.get(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-8-c2-l2/items/LC08_L2SP_046027_20200908_20200919_02_T1"
)

item = pystac.Item.from_dict(r.json())
sitem = pc.sign_assets(item)

# Transform as stored in the GeoTIFF itself
ds = rasterio.open(sitem.assets["SR_B2"].href)
# Transform as reported in the STAC metadata for the same asset
asset = sitem.assets["SR_B2"]
stac_transform = Affine(*asset.properties["proj:transform"])
>>> ds.transform
Affine(30.0, 0.0, 472485.0,
       0.0, -30.0, 5373615.0)
>>> stac_transform
Affine(29.996139492986746, 0.0, 472500.0,
       0.0, -29.99619820048156, 5373600.0)

>>> ds.transform.xoff - stac_transform.xoff
-15.0
>>> ds.transform.yoff - stac_transform.yoff
15.0

It looks like in the STAC metadata, the asset is shifted half a pixel to the southeast. Additionally, the resolution reported by proj:transform is a tiny bit smaller than the GeoTIFF's actual resolution. Because stackstac prefers to take the resolution directly from the geotransform when possible, this slight difference is what eventually leads to the shape mismatch.

The bounds reported in STAC metadata also don't match the bounds of the GeoTIFF:

>>> list(ds.bounds)
[472485.0, 5136885.0, 705615.0, 5373615.0]
>>> item.ext.projection.bbox
[472500.0, 5136900.0, 705600.0, 5373600.0]

>>> np.array(ds.bounds) - np.array(item.ext.projection.bbox)
array([-15., -15.,  15.,  15.])

It seems there's the half-pixel shift, plus in the STAC bounds, the asset is 30m (1px) smaller than in the GeoTIFF (which is probably why the resolution in the geotransform is slightly off):

>>> ds_shape_m = (ds.bounds[2] - ds.bounds[0], ds.bounds[3] - ds.bounds[1])
>>> stac_bounds = item.ext.projection.bbox
>>> stac_shape_m = (stac_bounds[2] - stac_bounds[0], stac_bounds[3] - stac_bounds[1])
>>> np.array(ds_shape_m) - np.array(stac_shape_m)
array([30., 30.])

So I think stackstac is actually coming up with the right shape given the STAC metadata, but that seems slightly offset from the underlying data.
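
As a quick arithmetic check on the numbers quoted above: dividing each set of bounds by the raster shape reported by rasterio, (7891, 7771), reproduces both resolutions. This is only a sketch using values copied from this thread; the helper name `resolution` is hypothetical, and the result is consistent with the STAC transform having been derived from the STAC bbox rather than proof of it.

```python
# Values copied from the comment above.
ds_bounds = (472485.0, 5136885.0, 705615.0, 5373615.0)   # GeoTIFF bounds
stac_bbox = (472500.0, 5136900.0, 705600.0, 5373600.0)   # STAC projection bbox
height, width = 7891, 7771                               # ds.shape from rasterio


def resolution(bounds, shape):
    """Hypothetical helper: (x, y) pixel size implied by bounds and (rows, cols)."""
    left, bottom, right, top = bounds
    rows, cols = shape
    return (right - left) / cols, (top - bottom) / rows


ds_res = resolution(ds_bounds, (height, width))
stac_res = resolution(stac_bbox, (height, width))

print(ds_res)    # pixel size implied by the GeoTIFF bounds
print(stac_res)  # pixel size implied by the STAC bbox
```

The GeoTIFF bounds give 30 m pixels, while the STAC bbox, being 30 m narrower and shorter, gives roughly 29.99614 m by 29.99620 m, which agrees with the proj:transform values shown earlier.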

@TomAugspurger
Contributor Author

Thanks for investigating that. @lossyrob, do you have any thoughts on #33 (comment)? I don't actually remember if these STAC items were produced by us / stactools, or if it came from USGS with the COGs.

@lossyrob

lossyrob commented May 4, 2021

The shape is generated via the MTL data here and the affine transformation is calculated from the bbox here. Potentially there's a mismatch between the provided metadata via the MTL/ANG files and the raster footprints.

@gjoseph92 gjoseph92 added needs-future-test Add a test for this in the future, once tests exist (#26) and removed needs-future-test Add a test for this in the future, once tests exist (#26) labels Dec 9, 2021