ValueError: conflicting sizes for dimension with single item / asset #33

Closed
@TomAugspurger

I haven't looked too closely at this, but I'm wondering if there's some issue with the resampling / resolution handling? In this example I have a single STAC item and I select a single asset from it. stackstac (actually xarray) complains that the sizes of the data and coordinates don't match. (This example requires pip install planetary-computer pystac, but doesn't need an API token.)

import planetary_computer as pc
import stackstac
import requests
import pystac

r = requests.get(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-8-c2-l2/items/LC08_L2SP_046027_20200908_20200919_02_T1"
)

item = pystac.Item.from_dict(r.json())

sitem = pc.sign_assets(item)

stackstac.stack([sitem.to_dict()], assets=["SR_B2"])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-b99acad73e1d> in <module>
     13 sitem = pc.sign_assets(item)
     14 
---> 15 stackstac.stack([sitem.to_dict()], assets=["SR_B2"])

/srv/conda/envs/notebook/lib/python3.8/site-packages/stackstac/stack.py in stack(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds, resampling, chunksize, dtype, fill_value, rescale, sortby_date, xy_coords, properties, band_coords, gdal_env, reader)
    290     )
    291 
--> 292     return xr.DataArray(
    293         arr,
    294         *to_coords(

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, indexes, fastpath)
    407             data = _check_data_shape(data, coords, dims)
    408             data = as_compatible_data(data)
--> 409             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    410             variable = Variable(dims, data, attrs, fastpath=True)
    411             indexes = dict(

/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
    155         for d, s in zip(v.dims, v.shape):
    156             if s != sizes[d]:
--> 157                 raise ValueError(
    158                     "conflicting sizes for dimension %r: "
    159                     "length %s on the data but length %s on "

ValueError: conflicting sizes for dimension 'x': length 7772 on the data but length 7773 on coordinate 'x'

The correct shape, according to rasterio, is

>>> import rasterio
>>> ds = rasterio.open(sitem.assets["SR_B2"].href)
>>> ds.shape
(7891, 7771)

This could easily be user error :)

TomAugspurger commented on Apr 23, 2021

Whoops, I just tried on main and see a different error, related to the recent timezone changes.

In [9]: stackstac.stack([sitem.to_dict()], assets=["SR_B2"])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-d4cc13b0c565> in <module>
----> 1 stackstac.stack([sitem.to_dict()], assets=["SR_B2"])

~/src/gjoseph92/stackstac/stackstac/stack.py in stack(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds, resampling, chunksize, dtype, fill_value, rescale, sortby_date, xy_coords, properties, band_coords, gdal_env, reader)
    295     return xr.DataArray(
    296         arr,
--> 297         *to_coords(
    298             plain_items,
    299             asset_ids,

~/src/gjoseph92/stackstac/stackstac/prepare.py in to_coords(items, asset_ids, spec, xy_coords, properties, band_coords)
    347     dims = ["time", "band", "y", "x"]
    348     coords = {
--> 349         "time": pd.to_datetime(
    350             [item["properties"]["datetime"] for item in items],
    351             infer_datetime_format=True,

~/miniconda3/envs/stackstac/lib/python3.8/site-packages/pandas/core/indexes/extension.py in method(self, *args, **kwargs)
     76
     77         def method(self, *args, **kwargs):
---> 78             result = attr(self._data, *args, **kwargs)
     79             if wrap:
     80                 if isinstance(result, type(self._data)):

~/miniconda3/envs/stackstac/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in tz_convert(self, tz)
    800         if self.tz is None:
    801             # tz naive, use tz_localize
--> 802             raise TypeError(
    803                 "Cannot convert tz-naive timestamps, use tz_localize to localize"
    804             )

TypeError: Cannot convert tz-naive timestamps, use tz_localize to localize
ipdb> pp pd.to_datetime(items[0]['properties']['datetime'])
Timestamp('2020-09-08 18:55:51.575595+0000', tz='UTC')

The issue seems to be with infer_datetime_format=True. Commenting that out gets us past it.
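For illustration, here is a minimal sketch of parsing the item datetime without infer_datetime_format. The raw string is an assumption (the thread only shows the parsed Timestamp), and this is not necessarily what the referenced commit does. Passing utc=True is one way to guarantee a tz-aware result, so a later tz_convert call cannot raise the tz-naive TypeError.

```python
import pandas as pd

# Assumed raw value of item["properties"]["datetime"]; the thread only
# shows the already-parsed Timestamp, not the original string.
raw_datetimes = ["2020-09-08T18:55:51.575595Z"]

# utc=True always yields a tz-aware (UTC) DatetimeIndex, so the
# tz_convert call below cannot fail with "Cannot convert tz-naive timestamps".
times = pd.to_datetime(raw_datetimes, utc=True)

# Drop the timezone, leaving tz-naive timestamps expressed in UTC.
naive_utc = times.tz_convert(None)
print(naive_utc[0])
```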

added a commit that references this issue on Apr 23, 2021
5a7cb9f
TomAugspurger commented on Apr 23, 2021

Thanks Gabe, confirmed that this fixed it!

gjoseph92 commented on Apr 23, 2021

@TomAugspurger the conflicting sizes issue should be fixed by #35. I thought I'd fixed it in #25, but that still had some issues with floating-point error.

BTW, I think the off-by-one between stackstac's shape and the shape of the underlying dataset as reported by rasterio is actually partially due to an issue with the STAC metadata:

import planetary_computer as pc
import stackstac
import requests
import pystac
import rasterio
import numpy as np
from affine import Affine

r = requests.get(
    "https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-8-c2-l2/items/LC08_L2SP_046027_20200908_20200919_02_T1"
)

item = pystac.Item.from_dict(r.json())
sitem = pc.sign_assets(item)

# Compare the GeoTIFF's own geotransform with the one in the STAC metadata
asset = sitem.assets["SR_B2"]
ds = rasterio.open(asset.href)
stac_transform = Affine(*asset.properties["proj:transform"])
>>> ds.transform
Affine(30.0, 0.0, 472485.0,
       0.0, -30.0, 5373615.0)
>>> stac_transform
Affine(29.996139492986746, 0.0, 472500.0,
       0.0, -29.99619820048156, 5373600.0)

>>> ds.transform.xoff - stac_transform.xoff
-15.0
>>> ds.transform.yoff - stac_transform.yoff
15.0

It looks like in the STAC metadata, the asset is shifted half a pixel to the southeast. Additionally, the resolution reported by proj:transform is a tiny bit smaller than its actual resolution. Because stackstac prefers to take the resolution directly from the geotransform when possible, this slight discrepancy is what eventually leads to the shape mismatch.
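As a quick check of the half-pixel claim, the origin offsets printed above can be converted to pixel units. This is only a sketch using the numbers from this thread; the variable names are made up for illustration.

```python
# Origins and pixel size copied from the two transforms printed above.
ds_xoff, ds_yoff = 472485.0, 5373615.0      # GeoTIFF, as read by rasterio
stac_xoff, stac_yoff = 472500.0, 5373600.0  # STAC proj:transform
pixel_w, pixel_h = 30.0, -30.0              # GeoTIFF pixel size (height is negative: north-up)

# Offset of the GeoTIFF origin relative to the STAC origin, in pixels.
dx_px = (ds_xoff - stac_xoff) / pixel_w
dy_px = (ds_yoff - stac_yoff) / pixel_h

print(dx_px, dy_px)  # -0.5 -0.5
```

Both offsets are -0.5, meaning the GeoTIFF origin sits half a pixel up and to the left of the STAC origin. Equivalently, the STAC origin is half a pixel to the southeast.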

The bounds reported in STAC metadata also don't match the bounds of the GeoTIFF:

>>> list(ds.bounds)
[472485.0, 5136885.0, 705615.0, 5373615.0]
>>> item.ext.projection.bbox
[472500.0, 5136900.0, 705600.0, 5373600.0]

>>> np.array(ds.bounds) - np.array(item.ext.projection.bbox)
array([-15., -15.,  15.,  15.])

It seems there's the half-pixel shift, plus in the STAC bounds, the asset is 30m (1px) smaller than in the GeoTIFF (which is probably why the resolution in the geotransform is slightly off):

>>> ds_shape_m = (ds.bounds[2] - ds.bounds[0], ds.bounds[3] - ds.bounds[1])
>>> stac_bounds = item.ext.projection.bbox
>>> stac_shape_m = (stac_bounds[2] - stac_bounds[0], stac_bounds[3] - stac_bounds[1])
>>> np.array(ds_shape_m) - np.array(stac_shape_m)
array([30., 30.])

So I think stackstac is actually coming up with the right shape given the STAC metadata, but that seems slightly offset from the underlying data.
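The guess about where the slightly-off resolution comes from can be checked with plain arithmetic. The sketch below assumes the resolution was derived as STAC extent divided by raster shape; that is a hypothesis, not something confirmed in this thread, and all numbers are copied from the outputs above.

```python
# Numbers copied from the outputs earlier in this thread.
rows, cols = 7891, 7771                                 # ds.shape from rasterio
stac_bbox = [472500.0, 5136900.0, 705600.0, 5373600.0]  # item.ext.projection.bbox
stac_xres = 29.996139492986746                          # x pixel size in proj:transform
stac_yres = 29.99619820048156                           # |y pixel size| in proj:transform

# Hypothesis: resolution = STAC extent / raster shape.
derived_xres = (stac_bbox[2] - stac_bbox[0]) / cols
derived_yres = (stac_bbox[3] - stac_bbox[1]) / rows
print(derived_xres, derived_yres)

# Going the other way, extent / resolution recovers the pixel count only up
# to floating-point rounding, so round() is less fragile than ceil() or int().
width_px = (stac_bbox[2] - stac_bbox[0]) / stac_xres
print(round(width_px))  # 7771
```

The derived values agree with proj:transform to many decimal places, which is consistent with the resolution having been computed from a bbox that is one pixel too small. The last step also hints at why a non-integer-friendly resolution makes shape calculations sensitive to floating-point error.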

TomAugspurger commented on Apr 23, 2021

Thanks for investigating that. @lossyrob, do you have any thoughts on #33 (comment)? I don't actually remember if these STAC items were produced by us / stactools, or if it came from USGS with the COGs.

lossyrob commented on May 4, 2021

The shape is generated via the MTL data here and the affine transformation is calculated from the bbox here. Potentially there's a mismatch between the provided metadata via the MTL/ANG files and the raster footprints.

added and removed the needs-future-test label ("Add a test for this in the future, once tests exist (#26)") on Dec 9, 2021
        ValueError: conflicting sizes for dimension with single item / asset · Issue #33 · gjoseph92/stackstac