Closed
Description
I haven't looked too closely at this, but I'm wondering if there's some issue with the resampling / resolution handling? In this example I have a single STAC item and I select a single asset from it. stackstac (actually xarray) complains that the sizes of the data and coordinates don't match. (this example requires pip install planetary-computer
and pystac, but doesn't need an API token).
import planetary_computer as pc
import stackstac
import requests
import pystac
r = requests.get(
"https://planetarycomputer.microsoft.com/api/stac/v1/collections/landsat-8-c2-l2/items/LC08_L2SP_046027_20200908_20200919_02_T1"
)
item = pystac.Item.from_dict(r.json())
sitem = pc.sign_assets(item)
stackstac.stack([sitem.to_dict()], assets=["SR_B2"])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-b99acad73e1d> in <module>
13 sitem = pc.sign_assets(item)
14
---> 15 stackstac.stack([sitem.to_dict()], assets=["SR_B2"])
/srv/conda/envs/notebook/lib/python3.8/site-packages/stackstac/stack.py in stack(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds, resampling, chunksize, dtype, fill_value, rescale, sortby_date, xy_coords, properties, band_coords, gdal_env, reader)
290 )
291
--> 292 return xr.DataArray(
293 arr,
294 *to_coords(
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, indexes, fastpath)
407 data = _check_data_shape(data, coords, dims)
408 data = as_compatible_data(data)
--> 409 coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
410 variable = Variable(dims, data, attrs, fastpath=True)
411 indexes = dict(
/srv/conda/envs/notebook/lib/python3.8/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
155 for d, s in zip(v.dims, v.shape):
156 if s != sizes[d]:
--> 157 raise ValueError(
158 "conflicting sizes for dimension %r: "
159 "length %s on the data but length %s on "
ValueError: conflicting sizes for dimension 'x': length 7772 on the data but length 7773 on coordinate 'x'
The correct shape, according to rasterio, is
>>> ds = rasterio.open(sitem.assets["SR_B2"].href)
>>> ds.shape
(7891, 7771)
This could easily be user error :)
Activity
TomAugspurger commentedon Apr 23, 2021
Whoops, I just tried on
main
and see a different error, related to the recent timezone changes.Seems to be the issue with
infer_datetime_format=True
. Commenting that out gets us past it.Handle infer_datetime_format bug
Handle infer_datetime_format bug (#34)
Fix xy_coords size mismatch (#35)
TomAugspurger commentedon Apr 23, 2021
Thanks Gabe, confirmed that this fixed it!
gjoseph92 commentedon Apr 23, 2021
@TomAugspurger the conflicting sizes issue should be fixed by #35. I thought I'd fixed it in #25, but that still had some issues with floating-point error.
BTW, I think the off-by-one between stackstac's shape and the shape of the underlying dataset as reported by rasterio is actually partially due to an issue with the STAC metadata:
It looks like in the STAC metadata, the asset is shifted half a pixel to the southeast. Additionally, the resolution reported by
proj:transform
is a tiny bit smaller than its actual resolution. Because stackstac prefers to take the resolution directly from the geotrans when possible, this slight offset is what's eventually leading to the shape mismatch.The bounds reported in STAC metadata also don't match the bounds of the GeoTIFF:
It seems there's the half-pixel shift, plus in the STAC bounds, the asset is 30m (1px) smaller than in the GeoTIFF (which is probably why the resolution in the geotrans is slightly off):
So I think stackstac is actually coming up with the right shape given the STAC metadata, but that seems slightly offset from the underlying data.
TomAugspurger commentedon Apr 23, 2021
Thanks for investigating that. @lossyrob, do you have any thoughts on #33 (comment)? I don't actually remember if these STAC items were produced by us / stactools, or if it came from USGS with the COGs.
lossyrob commentedon May 4, 2021
The shape is generated via the MTL data here and the affine transformation is calculated from the bbox here. Potentially there's a mismatch between the provided metadata via the MTL/ANG files and the raster footprints.