Commit 8d1d947
Merge branch 'add_load_stac'
2 parents 4971005 + 081520f

File tree: 5 files changed (+247, -20 lines)


CHANGELOG.md

Lines changed: 3 additions & 0 deletions

@@ -10,9 +10,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - Add support in `VectorCube.download()` to guess output format from extension of a given filename
   ([#401](https://github.com/Open-EO/openeo-python-client/issues/401), [#449](https://github.com/Open-EO/openeo-python-client/issues/449))
+- Added `load_stac` for Client Side Processing, based on the [openeo-processes-dask implementation](https://github.com/Open-EO/openeo-processes-dask/pull/127)
 
 ### Changed
 
+- Updated docs for Client Side Processing with `load_stac` examples, available at https://open-eo.github.io/openeo-python-client/cookbook/localprocessing.html
+
 ### Removed
 
 ### Fixed
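The `load_stac` addition noted above ultimately serializes to an ordinary openEO process-graph node. A minimal sketch with plain dicts (the helper name and node id are hypothetical; the real client builds this via `PGNode`/`DataCube` internals):

```python
def load_stac_node(url: str, bands=None) -> dict:
    # Hypothetical sketch: a flat openEO process graph containing a single
    # load_stac node, shaped as a client would serialize it for execution.
    arguments = {"url": url}
    if bands:
        arguments["bands"] = bands
    return {
        "loadstac1": {
            "process_id": "load_stac",
            "arguments": arguments,
            "result": True,
        }
    }

graph = load_stac_node("https://example.com/stac/collection", bands=["red"])
```

Optional arguments are simply omitted from `arguments` when not given, which matches how the client-side `load_stac` wrapper assembles its call (see the `connection.py` diff below in this commit).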

docs/cookbook/localprocessing.rst

Lines changed: 89 additions & 13 deletions
@@ -9,15 +9,15 @@ Background
 ----------
 
 The client-side processing functionality allows to test and use openEO with its processes locally, i.e. without any connection to an openEO back-end.
-It relies on the projects `openeo-pg-parser-networkx <https://github.com/Open-EO/openeo-pg-parser-networkx>`_, which provides an openEO process graph parsing tool, and `openeo-processes-dask <https://github.com/Open-EO/openeo-processes-dask>`_, which provides an Xarray and Dask implementation of most openEO processes.
+It relies on the projects `openeo-pg-parser-networkx <https://github.com/Open-EO/openeo-pg-parser-networkx>`_, which provides an openEO process graph parsing tool, and `openeo-processes-dask <https://github.com/Open-EO/openeo-processes-dask>`_, which provides an Xarray and Dask implementation of most openEO processes.
 
 Installation
 ------------
 
 .. note::
-    This feature requires ``Python>=3.9`` and has been tested
-    with ``openeo-pg-parser-networkx==2023.3.1`` and
-    ``openeo-processes-dask==2023.3.2``
+    This feature requires ``Python>=3.9``.
+    Tested with ``openeo-pg-parser-networkx==2023.5.1`` and
+    ``openeo-processes-dask==2023.7.1``.
 
 .. code:: bash
@@ -26,18 +26,69 @@ Installation
 Usage
 -----
 
+Every openEO process graph relies on data which is typically provided by a cloud infrastructure (the openEO back-end).
+The client-side processing adds the possibility to read and use local netCDFs, geoTIFFs, ZARR files, and remote STAC Collections or Items for your experiments.
+
+STAC Collections and Items
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+    The provided examples using STAC rely on third-party STAC Catalogs; we can't guarantee that the URLs will remain valid.
+
+With the ``load_stac`` process it's possible to load and use data provided by remote or local STAC Collections or Items.
+The following code snippet loads Sentinel-2 L2A data from a public STAC Catalog, using a specific spatial and temporal extent, band name and also properties for cloud coverage.
+
+.. code-block:: pycon
+
+    >>> from openeo.local import LocalConnection
+    >>> local_conn = LocalConnection("./")
+
+    >>> url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
+    >>> spatial_extent = {"west": 11, "east": 12, "south": 46, "north": 47}
+    >>> temporal_extent = ["2019-01-01", "2019-06-15"]
+    >>> bands = ["red"]
+    >>> properties = {"eo:cloud_cover": dict(lt=50)}
+    >>> s2_cube = local_conn.load_stac(url=url,
+    ...     spatial_extent=spatial_extent,
+    ...     temporal_extent=temporal_extent,
+    ...     bands=bands,
+    ...     properties=properties,
+    ... )
+    >>> s2_cube.execute()
+    <xarray.DataArray 'stackstac-08730b1b5458a4ed34edeee60ac79254' (time: 177, band: 1, y: 11354, x: 8025)>
+    dask.array<getitem, shape=(177, 1, 11354, 8025), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
+    Coordinates: (12/53)
+      * time                 (time) datetime64[ns] 2019-01-02...
+        id                   (time) <U24 'S2B_32TPR_20190102_...
+      * band                 (band) <U3 'red'
+      * x                    (x) float64 6.52e+05 ... 7.323e+05
+      * y                    (y) float64 5.21e+06 ... 5.096e+06
+        s2:product_uri       (time) <U65 'S2B_MSIL2A_20190102...
+        ...                   ...
+        raster:bands         object {'nodata': 0, 'data_type'...
+        gsd                  int32 10
+        common_name          <U3 'red'
+        center_wavelength    float64 0.665
+        full_width_half_max  float64 0.038
+        epsg                 int32 32632
+    Attributes:
+        spec:        RasterSpec(epsg=32632, bounds=(600000.0, 4990200.0, 809760.0...
+        crs:         epsg:32632
+        transform:   | 10.00, 0.00, 600000.00|\n| 0.00,-10.00, 5300040.00|\n| 0.0...
+        resolution:  10.0
+
 Local Collections
 ~~~~~~~~~~~~~~~~~
 
-Every openEO process graph relies on data, which was always provided by a cloud infrastructure (the openEO back-end) until now.
-The client-side processing adds the possibility to read and use local netCDFs, geoTIFFs and ZARR files for your experiments.
-
 If you want to use our sample data, please clone this repository:
 
 .. code:: bash
 
    git clone https://github.com/Open-EO/openeo-localprocessing-data.git
 
 With some sample data we can now check the STAC metadata for the local files by doing:
 
 .. code:: python
@@ -80,9 +131,8 @@ Let's start with the provided sample netCDF of Sentinel-2 data:
     Attributes:
         Conventions:   CF-1.9
         institution:   openEO platform - Geotrellis backend: 0.9.5a1
-        description:
-        title:
-        ...
+        description:
+        title:
 
 As you can see in the previous example, we are using a call to ``execute()`` which will locally execute the generated openEO process graph.
 In this case, the process graph consists of only a single ``load_collection``, which performs lazy loading of the data. With this first step you can check if the data is being read correctly by openEO.
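Lazy loading as described here means building the process graph is cheap and no data is read until `execute()` is called. A toy sketch of that deferred-execution pattern (illustrative classes only, not the client's actual implementation):

```python
class LazyCube:
    # Toy illustration of deferred execution: operations only accumulate
    # in a recipe; nothing is evaluated until execute() is called.
    def __init__(self):
        self.steps = []

    def band(self, name):
        # Record the operation and return self so calls can be chained.
        self.steps.append(("band", name))
        return self

    def execute(self):
        # A real client would now run the accumulated process graph;
        # here we just report the recorded recipe.
        return list(self.steps)

recipe = LazyCube().band("B04").execute()
```

This is why a bare `load_collection` followed by `execute()` is a useful sanity check: the expensive work only happens at that final call.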
@@ -96,9 +146,35 @@ We can now do a simple processing for demo purposes, let's compute the median NDVI:
 
     b04 = s2_datacube.band("B04")
     b08 = s2_datacube.band("B08")
-    ndvi = (b08-b04)/(b08+b04)
-    ndvi_median = ndvi.reduce_dimension(dimension="t",reducer="median")
+    ndvi = (b08 - b04) / (b08 + b04)
+    ndvi_median = ndvi.reduce_dimension(dimension="t", reducer="median")
     result_ndvi = ndvi_median.execute()
     result_ndvi.plot.imshow(cmap="Greens")
 
 .. image:: ../_static/images/local/local_ndvi.jpg
+
+We can perform the same example using data provided by a STAC Collection:
+
+.. code:: python
+
+    from openeo.local import LocalConnection
+    local_conn = LocalConnection("./")
+
+    url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
+    spatial_extent = {"east": 11.40, "north": 46.52, "south": 46.46, "west": 11.25}
+    temporal_extent = ["2022-06-01", "2022-06-30"]
+    bands = ["red", "nir"]
+    properties = {"eo:cloud_cover": dict(lt=80)}
+    s2_datacube = local_conn.load_stac(
+        url=url,
+        spatial_extent=spatial_extent,
+        temporal_extent=temporal_extent,
+        bands=bands,
+        properties=properties,
+    )
+
+    b04 = s2_datacube.band("red")
+    b08 = s2_datacube.band("nir")
+    ndvi = (b08 - b04) / (b08 + b04)
+    ndvi_median = ndvi.reduce_dimension(dimension="time", reducer="median")
+    result_ndvi = ndvi_median.execute()
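The NDVI recipe in the docs diff reduces over the time dimension with a median. The same arithmetic as a minimal NumPy sketch, using toy arrays rather than actual Sentinel-2 data:

```python
import numpy as np

# Toy stand-ins for the red (B04) and nir (B08) bands,
# shaped (time, pixel): two time steps, two pixels.
b04 = np.array([[0.1, 0.2],
                [0.3, 0.4]])
b08 = np.array([[0.5, 0.6],
                [0.7, 0.8]])

# Same formula as in the docs example above.
ndvi = (b08 - b04) / (b08 + b04)

# Reduce over the time axis with a median, per pixel.
ndvi_median = np.median(ndvi, axis=0)
```

The openEO `reduce_dimension(dimension="t", reducer="median")` call does exactly this per-pixel time reduction, just lazily and chunk-wise via Dask.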

openeo/local/connection.py

Lines changed: 154 additions & 1 deletion
@@ -3,14 +3,17 @@
 from pathlib import Path
 from typing import Callable, Dict, List, Optional, Union
 
+import numpy as np
 import xarray as xr
 from openeo_pg_parser_networkx.graph import OpenEOProcessGraph
+from openeo_pg_parser_networkx.pg_schema import BoundingBox, TemporalInterval
+from openeo_processes_dask.process_implementations.cubes import load_stac
 
 from openeo.internal.graph_building import PGNode, as_flat_graph
 from openeo.internal.jupyter import VisualDict, VisualList
 from openeo.local.collections import _get_geotiff_metadata, _get_local_collections, _get_netcdf_zarr_metadata
 from openeo.local.processing import PROCESS_REGISTRY
-from openeo.metadata import CollectionMetadata
+from openeo.metadata import Band, BandDimension, CollectionMetadata, SpatialDimension, TemporalDimension
 from openeo.rest.datacube import DataCube
 
 _log = logging.getLogger(__name__)
@@ -88,6 +91,156 @@ def load_collection(
             fetch_metadata=fetch_metadata,
         )
 
+    def datacube_from_process(self, process_id: str, namespace: Optional[str] = None, **kwargs) -> DataCube:
+        """
+        Load a data cube from a (custom) process.
+
+        :param process_id: The process id.
+        :param namespace: optional: process namespace
+        :param kwargs: The arguments of the custom process
+        :return: A :py:class:`DataCube`, without valid metadata, as the client is not aware of this custom process.
+        """
+        graph = PGNode(process_id, namespace=namespace, arguments=kwargs)
+        return DataCube(graph=graph, connection=self)
+
+    def load_stac(
+        self,
+        url: str,
+        spatial_extent: Optional[Dict[str, float]] = None,
+        temporal_extent: Optional[List[Union[str, datetime.datetime, datetime.date]]] = None,
+        bands: Optional[List[str]] = None,
+        properties: Optional[dict] = None,
+    ) -> DataCube:
+        """
+        Loads data from a static STAC catalog or a STAC API Collection and returns the data as a processable :py:class:`DataCube`.
+        A batch job result can be loaded by providing a reference to it.
+
+        If supported by the underlying metadata and file format, the data that is added to the data cube can be
+        restricted with the parameters ``spatial_extent``, ``temporal_extent`` and ``bands``.
+        If no data is available for the given extents, a ``NoDataAvailable`` error is thrown.
+
+        Remarks:
+
+        * The bands (and all dimensions that specify nominal dimension labels) are expected to be ordered as
+          specified in the metadata if the ``bands`` parameter is set to ``null``.
+        * If no additional parameter is specified this would imply that the whole data set is expected to be loaded.
+          Due to the large size of many data sets, this is not recommended and may be optimized by back-ends to only
+          load the data that is actually required after evaluating subsequent processes such as filters.
+          This means that the values should be processed only after the data has been limited to the required extent
+          and as a consequence also to a manageable size.
+
+        :param url: The URL to a static STAC catalog (STAC Item, STAC Collection, or STAC Catalog)
+            or a specific STAC API Collection that allows to filter items and to download assets.
+            This includes batch job results, which themselves are compliant to STAC.
+            For external URLs, authentication details such as API keys or tokens may need to be included in the URL.
+
+            Batch job results can be specified in two ways:
+
+            - For batch job results at the same back-end, a URL pointing to the corresponding batch job results
+              endpoint should be provided. The URL usually ends with ``/jobs/{id}/results`` and ``{id}``
+              is the corresponding batch job ID.
+            - For external results, a signed URL must be provided. Not all back-ends support signed URLs,
+              which are provided as a link with the link relation `canonical` in the batch job result metadata.
+        :param spatial_extent:
+            Limits the data to load to the specified bounding box or polygons.
+
+            For raster data, the process loads the pixel into the data cube if the point at the pixel center intersects
+            with the bounding box or any of the polygons (as defined in the Simple Features standard by the OGC).
+
+            For vector data, the process loads the geometry into the data cube if the geometry is fully within the
+            bounding box or any of the polygons (as defined in the Simple Features standard by the OGC).
+            Empty geometries may only be in the data cube if no spatial extent has been provided.
+
+            The GeoJSON can be one of the following feature types:
+
+            * A ``Polygon`` or ``MultiPolygon`` geometry,
+            * a ``Feature`` with a ``Polygon`` or ``MultiPolygon`` geometry, or
+            * a ``FeatureCollection`` containing at least one ``Feature`` with ``Polygon`` or ``MultiPolygon`` geometries.
+
+            Set this parameter to ``None`` to set no limit for the spatial extent.
+            Be careful with this when loading large datasets. It is recommended to use this parameter instead of
+            using ``filter_bbox()`` or ``filter_spatial()`` directly after loading unbounded data.
+
+        :param temporal_extent:
+            Limits the data to load to the specified left-closed temporal interval.
+            Applies to all temporal dimensions.
+            The interval has to be specified as an array with exactly two elements:
+
+            1. The first element is the start of the temporal interval.
+               The specified instance in time is **included** in the interval.
+            2. The second element is the end of the temporal interval.
+               The specified instance in time is **excluded** from the interval.
+
+            The second element must always be greater/later than the first element.
+            Otherwise, a `TemporalExtentEmpty` exception is thrown.
+
+            Also supports open intervals by setting one of the boundaries to ``None``, but never both.
+
+            Set this parameter to ``None`` to set no limit for the temporal extent.
+            Be careful with this when loading large datasets. It is recommended to use this parameter instead of
+            using ``filter_temporal()`` directly after loading unbounded data.
+
+        :param bands:
+            Only adds the specified bands into the data cube so that bands that don't match the list
+            of band names are not available. Applies to all dimensions of type `bands`.
+
+            Either the unique band name (metadata field ``name`` in bands) or one of the common band names
+            (metadata field ``common_name`` in bands) can be specified.
+            If the unique band name and the common name conflict, the unique band name has a higher priority.
+
+            The order of the specified array defines the order of the bands in the data cube.
+            If multiple bands match a common name, all matched bands are included in the original order.
+
+            It is recommended to use this parameter instead of using ``filter_bands()`` directly after loading unbounded data.
+
+        :param properties:
+            Limits the data by metadata properties to include only data in the data cube which
+            all given conditions return ``True`` for (AND operation).
+
+            Specify key-value-pairs with the key being the name of the metadata property,
+            which can be retrieved with the openEO Data Discovery for Collections.
+            The value must be a condition (user-defined process) to be evaluated against a STAC API.
+            This parameter is not supported for static STAC.
+
+        .. versionadded:: 0.21.0
+        """
+        arguments = {"url": url}
+        # TODO: more normalization/validation of extent/band parameters and `properties`
+        if spatial_extent:
+            arguments["spatial_extent"] = spatial_extent
+        if temporal_extent:
+            arguments["temporal_extent"] = DataCube._get_temporal_extent(temporal_extent)
+        if bands:
+            arguments["bands"] = bands
+        if properties:
+            arguments["properties"] = properties
+        cube = self.datacube_from_process(process_id="load_stac", **arguments)
+        # Detect actual metadata from the URL: run load_stac to get the datacube metadata.
+        arguments["spatial_extent"] = BoundingBox.parse_obj(spatial_extent)
+        arguments["temporal_extent"] = TemporalInterval.parse_obj(temporal_extent)
+        xarray_cube = load_stac(**arguments)
+        attrs = xarray_cube.attrs
+        for at in attrs:
+            # Allowed attribute types: str, Number, ndarray, number, list, tuple
+            if not isinstance(attrs[at], (int, float, str, np.ndarray, list, tuple)):
+                attrs[at] = str(attrs[at])
+        metadata = CollectionMetadata(
+            attrs,
+            dimensions=[
+                SpatialDimension(name=xarray_cube.openeo.x_dim, extent=[]),
+                SpatialDimension(name=xarray_cube.openeo.y_dim, extent=[]),
+                TemporalDimension(name=xarray_cube.openeo.temporal_dims[0], extent=[]),
+                BandDimension(
+                    name=xarray_cube.openeo.band_dims[0],
+                    bands=[Band(x) for x in xarray_cube[xarray_cube.openeo.band_dims[0]].values],
+                ),
+            ],
+        )
+        cube.metadata = metadata
+        return cube
+
     def execute(self, process_graph: Union[dict, str, Path]) -> xr.DataArray:
         """
         Execute locally the process graph and return the result as an xarray.DataArray.
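The attrs loop in `load_stac` above stringifies any attribute value that the metadata handling can't represent directly. The same idea as a standalone sketch (the helper name is hypothetical; the real loop also accepts `np.ndarray`, omitted here to keep the sketch dependency-free):

```python
def sanitize_attrs(attrs: dict) -> dict:
    # Keep values of accepted types; stringify everything else,
    # mirroring the attrs loop in load_stac above.
    allowed = (int, float, str, list, tuple)
    return {k: (v if isinstance(v, allowed) else str(v)) for k, v in attrs.items()}

cleaned = sanitize_attrs({"resolution": 10.0, "spec": object()})
```

This is why the xarray repr in the docs shows the `spec` attribute as a `RasterSpec(...)` string: objects like that survive only as their string representation.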

requirements-localprocessing.txt

Lines changed: 0 additions & 5 deletions
This file was deleted.

setup.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@
     "rioxarray>=0.13.0",
     "pyproj",
     "openeo_pg_parser_networkx>=2023.5.1",
-    "openeo_processes_dask[implementations]>=2023.5.1",
+    "openeo_processes_dask[implementations]>=2023.7.1",
 ]
 
 jupyter_require = [
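The `properties` argument used throughout this commit (e.g. `{"eo:cloud_cover": {"lt": 50}}`) ANDs all given conditions, per the `load_stac` docstring above. A toy matcher sketch of that semantics (hypothetical function; real evaluation happens server-side against a STAC API, and only the `lt` operator from the examples is covered):

```python
def matches(item_properties: dict, conditions: dict) -> bool:
    # AND over all conditions: every (property, operator, value) triple
    # must hold for the item to be kept.
    ops = {"lt": lambda a, b: a < b}
    return all(
        ops[op](item_properties.get(key), value)
        for key, cond in conditions.items()
        for op, value in cond.items()
    )

keep = matches({"eo:cloud_cover": 30}, {"eo:cloud_cover": {"lt": 50}})
```

With the docs examples, an item with 30% cloud cover passes a `{"lt": 50}` condition while one with 80% does not.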

0 commit comments