Drop read_vector #674

Merged
merged 3 commits into from
Nov 27, 2024
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed

- `MultiBackendJobManager`: costs has been added as a column in tracking databases ([[#588](https://github.com/Open-EO/openeo-python-client/issues/588)])
- When passing a path/string as `geometry` to `DataCube.aggregate_spatial()`, `DataCube.mask_polygon()`, etc.:
  this is no longer automatically translated to deprecated, non-standard `read_vector` usage.
  Instead, if it is a local GeoJSON file, the GeoJSON data will be loaded directly client-side.
  ([#104](https://github.com/Open-EO/openeo-python-client/issues/104), [#457](https://github.com/Open-EO/openeo-python-client/issues/457))

### Removed

48 changes: 48 additions & 0 deletions docs/cookbook/tricks.rst
@@ -80,3 +80,51 @@ For example:

# `create_job` with URL to JSON file
job = connection.create_job("https://jsonbin.example/my/process-graph.json")


.. _legacy_read_vector:


Legacy ``read_vector`` usage
----------------------------

In versions up to 0.35.0 of the openEO Python client library,
there was an old, deprecated feature in geometry handling
of :py:class:`~openeo.rest.datacube.DataCube` methods like
:py:meth:`~openeo.rest.datacube.DataCube.aggregate_spatial()` and
:py:meth:`~openeo.rest.datacube.DataCube.mask_polygon()`
where you could pass a *backend-side* path as ``geometries``, e.g.:

.. code-block:: python

    cube = cube.aggregate_spatial(
        geometries="/backend/path/to/geometries.json",
        reducer="mean"
    )

The client would handle this by automatically adding a ``read_vector`` process
in the process graph, with that path as argument, to instruct the backend to load the geometries from there.
This ``read_vector`` process was however a backend-specific, experimental and now deprecated process.
Moreover, it assumes that the user has access to (or at least knowledge of) the backend's file system,
which violates the openEO principle of abstracting away backend-specific details.

In version 0.36.0, this deprecated ``read_vector`` feature was *removed*
to make room for better convenience functionality
when providing a string as the ``geometries`` argument,
e.g. loading from a URL with the standard ``load_url`` process,
or loading GeoJSON from a local, client-side path.

If your workflow still depends on the old, deprecated ``read_vector`` functionality,
you can reproduce it by manually adding a ``read_vector`` process to your workflow,
for example as follows:

.. code-block:: python

    from openeo.processes import process

    cube = cube.aggregate_spatial(
        geometries=process("read_vector", filename="/backend/path/to/geometries.json"),
        reducer="mean"
    )

Note that this also works with older versions of the openEO Python client library.
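
To make the manual ``read_vector`` workaround more concrete, here is a rough sketch of the kind of process graph fragment it would produce, written as plain Python dictionaries. The node layout follows the old ``aggregate_zonal_path.json`` test data touched by this PR. The helper name ``read_vector_node`` is hypothetical, the node identifiers are only illustrative, and the ``reducer`` argument is left out for brevity.

```python
import json


def read_vector_node(filename: str) -> dict:
    """Hypothetical helper: plain-dict form of a `read_vector` process graph node."""
    return {"process_id": "read_vector", "arguments": {"filename": filename}}


# Fragment of a process graph: `aggregate_spatial` takes its geometries
# from the `read_vector` node through a `from_node` reference.
# Node ids such as "readvector1" are illustrative only.
process_graph_fragment = {
    "readvector1": read_vector_node("/backend/path/to/geometries.json"),
    "aggregatespatial1": {
        "process_id": "aggregate_spatial",
        "arguments": {
            "data": {"from_node": "filterbbox1"},
            "geometries": {"from_node": "readvector1"},
            # "reducer" omitted in this sketch
        },
    },
}

print(json.dumps(process_graph_fragment, indent=2))
```

The path stays a backend-side path here, so this only makes sense against a backend that still implements the non-standard ``read_vector`` process.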
78 changes: 54 additions & 24 deletions openeo/rest/datacube.py
@@ -59,7 +59,7 @@
from openeo.rest.service import Service
from openeo.rest.udp import RESTUserDefinedProcess
from openeo.rest.vectorcube import VectorCube
from openeo.util import dict_no_none, guess_format, normalize_crs, rfc3339
from openeo.util import dict_no_none, guess_format, load_json, normalize_crs, rfc3339

if typing.TYPE_CHECKING:
# Imports for type checking only (circular import issue at runtime).
@@ -609,7 +609,8 @@ def filter_spatial(
(also see :py:func:`Connection.list_file_formats() <openeo.rest.connection.Connection.list_file_formats>`),
e.g. GeoJSON, GeoParquet, etc.
A ``load_url`` process will automatically be added to the process graph.
- a path (that is valid for the back-end) to a GeoJSON file.
- a path (:py:class:`str` or :py:class:`~pathlib.Path`) to a local, client-side GeoJSON file,
which will be loaded automatically to get the geometries as GeoJSON construct.
- a :py:class:`~openeo.rest.vectorcube.VectorCube` instance.
- a :py:class:`~openeo.api.process.Parameter` instance.

@@ -619,6 +620,12 @@ def filter_spatial(

.. versionchanged:: 0.36.0
Support passing a URL as ``geometries`` argument, which will be loaded with the ``load_url`` process.

.. versionchanged:: 0.36.0
Support for passing a backend-side path as ``geometries`` argument was removed
(also see :ref:`legacy_read_vector`).
Instead, it's possible to provide a client-side path to a GeoJSON file
(which will be loaded client-side to get the geometries as GeoJSON construct).
"""
valid_geojson_types = [
"Point", "MultiPoint", "LineString", "MultiLineString",
@@ -1053,7 +1060,7 @@ def _merge_operator_binary_cubes(

def _get_geometry_argument(
self,
geometry: Union[
argument: Union[
shapely.geometry.base.BaseGeometry,
dict,
str,
@@ -1065,19 +1072,19 @@ def _get_geometry_argument(
crs: Optional[str] = None,
) -> Union[dict, Parameter, PGNode]:
"""
Convert input to a geometry as "geojson" subtype object.
Convert input to a geometry as "geojson" subtype object or vectorcube.

:param crs: value that encodes a coordinate reference system.
See :py:func:`openeo.util.normalize_crs` for more details about additional normalization that is applied to this argument.
"""
if isinstance(geometry, Parameter):
return geometry
elif isinstance(geometry, _FromNodeMixin):
return geometry.from_node()
if isinstance(argument, Parameter):
return argument
elif isinstance(argument, _FromNodeMixin):
return argument.from_node()

if isinstance(geometry, str) and re.match(r"^https?://", geometry, flags=re.I):
if isinstance(argument, str) and re.match(r"^https?://", argument, flags=re.I):
# Geometry provided as URL: load with `load_url` (with best-effort format guess)
url = urllib.parse.urlparse(geometry)
url = urllib.parse.urlparse(argument)
suffix = pathlib.Path(url.path.lower()).suffix
format = {
".json": "GeoJSON",
@@ -1086,18 +1093,20 @@ def _get_geometry_argument(
".parquet": "Parquet",
".geoparquet": "Parquet",
}.get(suffix, suffix.split(".")[-1])
return self.connection.load_url(url=geometry, format=format)

if isinstance(geometry, (str, pathlib.Path)):
# Assumption: `geometry` is path to polygon is a path to vector file at backend.
# TODO #104: `read_vector` is non-standard process.
# TODO: If path exists client side: load it client side?
return PGNode(process_id="read_vector", arguments={"filename": str(geometry)})
return self.connection.load_url(url=argument, format=format)

if isinstance(geometry, shapely.geometry.base.BaseGeometry):
geometry = mapping(geometry)
if not isinstance(geometry, dict):
raise OpenEoClientException("Invalid geometry argument: {g!r}".format(g=geometry))
if (
isinstance(argument, (str, pathlib.Path))
and pathlib.Path(argument).is_file()
and pathlib.Path(argument).suffix.lower() in [".json", ".geojson"]
):
geometry = load_json(argument)
elif isinstance(argument, shapely.geometry.base.BaseGeometry):
geometry = mapping(argument)
elif isinstance(argument, dict):
geometry = argument
else:
raise OpenEoClientException(f"Invalid geometry argument: {argument!r}")

if geometry.get("type") not in valid_geojson_types:
raise OpenEoClientException("Invalid geometry type {t!r}, must be one of {s}".format(
@@ -1147,7 +1156,8 @@ def aggregate_spatial(
(also see :py:func:`Connection.list_file_formats() <openeo.rest.connection.Connection.list_file_formats>`),
e.g. GeoJSON, GeoParquet, etc.
A ``load_url`` process will automatically be added to the process graph.
- a path (that is valid for the back-end) to a GeoJSON file.
- a path (:py:class:`str` or :py:class:`~pathlib.Path`) to a local, client-side GeoJSON file,
which will be loaded automatically to get the geometries as GeoJSON construct.
- a :py:class:`~openeo.rest.vectorcube.VectorCube` instance.
- a :py:class:`~openeo.api.process.Parameter` instance.

@@ -1177,6 +1187,12 @@ def aggregate_spatial(

.. versionchanged:: 0.36.0
Support passing a URL as ``geometries`` argument, which will be loaded with the ``load_url`` process.

.. versionchanged:: 0.36.0
Support for passing a backend-side path as ``geometries`` argument was removed
(also see :ref:`legacy_read_vector`).
Instead, it's possible to provide a client-side path to a GeoJSON file
(which will be loaded client-side to get the geometries as GeoJSON construct).
"""
valid_geojson_types = [
"Point", "MultiPoint", "LineString", "MultiLineString",
@@ -1502,7 +1518,8 @@ def apply_polygon(
(also see :py:func:`Connection.list_file_formats() <openeo.rest.connection.Connection.list_file_formats>`),
e.g. GeoJSON, GeoParquet, etc.
A ``load_url`` process will automatically be added to the process graph.
- a path (that is valid for the back-end) to a GeoJSON file.
- a path (:py:class:`str` or :py:class:`~pathlib.Path`) to a local, client-side GeoJSON file,
which will be loaded automatically to get the geometries as GeoJSON construct.
- a :py:class:`~openeo.rest.vectorcube.VectorCube` instance.
- a :py:class:`~openeo.api.process.Parameter` instance.

@@ -1519,6 +1536,12 @@ def apply_polygon(

.. versionchanged:: 0.36.0
Support passing a URL as ``geometries`` argument, which will be loaded with the ``load_url`` process.

.. versionchanged:: 0.36.0
Support for passing a backend-side path as ``geometries`` argument was removed
(also see :ref:`legacy_read_vector`).
Instead, it's possible to provide a client-side path to a GeoJSON file
(which will be loaded client-side to get the geometries as GeoJSON construct).
"""
# TODO drop support for legacy `polygons` argument:
# remove `kwargs, remove default `None` value for `geometries` and `process`
@@ -2011,7 +2034,8 @@ def mask_polygon(
(also see :py:func:`Connection.list_file_formats() <openeo.rest.connection.Connection.list_file_formats>`),
e.g. GeoJSON, GeoParquet, etc.
A ``load_url`` process will automatically be added to the process graph.
- a path (that is valid for the back-end) to a GeoJSON file.
- a path (:py:class:`str` or :py:class:`~pathlib.Path`) to a local, client-side GeoJSON file,
which will be loaded automatically to get the geometries as GeoJSON construct.
- a :py:class:`~openeo.rest.vectorcube.VectorCube` instance.
- a :py:class:`~openeo.api.process.Parameter` instance.

@@ -2024,6 +2048,12 @@ def mask_polygon(

.. versionchanged:: 0.36.0
Support passing a URL as ``geometries`` argument, which will be loaded with the ``load_url`` process.

.. versionchanged:: 0.36.0
Support for passing a backend-side path as ``geometries`` argument was removed
(also see :ref:`legacy_read_vector`).
Instead, it's possible to provide a client-side path to a GeoJSON file
(which will be loaded client-side to get the geometries as GeoJSON construct).
"""
valid_geojson_types = ["Polygon", "MultiPolygon", "GeometryCollection", "Feature", "FeatureCollection"]
mask = self._get_geometry_argument(mask, valid_geojson_types=valid_geojson_types, crs=srs)
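
The new resolution order for a `geometries` value can be sketched as a standalone, stdlib-only function. This is a simplified illustration of the logic visible in the `_get_geometry_argument` hunks above, not the client's actual implementation. The name `resolve_geometry_argument` is hypothetical, the `load_url` call is represented as a plain dict instead of a real process graph node, shapely geometries and `Parameter`/`VectorCube` inputs are left out, and the suffix table only lists entries that appear in this PR (the real table may contain more in the collapsed part of the diff).

```python
import json
import pathlib
import re
import tempfile
import urllib.parse

# Suffix-to-format table, restricted to entries visible in this PR.
FORMAT_BY_SUFFIX = {
    ".json": "GeoJSON",
    ".geojson": "GeoJSON",
    ".parquet": "Parquet",
    ".geoparquet": "Parquet",
}


def resolve_geometry_argument(argument):
    """Hypothetical helper: mimic how a `geometries` value is dispatched."""
    # 1. URL: delegate to `load_url`, with a best-effort format guess.
    if isinstance(argument, str) and re.match(r"^https?://", argument, flags=re.I):
        url = urllib.parse.urlparse(argument)
        suffix = pathlib.Path(url.path.lower()).suffix
        fmt = FORMAT_BY_SUFFIX.get(suffix, suffix.split(".")[-1])
        return {"process_id": "load_url", "arguments": {"url": argument, "format": fmt}}
    # 2. Existing local .json/.geojson file: load the GeoJSON client-side.
    if (
        isinstance(argument, (str, pathlib.Path))
        and pathlib.Path(argument).is_file()
        and pathlib.Path(argument).suffix.lower() in [".json", ".geojson"]
    ):
        return json.loads(pathlib.Path(argument).read_text(encoding="utf-8"))
    # 3. Already a GeoJSON-style dict: pass through.
    if isinstance(argument, dict):
        return argument
    # 4. Anything else (including a backend-side path) is rejected.
    raise ValueError(f"Invalid geometry argument: {argument!r}")


# Demo: a URL becomes a `load_url` node, a local file is read client-side.
print(resolve_geometry_argument("https://example.com/geometries.geojson"))
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "polygon.json"
    path.write_text(json.dumps({"type": "Polygon", "coordinates": []}), encoding="utf-8")
    print(resolve_geometry_argument(path)["type"])
```

One consequence worth noting: a string that is neither a URL nor an existing local `.json`/`.geojson` file, such as a backend-side path, no longer produces a `read_vector` node and ends up in the error branch instead.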
10 changes: 5 additions & 5 deletions tests/data/1.0.0/aggregate_zonal_path.json
@@ -21,11 +21,11 @@
}
}
},
"readvector1": {
"process_id": "read_vector",
"loadurl1": {
"process_id": "load_url",
"arguments": {
"filename": "/some/path/to/GeometryCollection.geojson"
}
"url": "https://example.com/geometries.geojson",
"format": "GeoJSON"}
},
"aggregatespatial1": {
"process_id": "aggregate_spatial",
@@ -34,7 +34,7 @@
"from_node": "filterbbox1"
},
"geometries": {
"from_node": "readvector1"
"from_node": "loadurl1"
},
"reducer": {
"process_graph": {
23 changes: 23 additions & 0 deletions tests/data/geojson/polygon02.json
@@ -0,0 +1,23 @@
{
"type": "Polygon",
"coordinates": [
[
[
3,
50
],
[
4,
50
],
[
4,
51
],
[
3,
50
]
]
]
}
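
As a rough illustration of how a test fixture like this is checked once it has been loaded client-side, the sketch below validates the top-level GeoJSON `type` against the list of types that `mask_polygon` accepts in the diff above. The function name `check_geojson_type` is hypothetical and the fixture is inlined instead of being read from `tests/data/geojson/polygon02.json`; the error message mirrors the one in the diff, but a plain `ValueError` stands in for the client's own exception class.

```python
import json

# Top-level GeoJSON types accepted by `mask_polygon`, as listed in this PR's diff.
VALID_MASK_TYPES = ["Polygon", "MultiPolygon", "GeometryCollection", "Feature", "FeatureCollection"]

# Inlined copy of the new polygon02.json fixture.
POLYGON02 = json.loads(
    '{"type": "Polygon", "coordinates": [[[3, 50], [4, 50], [4, 51], [3, 50]]]}'
)


def check_geojson_type(geometry: dict, valid_types: list) -> str:
    """Hypothetical helper: return the GeoJSON type, or raise if it is not allowed."""
    geojson_type = geometry.get("type")
    if geojson_type not in valid_types:
        raise ValueError(
            f"Invalid geometry type {geojson_type!r}, must be one of {valid_types}"
        )
    return geojson_type


print(check_geojson_type(POLYGON02, VALID_MASK_TYPES))
```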