feature: artifact-helper #750

Open: wants to merge 1 commit into base: master

Empty file added docs/_ext/__init__.py
5 changes: 5 additions & 0 deletions docs/_ext/jsonlexer.py
@@ -0,0 +1,5 @@
from pygments.lexers import JsonLexer


def setup(app):
    app.add_lexer("json", JsonLexer)
186 changes: 186 additions & 0 deletions docs/api-artifacts.rst
@@ -0,0 +1,186 @@
.. _api-openeo-extra-artifacts:

====================================
API: openeo.extra.artifacts
====================================

.. warning::
    This is a new experimental API, subject to change.

The artifacts functionality relies on extra Python packages. They can be installed using:

.. code-block:: shell

    pip install "openeo[artifacts]" --upgrade


When running OpenEO jobs, it is not uncommon to require artifacts (files) that must be accessible from within the
OpenEO processing environment during job execution. :py:mod:`openeo.extra.artifacts` does the heavy lifting for this
use case by staging artifacts in a secure but temporary location, in three simple steps:

1. Connect to your OpenEO backend
2. Create an artifact helper from your OpenEO connection
3. Upload your file using the artifact helper and optionally get a presigned URI

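In code, this looks like:

.. code-block:: python

    import openeo
    from openeo.extra.artifacts import ArtifactHelper

    connection = openeo.connect("my-openeo.prod.example").authenticate_oidc()

    # Example values: the local file to upload and the object name to store it under
    src_file_path = "path/to/my_artifact.txt"
    object_name = "my_artifact.txt"

    artifact_helper = ArtifactHelper.from_openeo_connection(connection)
    # Upload the file; the returned URI can be used in the regular steps of your job
    storage_uri = artifact_helper.upload_file(object_name, src_file_path)
    # Optionally, get a presigned URL that works without injected credentials
    presigned_uri = artifact_helper.get_presigned_url(storage_uri)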

Note that ``storage_uri`` can be used in the regular processing steps of your OpenEO job. The presigned URL can be used
in environments where no credentials are injected (e.g. UDFs), because it has the authentication details embedded. For
the same reason, if your data is sensitive, you must keep this URL secret.
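
Because a presigned URL carries its own authentication, the code consuming it needs neither credentials nor an SDK: a
plain HTTP GET suffices. The sketch below illustrates this with the Python standard library only, for instance inside a
UDF (assuming the UDF environment has network access). The function ``download_artifact`` and the destination path are
hypothetical names for illustration; they are not part of :py:mod:`openeo.extra.artifacts`.

.. code-block:: python

    import shutil
    import urllib.request


    def download_artifact(presigned_url: str, dest_path: str) -> str:
        """Hypothetical helper: fetch an artifact via its presigned URL into ``dest_path``.

        No credentials are passed; for S3-style presigned URLs the signature
        travels in the URL's query string.
        """
        with urllib.request.urlopen(presigned_url) as response, open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
        return dest_path


    # Example usage, e.g. inside a UDF, with ``presigned_uri`` obtained as shown above:
    # local_path = download_artifact(presigned_uri, "/tmp/my_artifact.txt")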

User-facing API
===============


.. autoclass:: openeo.extra.artifacts.artifact_helper.ArtifactHelper
   :members:


.. autoclass:: openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC
   :members: upload_file, get_presigned_url
   :no-index:


How does it work?
=================

1) :py:meth:`openeo.extra.artifacts.artifact_helper.ArtifactHelper.from_openeo_connection` is a factory method that
   creates an artifact helper whose type is determined by the config type. The OpenEO connection object is used to
   check whether the OpenEO backend advertises a preferred config.
2) :py:meth:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC.upload_file` and
   :py:meth:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC.get_presigned_url` do the heavy lifting to
   store your artifact in provider-managed storage and to return references to it. If the backend uses object storage
   with an S3 API, they will:

   1. Get temporary S3 credentials based on the config advertised by the backend and the session from your connection.
   2. Upload the file into object storage and return an S3 URI which the backend can resolve.
   3. Optionally, :py:meth:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC.get_presigned_url` creates a
      URL signed with the temporary credentials so that it works standalone. Some tools and execution steps do not
      support internal references, whereas presigned URLs should work in any tool.
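
For an S3-compatible backend, the three steps above could map onto ``boto3`` calls roughly as follows. This is a
sketch under stated assumptions, not the code of :py:mod:`openeo.extra.artifacts._s3sts`: it assumes that the session
from your connection is exchanged for temporary credentials with STS ``AssumeRoleWithWebIdentity`` using an OIDC access
token, and the functions ``to_s3_uri`` and ``stage_artifact`` as well as their parameters are hypothetical. ``boto3``
is imported lazily so that the URI helper works without it.

.. code-block:: python

    from typing import Tuple


    def to_s3_uri(bucket: str, object_name: str) -> str:
        """Build the internal S3 URI that the backend can resolve."""
        return f"s3://{bucket}/{object_name}"


    def stage_artifact(
        access_token: str,
        role_arn: str,
        bucket: str,
        s3_endpoint: str,
        sts_endpoint: str,
        object_name: str,
        src_file_path: str,
        region_name: str = "us-east-1",  # assumption: region used for request signing
        expires_in: int = 3600,
    ) -> Tuple[str, str]:
        """Hypothetical sketch: upload a file and return (S3 URI, presigned URL)."""
        import boto3  # lazy import: only needed when actually calling STS/S3

        # Step 1 (assumption): trade the OIDC access token for temporary credentials.
        sts = boto3.client("sts", endpoint_url=sts_endpoint, region_name=region_name)
        creds = sts.assume_role_with_web_identity(
            RoleArn=role_arn,
            RoleSessionName="openeo-artifacts-sketch",
            WebIdentityToken=access_token,
        )["Credentials"]

        # Step 2: upload with the temporary credentials and build the internal URI.
        s3 = boto3.client(
            "s3",
            endpoint_url=s3_endpoint,
            region_name=region_name,
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        )
        s3.upload_file(src_file_path, bucket, object_name)
        storage_uri = to_s3_uri(bucket, object_name)

        # Step 3 (optional): sign a GET request that works without further credentials.
        presigned_url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": object_name},
            ExpiresIn=expires_in,
        )
        return storage_uri, presigned_url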


.. _artifacts-exceptions:


Documentation for backend providers
===================================

This section and its subsections are intended for engineers who operate an OpenEO backend. If you are a user of an
OpenEO platform, this is unlikely to be of value to you.

Advertising capabilities from the backend
-----------------------------------------

The backend is expected to advertise an ``artifacts`` section in its capabilities. The following is an example for
``S3STSConfig`` (from the :py:mod:`openeo.extra.artifacts._s3sts` package):

.. code-block:: json

    {
      // ...
      "artifacts": {
        "providers": [
          {
            "config": {  // The keys of this config block can differ for other config types
              "bucket": "openeo-artifacts",  // The bucket where the artifacts will be stored
              "role": "arn:aws:iam::000000000000:role/S3Access",  // The role that will be assumed via STS
              "s3_endpoint": "https://my.s3.test",  // Where S3 API calls are sent
              "sts_endpoint": "https://my.sts.test"  // Where STS API calls are sent
            },
            "id": "s3",  // A logical name for this provider
            "type": "S3STSConfig"  // The config type of the ArtifactHelper
          }
        ]
      },
      // ...
    }
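
On the client side, :py:meth:`openeo.extra.artifacts.artifact_helper.ArtifactHelper.from_openeo_connection` checks
whether the backend advertises such a preferred config. The following sketch illustrates that lookup on the
capabilities above, written as a plain Python ``dict`` (standard JSON does not allow the ``//`` comments). It is a
simplified illustration rather than the library's logic: ``pick_artifacts_provider`` and ``NoProvidersAdvertised`` are
hypothetical names, and taking the first listed provider is an assumption.

.. code-block:: python

    from typing import Any, Dict


    class NoProvidersAdvertised(Exception):
        """Hypothetical stand-in for openeo.extra.artifacts.exceptions.NoAdvertisedProviders."""


    def pick_artifacts_provider(capabilities: Dict[str, Any]) -> Dict[str, Any]:
        """Return the first artifacts provider advertised in a capabilities document.

        Assumption for illustration: the first entry in ``providers`` is the preferred one.
        """
        providers = capabilities.get("artifacts", {}).get("providers", [])
        if not providers:
            raise NoProvidersAdvertised("Backend does not advertise any artifacts providers")
        provider = providers[0]
        # "type" names the config class (e.g. "S3STSConfig"); "config" holds its settings.
        return {"id": provider["id"], "type": provider["type"], "config": provider.get("config", {})}


    capabilities = {
        "artifacts": {
            "providers": [
                {
                    "config": {
                        "bucket": "openeo-artifacts",
                        "role": "arn:aws:iam::000000000000:role/S3Access",
                        "s3_endpoint": "https://my.s3.test",
                        "sts_endpoint": "https://my.sts.test",
                    },
                    "id": "s3",
                    "type": "S3STSConfig",
                }
            ]
        }
    }
    provider = pick_artifacts_provider(capabilities)
    print(provider["type"], provider["config"]["bucket"])  # S3STSConfig openeo-artifacts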


Extending support for other types of artifacts
----------------------------------------------

.. warning::
    This section is intended for developers of the ``openeo-python-client`` Python package. If you want to go down
    this road, it is best to create an issue on GitHub early on, describing the support you plan to add, to get input
    on its feasibility and on whether it would be mergeable.

Ideally, the user interface is simple and stable; the implementations behind it, however, come with more complexity.
This section explains what is needed to support additional types of artifacts. The API involved is documented after
the steps.

1. Create another internal package for the implementation, under :py:mod:`openeo.extra.artifacts`. The following
   steps should be done inside that package.
2. Create a config implementation which extends :py:class:`openeo.extra.artifacts._config.ArtifactsStorageConfigABC`
   and is a frozen dataclass. This class implements the logic that determines the configuration used by the
   implementation; the method ``_load_connection_provided_config(self, provider_config: ProviderConfig) -> None`` is
   used for that.

   When this method is called, explicitly provided config values are already in place. For every attribute that was
   not provided explicitly, the method fills in the value advertised by the backend or, failing that, a default.
   Because the config is a frozen dataclass, ``object.__setattr__(self, ...)`` must be used to set these values.

   The same pattern is therefore used for each attribute. For example, an attribute ``foo`` with a constant default
   ``bar`` would be handled as:

   .. code-block:: python

       # Not set explicitly: use the backend-advertised value, else the default
       if self.foo is None:
           try:
               object.__setattr__(self, "foo", provider_config["foo"])
           except NoDefaultConfig:
               object.__setattr__(self, "foo", "bar")

   The exception caught here is :py:exc:`openeo.extra.artifacts.exceptions.NoDefaultConfig`, which signals that the
   provider config has no value for the requested key.

3. Create an implementation of :py:class:`openeo.extra.artifacts._uri.StorageURI` to model the internal URIs of the
   stored artifacts.
4. Create an ArtifactHelper implementation which extends
   :py:class:`openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC`.
5. Add a key-value pair to the :py:obj:`openeo.extra.artifacts.artifact_helper.config_to_helper` dictionary. The key is
   the config class created in step 2 and the value is the ArtifactHelper class created in step 4.
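
To see the attribute pattern of step 2 in isolation, here is a self-contained sketch. It deliberately does not import
the real base class: ``NoDefaultConfigStandIn``, ``ProviderConfigStandIn`` and ``ExampleStorageConfig`` are
hypothetical stand-ins, and the assumption that looking up a missing key raises the exception mirrors the
``try``/``except`` pattern above rather than documented library behaviour.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Optional


    class NoDefaultConfigStandIn(Exception):
        """Hypothetical stand-in for openeo.extra.artifacts.exceptions.NoDefaultConfig."""


    class ProviderConfigStandIn(dict):
        """Hypothetical provider config: looking up a missing key raises NoDefaultConfigStandIn."""

        def __getitem__(self, key):
            try:
                return super().__getitem__(key)
            except KeyError:
                raise NoDefaultConfigStandIn(key) from None


    @dataclass(frozen=True)
    class ExampleStorageConfig:
        # None means "not set explicitly by the user".
        bucket: Optional[str] = None
        endpoint: Optional[str] = None

        def _load_connection_provided_config(self, provider_config: ProviderConfigStandIn) -> None:
            # Precedence per attribute: explicit value > backend-advertised value > hard-coded default.
            if self.bucket is None:
                try:
                    object.__setattr__(self, "bucket", provider_config["bucket"])
                except NoDefaultConfigStandIn:
                    object.__setattr__(self, "bucket", "artifacts")
            if self.endpoint is None:
                try:
                    object.__setattr__(self, "endpoint", provider_config["endpoint"])
                except NoDefaultConfigStandIn:
                    object.__setattr__(self, "endpoint", "https://s3.example")


    cfg = ExampleStorageConfig(bucket="my-bucket")
    cfg._load_connection_provided_config(ProviderConfigStandIn(bucket="ignored", endpoint="https://my.s3.test"))
    print(cfg)  # ExampleStorageConfig(bucket='my-bucket', endpoint='https://my.s3.test')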

.. autoclass:: openeo.extra.artifacts._config.ArtifactsStorageConfigABC
   :members:
   :private-members: _load_connection_provided_config

.. autoclass:: openeo.extra.artifacts._artifact_helper_abc.ArtifactHelperABC
   :members:
   :private-members: _get_default_storage_config, _from_openeo_connection

.. autoclass:: openeo.extra.artifacts._uri.StorageURI
   :members:
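
Steps 3 to 5 above can be pictured with the following self-contained sketch: an internal URI type, a helper class,
and a dictionary that maps each config class to its helper class, dispatched on ``type(config)``. All names here
(``ExampleS3URI``, ``ExampleConfig``, ``ExampleHelper``, ``example_config_to_helper``, ``build_helper``) are
hypothetical; the real :py:class:`openeo.extra.artifacts._uri.StorageURI` and ``config_to_helper`` may be shaped
differently.

.. code-block:: python

    from dataclasses import dataclass
    from urllib.parse import urlparse


    @dataclass(frozen=True)
    class ExampleS3URI:
        """Hypothetical internal URI of a stored artifact (compare step 3)."""

        bucket: str
        key: str

        @classmethod
        def from_str(cls, uri: str) -> "ExampleS3URI":
            parsed = urlparse(uri)
            if parsed.scheme != "s3" or not parsed.netloc:
                raise ValueError(f"Not an S3 URI: {uri!r}")
            return cls(bucket=parsed.netloc, key=parsed.path.lstrip("/"))

        def to_string(self) -> str:
            return f"s3://{self.bucket}/{self.key}"


    @dataclass(frozen=True)
    class ExampleConfig:
        """Hypothetical config class (compare step 2)."""

        bucket: str = "openeo-artifacts"


    class ExampleHelper:
        """Hypothetical helper class (compare step 4)."""

        def __init__(self, config: ExampleConfig):
            self.config = config

        def storage_uri_for(self, object_name: str) -> ExampleS3URI:
            return ExampleS3URI(self.config.bucket, object_name)


    # Compare step 5: map each config class to the helper class that handles it.
    example_config_to_helper = {ExampleConfig: ExampleHelper}


    def build_helper(config):
        try:
            helper_cls = example_config_to_helper[type(config)]
        except KeyError:
            raise TypeError(f"No helper registered for {type(config).__name__}") from None
        return helper_cls(config)


    helper = build_helper(ExampleConfig())
    print(helper.storage_uri_for("my_artifact.txt").to_string())  # s3://openeo-artifacts/my_artifact.txt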


Artifacts exceptions
--------------------

When working with artifacts, your interactions can result in the following exceptions:

.. autoexception:: openeo.extra.artifacts.exceptions.ArtifactsException
   :members:

.. autoexception:: openeo.extra.artifacts.exceptions.NoAdvertisedProviders
   :members:

.. autoexception:: openeo.extra.artifacts.exceptions.UnsupportedArtifactsType
   :members:

.. autoexception:: openeo.extra.artifacts.exceptions.NoDefaultConfig
   :members:

.. autoexception:: openeo.extra.artifacts.exceptions.InvalidProviderConfig
   :members:

.. autoexception:: openeo.extra.artifacts.exceptions.ProviderSpecificException
   :members:
31 changes: 12 additions & 19 deletions docs/conf.py
@@ -13,16 +13,18 @@
# All configuration values have a default; values that are commented out
# serve to show the default.

import datetime

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
import datetime

sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../'))
sys.path.insert(0, os.path.abspath("_ext"))
sys.path.insert(0, os.path.abspath("."))
sys.path.insert(0, os.path.abspath("../"))

import openeo

@@ -36,15 +38,16 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx_autodoc_typehints',
'sphinx.ext.viewcode',
'sphinx.ext.doctest',
'myst_parser',
"sphinx.ext.intersphinx",
"sphinx.ext.autodoc",
"sphinx_autodoc_typehints",
"sphinx.ext.viewcode",
"sphinx.ext.doctest",
"myst_parser",
"jsonlexer",
]

import sphinx_autodoc_typehints

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

@@ -195,13 +198,3 @@
author, 'openeo', 'One line description of project.',
'Miscellaneous'),
]


# Mapping for external documentation
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"xarray": ("https://docs.xarray.dev/en/stable/", None),
"pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
"urllib3": ("https://urllib3.readthedocs.io/en/stable/", None),
}
14 changes: 7 additions & 7 deletions docs/cookbook/sampling.md
@@ -6,10 +6,10 @@ but rather want to extract a result at specific locations.
Examples include extracting training data for model calibration, or computing the result for
areas where validation data is available.

An important constraint is that most implementations assume that sampling is an operation
on relatively small areas, of for instance up to 512x512 pixels (but often much smaller).
An important constraint is that most implementations assume that sampling is an operation
on relatively small areas, of for instance up to 512x512 pixels (but often much smaller).
When extracting polygons with larger areas, it is recommended to look into running a separate job per 'sample'.
Some more important performance notices are mentioned later in the chapter, please read them carefully
Some more important performance notices are mentioned later in the chapter, please read them carefully
to get best results.

Sampling can be done for points or polygons:
@@ -23,12 +23,12 @@ public url, and to load it in openEO using {py:meth}`openeo.rest.connection.Conn

## Sampling at point locations

To sample point locations, the `openeo.rest.datacube.DataCube.aggregate_spatial` method can be used. The reducer can be a
To sample point locations, the `openeo.rest.datacube.DataCube.aggregate_spatial` method can be used. The reducer can be a
commonly supported reducer like `min`, `max` or `mean` and will receive only one value as input in most cases. Note that
in edge cases, a point can intersect with up to 4 pixels. If this is not desirable, it might be worth trying to align
in edge cases, a point can intersect with up to 4 pixels. If this is not desirable, it might be worth trying to align
points with pixel centers, which does require more advanced knowledge of the pixel grid of your data cube.
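
As an illustration of what aligning points with pixel centers involves, the sketch below snaps a coordinate to the
center of the pixel that contains it. It assumes a regular grid with square pixels of size `res`, whose pixel edges lie
at `x0 + i * res` and `y0 + j * res` in the same CRS as the points; the function name is hypothetical, and obtaining
`x0`, `y0` and `res` for your data cube is not covered here.

```python
import math
from typing import Tuple


def snap_to_pixel_center(x: float, y: float, x0: float, y0: float, res: float) -> Tuple[float, float]:
    """Hypothetical helper: move (x, y) to the center of the grid cell that contains it.

    Assumes square pixels of size ``res`` whose edges lie at x0 + i * res and
    y0 + j * res, all in the same CRS as the point.
    """
    col = math.floor((x - x0) / res)
    row = math.floor((y - y0) / res)
    return x0 + (col + 0.5) * res, y0 + (row + 0.5) * res


# A point exactly on a pixel corner (touching 4 pixels) is moved into a single pixel:
print(snap_to_pixel_center(500020.0, 5600040.0, x0=500000.0, y0=5600000.0, res=10.0))
# (500025.0, 5600045.0)
```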

More information on `aggregate_spatial` is available [here](_aggregate-spatial-evi).
More information on `aggregate_spatial` is available [here](aggregate-spatial-evi).

## Sampling polygons as rasters

@@ -76,4 +76,4 @@ batch job. The recommendation here is to apply a spatial grouping to your sampli
an area of around 100x100km. The optimal size of a group may be backend dependant. Also remember that when working with
data in the UTM projection, you may want to avoid covering multiple UTM zones in a single group.

See also how to manage [multiple jobs](_job-manager).
See also how to manage [multiple jobs](job-manager).
1 change: 1 addition & 0 deletions docs/index.rst
@@ -61,6 +61,7 @@ Table of contents
cookbook/index
api
api-processes
api-artifacts
process_mapping
development
best_practices
1 change: 1 addition & 0 deletions openeo/extra/artifacts/__init__.py
@@ -0,0 +1 @@
from openeo.extra.artifacts.artifact_helper import ArtifactHelper