Commit 162a1a2

mmcfarland, lossyrob, and Ubuntu authored
PgSTAC: API hydration of search result items (#397)
* Upgrade to pgstac 0.5.1

  Initial changes to get most tests passing.

* Add option to hydrate pgstac search results in API
* Support fields extension in nohydrate mode
* Updates to hydrate and filter functionality.

  This was done in a pairing session with @mmcfarland

* Fix fields extensions and reduce number of loops
* Tolerate missing required attributes with fields extension

  Use of the fields extension can result in the return of invalid stac items if excludes is used on required attributes. When injecting item links, don't attempt to build links for which needed attributes aren't available. When API Hydrate is enabled, the required attributes are preserved prior to filtering and are used in the link generation.

* Run pgstac tests in db and api hydrate mode
* Merge dicts within lists during hydration

  In practice, an asset on a base_item and an item may have mergable dicts (ie, raster bands).

* Add note on settings in readme
* Pass request to base_item_cache

  This will be used by implementors who need app state which is stored on request.

* Upgrade pypgstac and use included hydrate function

  The hydrate function was improved and moved to pypgstac so it could be used in other projects outside of stac-fastapi. It was developed with a corresponding dehydrate function to ensure parity between the two. The version of pypgstac is unpublished and pinned to a draft commit at the point and will be upgraded subsequently.

* Improve fields extension implementation

  Correctly supports deeply nested property keys in both include and exclude, as well as improves variable naming, comments, and test cases.

* Remove unused error type
* adjust tests for changes in api
* remove print statements
* add bbox back to items in tests
* Upgrade pgstac
* Fix conformance test fixtures
* Fix sqlalchemy test with new status for FK error
* Align fields ext behavior for invalid includes
* Lint
* Changelog
* Remove psycopg install dependency
* Relax dependency version of pgstac to 0.6.* series
* Update dev environment to pgstac 0.6.2
* Changelog fix

Co-authored-by: Rob Emanuele <[email protected]>
Co-authored-by: Ubuntu <planetarycomputer@pct-bitner-vm.kko0dpzi4g3udak2ovyb5nsdte.ax.internal.cloudapp.net>
1 parent 526501b commit 162a1a2
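
The commit message describes hydration as merging a collection's base item into each dehydrated search result, including dicts nested inside lists (such as raster bands on an asset). The sketch below illustrates that idea only. `merge_base_into_item` and `merge_lists` are hypothetical helpers written for this example, not pypgstac's `hydrate`, and the sample asset keys are invented; the real function may apply additional rules that this commit page does not show.

```python
from copy import deepcopy
from typing import Any, Dict, List


def merge_base_into_item(base: Dict[str, Any], item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `item` with keys it lacks filled in from `base`."""
    merged = deepcopy(item)
    for key, base_value in base.items():
        if key not in merged:
            merged[key] = deepcopy(base_value)
        elif isinstance(base_value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_base_into_item(base_value, merged[key])
        elif isinstance(base_value, list) and isinstance(merged[key], list):
            merged[key] = merge_lists(base_value, merged[key])
        # Any other value already present on the item wins over the base item.
    return merged


def merge_lists(base_list: List[Any], item_list: List[Any]) -> List[Any]:
    """Merge dicts position by position (assumption: equal length means aligned entries)."""
    if len(base_list) != len(item_list):
        return deepcopy(item_list)
    return [
        merge_base_into_item(b, i)
        if isinstance(b, dict) and isinstance(i, dict)
        else deepcopy(i)
        for b, i in zip(base_list, item_list)
    ]


# Invented sample data: the base item carries values shared by the whole collection.
base_item = {
    "assets": {
        "B1": {
            "type": "image/tiff",
            "raster:bands": [{"nodata": 0, "data_type": "uint16"}],
        }
    }
}
dehydrated = {
    "id": "scene-1",
    "collection": "example-collection",
    "assets": {"B1": {"href": "s3://bucket/scene-1/B1.tif", "raster:bands": [{"scale": 0.0001}]}},
}

hydrated = merge_base_into_item(base_item, dehydrated)
```

In this sketch the item's own values always take precedence, and the inputs are never mutated, so a cached base item can be reused safely across items.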

16 files changed, with 975 additions and 77 deletions


CHANGES.md

Lines changed: 3 additions & 1 deletion
@@ -17,6 +17,7 @@
 * Bulk Transactions object Items iterator now returns the Item objects rather than the string IDs of the Item objects
   ([#355](https://github.com/stac-utils/stac-fastapi/issues/355))
 * docker-compose now runs uvicorn with hot-reloading enabled
+* Bump version of PGStac to 0.6.2 that includes support for hydrating results in the API backed ([#397](https://github.com/stac-utils/stac-fastapi/pull/397))

 ### Removed

@@ -27,7 +28,8 @@
 * Fixes issues (and adds tests) for issues caused by regression in pgstac ([#345](https://github.com/stac-utils/stac-fastapi/issues/345)
 * Update error response payloads to match the API spec. ([#361](https://github.com/stac-utils/stac-fastapi/pull/361))
 * Fixed stray `/` before the `#` in several extension conformance class strings ([383](https://github.com/stac-utils/stac-fastapi/pull/383))
-* SQLAlchemy backend bulk item insert now works ([#356]https://github.com/stac-utils/stac-fastapi/issues/356))
+* SQLAlchemy backend bulk item insert now works ([#356](https://github.com/stac-utils/stac-fastapi/issues/356))
+* PGStac Backend has stricter implementation of Fields Extension syntax ([#397](https://github.com/stac-utils/stac-fastapi/pull/397))

 ## [2.3.0]


docker-compose.yml

Lines changed: 2 additions & 1 deletion
@@ -50,6 +50,7 @@ services:
       - GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR
       - DB_MIN_CONN_SIZE=1
       - DB_MAX_CONN_SIZE=1
+      - USE_API_HYDRATE=${USE_API_HYDRATE:-false}
     ports:
       - "8082:8082"
     volumes:
@@ -62,7 +63,7 @@ services:

   database:
     container_name: stac-db
-    image: ghcr.io/stac-utils/pgstac:v0.4.5
+    image: ghcr.io/stac-utils/pgstac:v0.6.2
     environment:
       - POSTGRES_USER=username
       - POSTGRES_PASSWORD=password

stac_fastapi/api/stac_fastapi/api/errors.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 DEFAULT_STATUS_CODES = {
     NotFoundError: status.HTTP_404_NOT_FOUND,
     ConflictError: status.HTTP_409_CONFLICT,
-    ForeignKeyError: status.HTTP_422_UNPROCESSABLE_ENTITY,
+    ForeignKeyError: status.HTTP_424_FAILED_DEPENDENCY,
     DatabaseError: status.HTTP_424_FAILED_DEPENDENCY,
     Exception: status.HTTP_500_INTERNAL_SERVER_ERROR,
     InvalidQueryParameter: status.HTTP_400_BAD_REQUEST,
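
The hunk above moves `ForeignKeyError` from 422 to 424, the same status already used for `DatabaseError`. As a rough illustration of how a type-to-status table like this can be consulted, the sketch below resolves an exception to a status code by walking its class hierarchy. The exception classes are local stand-ins and `status_for` is a hypothetical helper; how stac-fastapi actually registers its handlers is not shown in this diff.

```python
from http import HTTPStatus
from typing import Dict, Type


class ForeignKeyError(Exception):
    """Local stand-in for the error type named in the diff."""


class DatabaseError(Exception):
    """Local stand-in for the error type named in the diff."""


# Mirrors the post-commit mapping for the two database-related errors.
STATUS_CODES: Dict[Type[Exception], HTTPStatus] = {
    ForeignKeyError: HTTPStatus.FAILED_DEPENDENCY,  # 424 after this commit, previously 422
    DatabaseError: HTTPStatus.FAILED_DEPENDENCY,
    Exception: HTTPStatus.INTERNAL_SERVER_ERROR,
}


def status_for(exc: BaseException) -> HTTPStatus:
    """Return the status of the most specific registered class for `exc`."""
    for cls in type(exc).__mro__:
        if cls in STATUS_CODES:
            return STATUS_CODES[cls]
    return HTTPStatus.INTERNAL_SERVER_ERROR


fk_status = status_for(ForeignKeyError("missing collection"))
other_status = status_for(ValueError("unexpected"))
```

Clients or tests that previously matched a 422 for foreign-key failures would need to expect 424 instead, which is consistent with the "Fix sqlalchemy test with new status for FK error" entry in the commit message.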

stac_fastapi/pgstac/README.md

Lines changed: 5 additions & 0 deletions
@@ -46,7 +46,12 @@ pip install -e \
   stac_fastapi/pgstac[dev,server]
 ```

+### Settings
+
+To configure PGStac stac-fastapi to [hydrate search result items in the API](https://github.com/stac-utils/pgstac#runtime-configurations), set the `USE_API_HYDRATE` environment variable to `true` or explicitly set the option in the PGStac Settings object.
+
 ### Migrations
+
 PGStac is an external project and the may be used by multiple front ends.
 For Stac FastAPI development, a docker image (which is pulled as part of the docker-compose) is available at
 bitner/pgstac:[version] that has the full database already set up for PGStac.
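
The new README section says the option can be set through the `USE_API_HYDRATE` environment variable or directly on the settings object. A minimal stdlib-only sketch of those two routes follows. `HydrationSettings` and `env_flag` are hypothetical stand-ins, and the set of accepted truthy strings is an assumption; the way `ApiSettings` really loads its values is not shown in this diff.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these spellings count as "true"; the real settings class may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag from an environment mapping, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass
class HydrationSettings:
    """Hypothetical stand-in holding only the option added by this commit."""

    use_api_hydrate: bool = False

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "HydrationSettings":
        return cls(use_api_hydrate=env_flag("USE_API_HYDRATE", environ=environ))


# Route 1: environment variable (an explicit mapping is passed here to keep the demo hermetic).
from_environment = HydrationSettings.from_env({"USE_API_HYDRATE": "true"})

# Route 2: set the option explicitly when constructing the settings object.
explicit = HydrationSettings(use_api_hydrate=True)

# Unset variable: falls back to the default of False, matching the diff's default.
unset = HydrationSettings.from_env({})
```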

stac_fastapi/pgstac/setup.py

Lines changed: 2 additions & 1 deletion
@@ -17,16 +17,17 @@
     "buildpg",
     "brotli_asgi",
     "pygeofilter @ git+https://github.com/geopython/[email protected]#egg=pygeofilter",
+    "pypgstac==0.6.*",
 ]

 extra_reqs = {
     "dev": [
+        "pypgstac[psycopg]==0.6.*",
         "pytest",
         "pytest-cov",
         "pytest-asyncio>=0.17",
         "pre-commit",
         "requests",
-        "pypgstac==0.4.5",
         "httpx",
     ],
     "docs": ["mkdocs", "mkdocs-material", "pdocs"],

stac_fastapi/pgstac/stac_fastapi/pgstac/config.py

Lines changed: 10 additions & 0 deletions
@@ -1,5 +1,11 @@
 """Postgres API configuration."""

+from typing import Type
+
+from stac_fastapi.pgstac.types.base_item_cache import (
+    BaseItemCache,
+    DefaultBaseItemCache,
+)
 from stac_fastapi.types.config import ApiSettings


@@ -13,6 +19,7 @@ class Settings(ApiSettings):
         postgres_host_writer: hostname for the writer connection.
         postgres_port: database port.
         postgres_dbname: database name.
+        use_api_hydrate: perform hydration of stac items within stac-fastapi.
     """

     postgres_user: str
@@ -27,6 +34,9 @@ class Settings(ApiSettings):
     db_max_queries: int = 50000
     db_max_inactive_conn_lifetime: float = 300

+    use_api_hydrate: bool = False
+    base_item_cache: Type[BaseItemCache] = DefaultBaseItemCache
+
     testing: bool = False

     @property

stac_fastapi/pgstac/stac_fastapi/pgstac/core.py

Lines changed: 87 additions & 17 deletions
@@ -12,12 +12,15 @@
 from pydantic import ValidationError
 from pygeofilter.backends.cql2_json import to_cql2
 from pygeofilter.parsers.cql2_text import parse as parse_cql2_text
+from pypgstac.hydration import hydrate
 from stac_pydantic.links import Relations
 from stac_pydantic.shared import MimeTypes
 from starlette.requests import Request

+from stac_fastapi.pgstac.config import Settings
 from stac_fastapi.pgstac.models.links import CollectionLinks, ItemLinks, PagingLinks
 from stac_fastapi.pgstac.types.search import PgstacSearch
+from stac_fastapi.pgstac.utils import filter_fields
 from stac_fastapi.types.core import AsyncBaseCoreClient
 from stac_fastapi.types.errors import InvalidQueryParameter, NotFoundError
 from stac_fastapi.types.stac import Collection, Collections, Item, ItemCollection
@@ -103,8 +106,38 @@ async def get_collection(self, collection_id: str, **kwargs) -> Collection:

         return Collection(**collection)

+    async def _get_base_item(
+        self, collection_id: str, request: Request
+    ) -> Dict[str, Any]:
+        """Get the base item of a collection for use in rehydrating full item collection properties.
+
+        Args:
+            collection: ID of the collection.
+
+        Returns:
+            Item.
+        """
+        item: Optional[Dict[str, Any]]
+
+        pool = request.app.state.readpool
+        async with pool.acquire() as conn:
+            q, p = render(
+                """
+                SELECT * FROM collection_base_item(:collection_id::text);
+                """,
+                collection_id=collection_id,
+            )
+            item = await conn.fetchval(q, *p)
+
+        if item is None:
+            raise NotFoundError(f"A base item for {collection_id} does not exist.")
+
+        return item
+
     async def _search_base(
-        self, search_request: PgstacSearch, **kwargs: Any
+        self,
+        search_request: PgstacSearch,
+        **kwargs: Any,
     ) -> ItemCollection:
         """Cross catalog search (POST).

@@ -119,9 +152,11 @@ async def _search_base(
         items: Dict[str, Any]

         request: Request = kwargs["request"]
+        settings: Settings = request.app.state.settings
         pool = request.app.state.readpool

-        # pool = kwargs["request"].app.state.readpool
+        search_request.conf = search_request.conf or {}
+        search_request.conf["nohydrate"] = settings.use_api_hydrate
         req = search_request.json(exclude_none=True, by_alias=True)

         try:
@@ -141,30 +176,65 @@ async def _search_base(
         next: Optional[str] = items.pop("next", None)
         prev: Optional[str] = items.pop("prev", None)
         collection = ItemCollection(**items)
-        cleaned_features: List[Item] = []

-        for feature in collection.get("features") or []:
-            feature = Item(**feature)
+        exclude = search_request.fields.exclude
+        if exclude and len(exclude) == 0:
+            exclude = None
+        include = search_request.fields.include
+        if include and len(include) == 0:
+            include = None
+
+        async def _add_item_links(
+            feature: Item,
+            collection_id: Optional[str] = None,
+            item_id: Optional[str] = None,
+        ) -> None:
+            """Add ItemLinks to the Item.
+
+            If the fields extension is excluding links, then don't add them.
+            Also skip links if the item doesn't provide collection and item ids.
+            """
+            collection_id = feature.get("collection") or collection_id
+            item_id = feature.get("id") or item_id
+
             if (
                 search_request.fields.exclude is None
                 or "links" not in search_request.fields.exclude
+                and all([collection_id, item_id])
             ):
-                # TODO: feature.collection is not always included
-                # This code fails if it's left outside of the fields expression
-                # I've fields extension updated test cases to always include feature.collection
                 feature["links"] = await ItemLinks(
-                    collection_id=feature["collection"],
-                    item_id=feature["id"],
+                    collection_id=collection_id,
+                    item_id=item_id,
                     request=request,
                 ).get_links(extra_links=feature.get("links"))

-                exclude = search_request.fields.exclude
-                if exclude and len(exclude) == 0:
-                    exclude = None
-                include = search_request.fields.include
-                if include and len(include) == 0:
-                    include = None
-            cleaned_features.append(feature)
+        cleaned_features: List[Item] = []
+
+        if settings.use_api_hydrate:
+
+            async def _get_base_item(collection_id: str) -> Dict[str, Any]:
+                return await self._get_base_item(collection_id, request)
+
+            base_item_cache = settings.base_item_cache(
+                fetch_base_item=_get_base_item, request=request
+            )
+
+            for feature in collection.get("features") or []:
+                base_item = await base_item_cache.get(feature.get("collection"))
+                feature = hydrate(base_item, feature)
+
+                # Grab ids needed for links that may be removed by the fields extension.
+                collection_id = feature.get("collection")
+                item_id = feature.get("id")
+
+                feature = filter_fields(feature, include, exclude)
+                await _add_item_links(feature, collection_id, item_id)
+
+                cleaned_features.append(feature)
+        else:
+            for feature in collection.get("features") or []:
+                await _add_item_links(feature)
+                cleaned_features.append(feature)

         collection["features"] = cleaned_features
         collection["links"] = await PagingLinks(
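
In API-hydrate mode, the hunk above captures the collection and item ids before applying the fields filter, so links can still be built when `id` or `collection` is excluded. The sketch below walks through that ordering with simplified stand-ins: `drop_fields` is a hypothetical exclude-only filter using dotted paths (not the `filter_fields` helper imported in the diff), and the `self` link format is invented. Note also that Python parses `a or b and c` as `a or (b and c)`; the sketch groups the link condition explicitly, on the assumption that the id check is meant to apply whether or not an exclude set is present.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Set


def drop_fields(item: Dict[str, Any], exclude: Set[str]) -> Dict[str, Any]:
    """Return a copy of `item` without the dotted paths listed in `exclude`."""
    result = deepcopy(item)
    for path in exclude:
        *parents, leaf = path.split(".")
        node: Any = result
        for part in parents:
            node = node.get(part) if isinstance(node, dict) else None
        if isinstance(node, dict):
            node.pop(leaf, None)
    return result


def should_add_links(
    exclude: Optional[Set[str]], collection_id: Optional[str], item_id: Optional[str]
) -> bool:
    """Links are wanted unless excluded, and only buildable when both ids are known."""
    links_wanted = exclude is None or "links" not in exclude
    return links_wanted and bool(collection_id) and bool(item_id)


def filter_then_link(item: Dict[str, Any], exclude: Set[str]) -> Dict[str, Any]:
    # Capture ids first: the fields filter may remove them from the item.
    collection_id, item_id = item.get("collection"), item.get("id")
    filtered = drop_fields(item, exclude)
    if should_add_links(exclude, collection_id, item_id):
        # Hypothetical link shape, for illustration only.
        filtered["links"] = [
            {"rel": "self", "href": f"/collections/{collection_id}/items/{item_id}"}
        ]
    return filtered


item = {
    "id": "scene-1",
    "collection": "landsat",
    "properties": {"datetime": "2022-01-01T00:00:00Z", "gsd": 30},
}
result = filter_then_link(item, {"id", "collection", "properties.gsd"})
```

Here `result` keeps a usable `self` link even though both ids were filtered out of the item body, which is the behaviour the "Tolerate missing required attributes with fields extension" entry in the commit message describes.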
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+"""base_item_cache classes for pgstac fastapi."""
+import abc
+from typing import Any, Callable, Coroutine, Dict
+
+from starlette.requests import Request
+
+
+class BaseItemCache(abc.ABC):
+    """
+    A cache that returns a base item for a collection.
+
+    If no base item is found in the cache, use the fetch_base_item function
+    to fetch the base item from pgstac.
+    """
+
+    def __init__(
+        self,
+        fetch_base_item: Callable[[str], Coroutine[Any, Any, Dict[str, Any]]],
+        request: Request,
+    ):
+        """
+        Initialize the base item cache.
+
+        Args:
+            fetch_base_item: A function that fetches the base item for a collection.
+            request: The request object containing app state that may be used by caches.
+        """
+        self._fetch_base_item = fetch_base_item
+        self._request = request
+
+    @abc.abstractmethod
+    async def get(self, collection_id: str) -> Dict[str, Any]:
+        """Return the base item for the collection and cache by collection id."""
+        pass
+
+
+class DefaultBaseItemCache(BaseItemCache):
+    """Implementation of the BaseItemCache that holds base items in a dict."""
+
+    def __init__(
+        self,
+        fetch_base_item: Callable[[str], Coroutine[Any, Any, Dict[str, Any]]],
+        request: Request,
+    ):
+        """Initialize the base item cache."""
+        self._base_items = {}
+        super().__init__(fetch_base_item, request)
+
+    async def get(self, collection_id: str):
+        """Return the base item for the collection and cache by collection id."""
+        if collection_id not in self._base_items:
+            self._base_items[collection_id] = await self._fetch_base_item(
+                collection_id,
+            )
+        return self._base_items[collection_id]
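
Because `_search_base` constructs the cache inside each call, the dict held by `DefaultBaseItemCache` only avoids repeated fetches within a single search. The commit message notes that the request is passed in for implementors who need app state, so one possible custom cache keeps its entries there instead. The sketch below is a self-contained illustration under stated assumptions: `BaseItemCache` is redefined locally to mirror the interface in the diff (with `request` typed as `Any`), `AppStateBaseItemCache` and the `base_items` attribute on app state are hypothetical, and `SimpleNamespace` objects stand in for a real request.

```python
import abc
import asyncio
from types import SimpleNamespace
from typing import Any, Callable, Coroutine, Dict, List

FetchBaseItem = Callable[[str], Coroutine[Any, Any, Dict[str, Any]]]


class BaseItemCache(abc.ABC):
    """Local stand-in mirroring the interface added in the diff."""

    def __init__(self, fetch_base_item: FetchBaseItem, request: Any):
        self._fetch_base_item = fetch_base_item
        self._request = request

    @abc.abstractmethod
    async def get(self, collection_id: str) -> Dict[str, Any]:
        """Return the base item for the collection."""


class AppStateBaseItemCache(BaseItemCache):
    """Hypothetical cache that stores base items on shared app state."""

    async def get(self, collection_id: str) -> Dict[str, Any]:
        # `base_items` is an assumed attribute that the application would create at startup.
        store: Dict[str, Dict[str, Any]] = self._request.app.state.base_items
        if collection_id not in store:
            store[collection_id] = await self._fetch_base_item(collection_id)
        return store[collection_id]


calls: List[str] = []


async def fake_fetch(collection_id: str) -> Dict[str, Any]:
    """Record each fetch so the demo can show how often the database would be hit."""
    calls.append(collection_id)
    return {"collection": collection_id, "assets": {}}


async def demo() -> List[str]:
    state = SimpleNamespace(base_items={})
    request = SimpleNamespace(app=SimpleNamespace(state=state))
    first_search = AppStateBaseItemCache(fake_fetch, request)
    second_search = AppStateBaseItemCache(fake_fetch, request)  # new instance, same app state
    await first_search.get("landsat")
    await second_search.get("landsat")  # served from shared state, no second fetch
    await second_search.get("sentinel")
    return calls


fetched = asyncio.run(demo())
```

A class like this would be selected through the `base_item_cache` setting shown in `config.py`. The sketch has no expiry and no protection against two concurrent requests fetching the same base item, both of which a production cache would likely need.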
