Skip to content

Sorting/paging inconsistencies and errors #514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
floeschau opened this issue Jan 25, 2023 · 5 comments · Fixed by #535
Closed

Sorting/paging inconsistencies and errors #514

floeschau opened this issue Jan 25, 2023 · 5 comments · Fixed by #535
Assignees
Labels
bug Something isn't working pgstac
Milestone

Comments

@floeschau
Copy link

Hello there,

I found an issue with searches, when sorting is used in combination with paging, especially when the a field that is a sort criterion is missing in some items. The result is not complete and can even be different depending on the sort order (ascending or descending). Items in the search result can even be duplicated.

The script below reproduces the problem (check the endpoint variable to make sure it uses the correct URL). It creates 10 very simple STAC items of which some have eo:cloud_cover values (with duplicates), but some items don't have that field.

import requests
import datetime
from urllib.parse import urljoin


endpoint = "http://app-pgstac"
collection_name = "sort-test"


def create_test_collection():
    response = requests.delete(f"{endpoint}/collections/{collection_name}")
    body = {
        "type": "Collection",
        "id": collection_name,
        "stac_version": "1.0.0",
        "description": "Test collection to demonstrate sorting issues",
        "links": [
            { "rel": "root", "href": None, "type": "application/json" }
        ],
        "stac_extensions": [],
        "extent": {
            "spatial": { "bbox": [-180, -90, 180, 90] },
            "temporal": { "interval": ["2000-01-01T00:00:00Z", None] }
        }
    }
    requests.post(f"{endpoint}/collections", json=body)


def add_test_data():
    for i in range(10):
        cloud_cover = (i // 4) * 10
        now = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")

        item = {
            "stac_version": "1.0.0",
            "type": "Feature",
            "id": "item-{0:02}".format(i + 1),
            "geometry": {
                "type": "Point",
                "coordinates": [12, 41]
            },
            "properties": {
                "datetime": "2022-01-02T03:04:05.678Z",
                "platform": "myplatform",
            },
            "assets": {},
            "links": [],
            "collection": collection_name
        }
        if i % 3 != 0:
            item["properties"]["eo:cloud_cover"] = cloud_cover

        requests.post(f"{endpoint}/collections/{collection_name}/items", json=item)


def search(body: dict):
    url = f"{endpoint}/search"
    if "limit" not in body: body["limit"] = 2

    received_items = []
    page = 1

    while True:
        print(f"Page {page}: POST {endpoint}: {body}")
        response = requests.post(url, json=body)

        if response.status_code >= 300:
            print(f"ERROR RESPONSE: {response.text}")
            return None

        result = response.json()
        for item in result["features"]:
            print(f"- ID: {item['id']}")
            received_items.append(item)
        print("----")

        next_link = next((link for link in result["links"] if "rel" in link and link["rel"] == "next"), None)

        if next_link:
            page += 1
            url = next_link["href"]
            if "body" in next_link:
                body = next_link["body"]
        else:
            break

    print(f"Summary ({len(received_items)} items):")
    for item in received_items:
        print("- ID {0} (eo:cloud_cover = {1})".format(item["id"], item["properties"]["eo:cloud_cover"] if "eo:cloud_cover" in item["properties"] else "--"))

    return received_items


create_test_collection()
add_test_data()

for sort_dir in ["asc", "desc"]:
    print(f"\n======== Sort direction: {sort_dir} ========\n")
    body = {
        "collections": [collection_name],
        "filter-lang": "cql2-json",
        "sortby": [
            { "field": "properties.eo:cloud_cover", "direction": sort_dir }
        ]
    }
    search(body=body)

print(f"\n======== Sort direction: asc + desc ========\n")
body = {
    "collections": [collection_name],
    "filter-lang": "cql2-json",
    "sortby": [
        { "field": "properties.eo:cloud_cover", "direction": "asc" },
        { "field": "id", "direction": "desc" }
    ]
}
search(body=body)

When making a search request for the whole collection (without filter) ordering by eo:cloud_cover and with a limit of 2 items per page, all 10 items should be returned (in 5 pages), but there are only 6 items (ascending sort) and 4 items (descending sort), respectively.

Furthermore. when sorting by two criteria in opposed directions, the second page (with the next token) fails with an PostgreSQL syntax error.

This is the output:


======== Sort direction: asc ========

Page 1: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}], 'limit': 2}
- ID: item-03
- ID: item-02
----
Page 2: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}], 'limit': 2, 'token': 'next:item-02'}
- ID: item-03
- ID: item-08
----
Page 3: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}], 'limit': 2, 'token': 'next:item-08'}
- ID: item-09
- ID: item-10
----
Summary (6 items):
- ID item-03 (eo:cloud_cover = 0)
- ID item-02 (eo:cloud_cover = 0)
- ID item-03 (eo:cloud_cover = 0)
- ID item-08 (eo:cloud_cover = 10)
- ID item-09 (eo:cloud_cover = 20)
- ID item-10 (eo:cloud_cover = --)

======== Sort direction: desc ========

Page 1: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2}
- ID: item-10
- ID: item-07
----
Page 2: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-07'}
- ID: item-04
- ID: item-01
----
Page 3: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-01'}
----
Summary (4 items):
- ID item-10 (eo:cloud_cover = --)
- ID item-07 (eo:cloud_cover = --)
- ID item-04 (eo:cloud_cover = --)
- ID item-01 (eo:cloud_cover = --)

======== Sort direction: asc + desc ========

Page 1: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}, {'field': 'id', 'direction': 'desc'}], 'limit': 2}
- ID: item-03
- ID: item-02
----
Page 2: POST http://app-pgstac: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}, {'field': 'id', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-02'}
ERROR RESPONSE: {"code":"PostgresSyntaxError","description":"syntax error at or near \")\""}

Has anybody else noticed issues like these? They are more likely to be found in large heterogeneous collections that might not be typical, but I think these are problems that should be fixed.

Thank you for any feedback!

@floeschau
Copy link
Author

Minor issue in script, line 64 should read:

        print(f"Page {page}: POST {url}: {body}")

@gadomski gadomski self-assigned this Jan 25, 2023
@gadomski
Copy link
Member

What version of pgstac are you running? There was a release last week that included some fixes for sorting and paging (e.g. stac-utils/pgstac#154). If you're not on v0.6.13, can you update your pgstac and see if the issue persists? Thanks!

@floeschau
Copy link
Author

I was running 0.6.11. I upgraded to 0.6.13 and the result is slightly different. The syntax error for mixed ascending/descending sort no longer occurs, but when I run the test script again, the results are still inconsistent: 5 items (asc), 10 items (desc), and 6 items (asc + desc).

This is the output:


======== Sort direction: asc ========

Page 1: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}], 'limit': 2}
- ID: item-03
- ID: item-02
----
Page 2: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}], 'limit': 2, 'token': 'next:item-02'}
- ID: item-03
- ID: item-08
----
Page 3: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}], 'limit': 2, 'token': 'next:item-08'}
- ID: item-09
----
Summary (5 items):
- ID item-03 (eo:cloud_cover = 0)
- ID item-02 (eo:cloud_cover = 0)
- ID item-03 (eo:cloud_cover = 0)
- ID item-08 (eo:cloud_cover = 10)
- ID item-09 (eo:cloud_cover = 20)

======== Sort direction: desc ========

Page 1: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2}
- ID: item-10
- ID: item-07
----
Page 2: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-07'}
- ID: item-04
- ID: item-01
----
Page 3: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-01'}
- ID: item-09
- ID: item-08
----
Page 4: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-08'}
- ID: item-06
- ID: item-05
----
Page 5: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-05'}
- ID: item-03
- ID: item-02
----
Summary (10 items):
- ID item-10 (eo:cloud_cover = --)
- ID item-07 (eo:cloud_cover = --)
- ID item-04 (eo:cloud_cover = --)
- ID item-01 (eo:cloud_cover = --)
- ID item-09 (eo:cloud_cover = 20)
- ID item-08 (eo:cloud_cover = 10)
- ID item-06 (eo:cloud_cover = 10)
- ID item-05 (eo:cloud_cover = 10)
- ID item-03 (eo:cloud_cover = 0)
- ID item-02 (eo:cloud_cover = 0)

======== Sort direction: asc + desc ========

Page 1: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}, {'field': 'id', 'direction': 'desc'}], 'limit': 2}
- ID: item-03
- ID: item-02
----
Page 2: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}, {'field': 'id', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-02'}
- ID: item-08
- ID: item-06
----
Page 3: POST http://app-pgstac/search: {'collections': ['sort-test'], 'filter-lang': 'cql2-json', 'sortby': [{'field': 'properties.eo:cloud_cover', 'direction': 'asc'}, {'field': 'id', 'direction': 'desc'}], 'limit': 2, 'token': 'next:item-06'}
- ID: item-05
- ID: item-09
----
Summary (6 items):
- ID item-03 (eo:cloud_cover = 0)
- ID item-02 (eo:cloud_cover = 0)
- ID item-08 (eo:cloud_cover = 10)
- ID item-06 (eo:cloud_cover = 10)
- ID item-05 (eo:cloud_cover = 10)
- ID item-09 (eo:cloud_cover = 20)

@gadomski gadomski added the bug Something isn't working label Jan 26, 2023
@gadomski gadomski added this to the 2.4.4 milestone Jan 26, 2023
@floeschau
Copy link
Author

Another thing I noticed: it seems to be consistently the case (also with larger real-life data) that when sorting in descending order that the result is correct and complete (i.e. contains no duplicates and includes the items that don't have the property used as sort criterion), as in the second request above where all 10 items are returned).

And for the ascending sort, adding a second, unique sort criterion (e.g. id) makes sure that there are no longer duplicates and the result contains at least all items that have the property by which is sorted (6 with my sample data).

Thanks Pete for looking into this!

@gadomski
Copy link
Member

I've opened a PR with a failing test (based heavily on your example): #516. I'll work with the pgstac folks to see if we can't chase this down -- the error is probably over there.

@gadomski gadomski removed this from the 2.4.4 milestone Jan 31, 2023
gadomski added a commit that referenced this issue Mar 6, 2023
Confirming that #514 is fixed with pgstac v0.7.0.
@gadomski gadomski mentioned this issue Mar 6, 2023
4 tasks
@gadomski gadomski added this to the 2.4.4 milestone Mar 6, 2023
gadomski added a commit that referenced this issue Mar 6, 2023
Confirming that #514 is fixed with pgstac v0.7.0.
gadomski added a commit that referenced this issue Mar 9, 2023
* deps: bump pgstac to v0.7.1

* ci: only build packages on main or tags

* ci: don't log on services

* ci: up timeout minutes

* chore: update changelog

* test: add sorting test

Confirming that #514 is fixed with pgstac v0.7.0.
gadomski added a commit to stac-utils/stac-fastapi-pgstac that referenced this issue Mar 22, 2023
* deps: bump pgstac to v0.7.1

* ci: only build packages on main or tags

* ci: don't log on services

* ci: up timeout minutes

* chore: update changelog

* test: add sorting test

Confirming that stac-utils/stac-fastapi#514 is fixed with pgstac v0.7.0.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pgstac
Projects
None yet
2 participants