
Validation Error in ConfigResponse Model When connecting Nessie with PyIceberg using RestCatalog #1524

Closed
heman026 opened this issue Jan 16, 2025 · 13 comments

Comments

@heman026

Question

I tried connecting to Nessie using both load_catalog and RestCatalog() from pyiceberg, but in both cases I get the following validation error for the ConfigResponse model:

pydantic_core._pydantic_core.ValidationError: 2 validation errors for ConfigResponse
defaults
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing
overrides
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing

Rest Catalog Code

from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    "default",
    **{
        "uri": "http://10.xx.xx.xx:19120/api",
        "s3.endpoint": "http://10.xx.xx.xx.161:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
        "warehouse": "s3a://iceberg-datalake",
    },
)

load_catalog

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "rest",
    **{
        "uri": "http://10.xx.xx.xx:19120/api",
        "s3.path-style-access": "true",
        "s3.endpoint": "http://xx.xx.xx:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
        "warehouse": "s3a://iceberg-datalake",
    },
)

Please help me resolve this issue.

@Fokko
Contributor

Fokko commented Jan 16, 2025

@heman026 Thanks for raising this, and happy to help. Do you have a full stack-trace?

@heman026
Author

heman026 commented Jan 16, 2025

> @heman026 Thanks for raising this, and happy to help. Do you have a full stack-trace?

Traceback (most recent call last):
  File "C:\duck.py", line 9, in <module>
    catalog = load_catalog("rest",
              ^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\XX\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyiceberg\catalog\__init__.py", line 248, in load_catalog
    return AVAILABLE_CATALOGS[catalog_type](name, cast(Dict[str, str], conf))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\XX\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyiceberg\catalog\__init__.py", line 123, in load_rest
    return RestCatalog(name, **conf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\XX\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyiceberg\catalog\rest.py", line 264, in __init__
    self._fetch_config()
  File "c:\Users\XX\AppData\Local\Programs\Python\Python312\Lib\site-packages\pyiceberg\catalog\rest.py", line 393, in _fetch_config
    config_response = ConfigResponse(**response.json())
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\XX\AppData\Local\Programs\Python\Python312\Lib\site-packages\pydantic\main.py", line 176, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for ConfigResponse
defaults
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing
overrides
  Field required [type=missing, input_value={'defaultBranch': 'main',...SupportedApiVersion': 2}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.7/v/missing

@Fokko
Contributor

Fokko commented Jan 16, 2025

Thanks @heman026 for the quick reply. It looks like fields are missing from the config response. Could you share the JSON response? This can be done by adding a print statement in rest.py just before line 393.

This is what the response should look like: https://github.com/apache/iceberg/blob/d96901b843395fe669f6bd4f618f8e5e46c0eed4/open-api/rest-catalog-open-api.yaml#L1907-L1938
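
As an alternative to editing rest.py, the config endpoint can be queried directly. The following is a minimal sketch, assuming PyIceberg requests `<uri>/v1/config` with the warehouse passed as a query parameter (this matches the request logged later in this thread). `build_config_url` and `fetch_config` are hypothetical helper names, not part of PyIceberg.

```python
import json
import urllib.parse
import urllib.request


def build_config_url(uri, warehouse=None):
    """Build the config URL the REST client is assumed to call."""
    url = uri.rstrip("/") + "/v1/config"
    if warehouse:
        # urlencode percent-encodes ':' and '/' in the warehouse location
        url += "?" + urllib.parse.urlencode({"warehouse": warehouse})
    return url


def fetch_config(uri, warehouse=None, timeout=5.0):
    """GET the config endpoint and return the decoded JSON body."""
    url = build_config_url(uri, warehouse)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; call fetch_config(...) against a reachable server.
    print(build_config_url("http://10.xx.xx.xx:19120/api", "s3a://iceberg-datalake"))
```

Printing the result of `fetch_config(...)` for the configured `uri` shows the raw payload that the client tries to validate.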

@heman026
Author

heman026 commented Jan 16, 2025

DEBUG:urllib3.connectionpool:http://10.xx.xx.xx:19120 "GET /api/v1/config?warehouse=s3a%3A%2F%2Ficeberg-datalake HTTP/11" 200 84

{'defaultBranch': 'main', 'maxSupportedApiVersion': 2}

If I change the API version to v2 in rest.py, I get the following response, with the same validation error:
{'defaultBranch': 'main', 'minSupportedApiVersion': 1, 'maxSupportedApiVersion': 2, 'actualApiVersion': 2, 'specVersion': '2.2.0', 'noAncestorHash': '2e1cfa82b035c26cbbbdae632cea070514eb8b773f616aaeaf668e2f0be8f10d', 'repositoryCreationTimestamp': '2024-12-12T04:33:33.226326394Z', 'oldestPossibleCommitTimestamp': '2024-12-12T04:33:33.226326394Z'}

@Fokko
Contributor

Fokko commented Jan 16, 2025

That doesn't look like the Iceberg REST protocol at all. I'm not an expert on Nessie, but maybe we can debug it together. What endpoint did you configure in PyIceberg?
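
The mismatch can be illustrated with a small check. This is only a sketch that stands in for the pydantic validation: it assumes the two required fields are `defaults` and `overrides`, as the error message reports. `missing_config_fields` is a hypothetical helper, and the `iceberg_rest` payload is an illustrative minimal shape rather than real server output.

```python
# Fields the ValidationError above reports as required.
REQUIRED_KEYS = ("defaults", "overrides")


def missing_config_fields(payload):
    """Return the required config fields absent from a response payload."""
    return [key for key in REQUIRED_KEYS if key not in payload]


# Payload returned by the server in this thread (native Nessie API).
nessie_native = {"defaultBranch": "main", "maxSupportedApiVersion": 2}

# Illustrative minimal payload with the shape the client expects.
iceberg_rest = {"defaults": {}, "overrides": {}}

print(missing_config_fields(nessie_native))  # ['defaults', 'overrides']
print(missing_config_fields(iceberg_rest))   # []
```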

@heman026
Author

This is the PyIceberg config:

catalog = load_catalog(
    "rest",
    **{
        "uri": "http://10.xx.xx.xx:19120/api",
        "s3.path-style-access": "true",
        "s3.endpoint": "http://xx.xx.xx:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
        "warehouse": "s3a://iceberg-datalake",
    },
)

@kevinjqliu
Contributor

You can safely use your current Nessie applications, those that use type=nessie when using Iceberg, concurrently with applications using Nessie via Iceberg REST (type=rest with a URI like uri=http://127.0.0.1:19120/iceberg).

from https://projectnessie.org/guides/iceberg-rest/#seamless-migration-from-nessie-to-nessie-with-iceberg-rest

Looks like it's an issue with the URI; you might need to use /iceberg instead.
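
A minimal sketch of that change, assuming the Iceberg REST endpoint is served under `/iceberg` on the same host and port as the native Nessie API (as in the quoted guide). `to_iceberg_rest_uri` is a hypothetical helper that only rewrites the path of the configured `uri`.

```python
import urllib.parse


def to_iceberg_rest_uri(uri):
    """Replace the path of a catalog URI with /iceberg, keeping scheme and host."""
    parts = urllib.parse.urlsplit(uri)
    return urllib.parse.urlunsplit((parts.scheme, parts.netloc, "/iceberg", "", ""))


print(to_iceberg_rest_uri("http://10.xx.xx.xx:19120/api"))
# http://10.xx.xx.xx:19120/iceberg
```

The rewritten value would then be passed as the `uri` property to `load_catalog` or `RestCatalog`.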

@heman026
Author

> You can safely use your current Nessie applications, those that use type=nessie when using Iceberg, concurrently with applications using Nessie via Iceberg REST (type=rest with a URI like uri=http://127.0.0.1:19120/iceberg).
>
> from https://projectnessie.org/guides/iceberg-rest/#seamless-migration-from-nessie-to-nessie-with-iceberg-rest
>
> Looks like it's an issue with the URI; you might need to use /iceberg instead.

Thanks, it's working.

@Fokko
Contributor

Fokko commented Jan 17, 2025

Thanks for confirming @heman026, closing this one.

@HungYangChang

Thanks @heman026 for sharing the solution

Would you mind sharing your configuration and how you load it?

I assume you are referring to a tutorial similar to https://www.dremio.com/blog/intro-to-dremio-nessie-and-apache-iceberg-on-your-laptop/

@adamcodes716

> > You can safely use your current Nessie applications, those that use type=nessie when using Iceberg, concurrently with applications using Nessie via Iceberg REST (type=rest with a URI like uri=http://127.0.0.1:19120/iceberg).
> >
> > from https://projectnessie.org/guides/iceberg-rest/#seamless-migration-from-nessie-to-nessie-with-iceberg-rest
> > Looks like it's an issue with the URI; you might need to use /iceberg instead.
>
> Thanks, it's working.

Care to share your working setup?

@heman026
Author

heman026 commented Feb 26, 2025

> Thanks @heman026 for sharing the solution
>
> Would you mind sharing your configuration and how you load it?
>
> I assume you are referring to a tutorial similar to https://www.dremio.com/blog/intro-to-dremio-nessie-and-apache-iceberg-on-your-laptop/

Hi @adamcodes716, @HungYangChang

I have started Nessie with the following configuration:

./java \
-Dnessie.catalog.service.s3.default-options.request-signing-enabled=false \
-Dnessie.version.store.type=JDBC \
-Dquarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/nessie_db \
-Dquarkus.datasource.username=nessie \
-Dquarkus.smallrye-health.context-propagation=true \
-Dquarkus.datasource.password=nessie \
-Dnessie.catalog.default-warehouse=warehouse \
-Dnessie.catalog.warehouses.warehouse.location=s3a://temp \
-Dnessie.catalog.service.s3.default-options.endpoint=http://xxxxxxx:9000 \
-Dnessie.catalog.service.s3.default-options.path-style-access=true \
-Dnessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:nessie-catalog-secrets.s3-access-key \
-Dnessie-catalog-secrets.s3-access-key.name=minioadmin \
-Dnessie-catalog-secrets.s3-access-key.secret=minioadmin \
-Dnessie.server.authentication.enabled=false \
-Dnessie.catalog.service.s3.default-options.region=us-east-1 \
/opt/nessie/nessie-quarkus-0.100.2-runner.jar

REST catalog configuration in PyIceberg:

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "rest",
    **{
        "uri": "http://10.174.135.168:19120/iceberg",
        "s3.endpoint": "http://10.174.135.168:9000",
        "s3.remote-signing-enabled": "false",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    },
)

@FickleLife

Hi @heman026, I have tried your exact config above, but the Nessie endpoint at /iceberg is not valid and returns a 404. Does anyone have any other ideas on how to connect pyiceberg to Nessie?
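
One way to narrow down a 404 like this is to compare what the server answers under each prefix. The sketch below is an assumption-laden diagnostic, not a fix: it assumes PyIceberg appends `v1/config` to the configured `uri`, and that the native Nessie API answers at `/api/v2/config`, as the earlier output in this thread suggests. `http_status` and `probe_config_endpoints` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want
        return err.code


def probe_config_endpoints(base, get_status=http_status):
    """Map each candidate config URL under `base` to its HTTP status code."""
    root = base.rstrip("/")
    candidates = [root + "/iceberg/v1/config", root + "/api/v2/config"]
    return {url: get_status(url) for url in candidates}
```

Calling `probe_config_endpoints("http://<host>:19120")` against a running server shows whether only the native API responds, or whether the `/iceberg` prefix is served as well.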


6 participants