
c-thiel (Contributor) commented May 14, 2024

Fixes #740

c-thiel (Contributor, Author) commented May 19, 2024

@amogh-jahagirdar @HonahX @Fokko, any thoughts on this PR? Would it be OK to incorporate?
It does not change any existing behavior; it only respects an optional config property ("s3.signer.uri"). Spark honors this property too; the constant is defined here:
https://iceberg.apache.org/javadoc/1.4.1/constant-values.html
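In other words, the signer endpoint becomes configurable instead of being tied to the catalog URI. A minimal sketch of the intended lookup, assuming the signer otherwise defaults to the catalog's `uri` property and to the default sign path from Iceberg's S3 signer OpenAPI spec; `resolve_signer_url` is a hypothetical helper for illustration, not PyIceberg's actual code:

```python
from typing import Dict

# Same property name as the Java implementation (see the javadoc link above).
S3_SIGNER_URI = "s3.signer.uri"

# Assumed default path of the remote-signing endpoint.
SIGN_PATH = "/v1/aws/s3/sign"


def resolve_signer_url(properties: Dict[str, str]) -> str:
    """Hypothetical helper: return the URL that sign requests are POSTed to.

    Prefer the optional `s3.signer.uri` property and fall back to the
    catalog's own `uri` when it is absent or empty, so configurations
    without the property behave exactly as before.
    """
    base = properties.get(S3_SIGNER_URI) or properties["uri"]
    # Normalize a trailing slash before appending the sign path.
    return base.rstrip("/") + SIGN_PATH
```

For example, `{"uri": "https://catalog.example.com/"}` resolves to `https://catalog.example.com/v1/aws/s3/sign`, while adding `"s3.signer.uri": "https://signer.example.com"` sends sign requests to `https://signer.example.com/v1/aws/s3/sign` instead.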

Fokko (Contributor) commented May 23, 2024

@c-thiel Since it is in the Java implementation, I think it is fair to add it here. Two things:

  • Can you mention it in the docs as well?
  • Should we introduce a constant?

c-thiel force-pushed the fix/remote-signing-uri-property branch from cbad6a9 to 4b7df0d on May 23, 2024 14:34
c-thiel (Contributor, Author) commented May 23, 2024

@Fokko, thanks a lot for your feedback; I added the docs and the constant.
The constant is a very good idea. I hope we will eventually be able to use remote signing with the PyArrow FileIO as well; right now only the fsspec implementation respects it.

I noticed that tabular.io actually provides explicit S3 credentials (on top of remote signing), presumably via AWS STS, when "vended-credentials" are requested (https://github.com/apache/iceberg/blob/b3c25fb7608934d975a054b353823ca001ca3742/open-api/rest-catalog-open-api.yaml#L1495). This can only ever work for AWS S3, and it is noticeably slower than remote signing. Since remote signing also works with on-prem deployments, I really hope it becomes the default for all clients instead of vended credentials. Tabular does this only for PyIceberg: Spark requests remote signing, so there is no need to go the extra mile and generate S3 credentials.

Unfortunately, PyIceberg currently hardcodes "vended-credentials":

    session.headers["X-Iceberg-Access-Delegation"] = "vended-credentials"

even though "remote-signing" is actually supported via fsspec. If the server ignores what the client requests and pushes remote signing anyway, together with:

    "rest.sigv4-enabled": "true",
    "py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO",

it works like a charm.
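One way to drop the hardcoded value while keeping today's default would be to read it from a catalog property. A minimal sketch, assuming a hypothetical property name `access-delegation` (not an existing PyIceberg option) and using a plain dict in place of the `requests` session headers; the accepted values are just the two mechanisms discussed in this thread:

```python
from typing import Dict, MutableMapping

ACCESS_DELEGATION_HEADER = "X-Iceberg-Access-Delegation"

# Hypothetical catalog property; PyIceberg does not define this key.
ACCESS_DELEGATION_PROPERTY = "access-delegation"

# Keeps today's hardcoded behavior when the property is not set.
DEFAULT_ACCESS_DELEGATION = "vended-credentials"

# The two mechanisms discussed in this thread.
KNOWN_MECHANISMS = {"vended-credentials", "remote-signing"}


def apply_access_delegation(headers: MutableMapping[str, str], properties: Dict[str, str]) -> None:
    """Set the delegation header from catalog properties instead of hardcoding it."""
    value = properties.get(ACCESS_DELEGATION_PROPERTY, DEFAULT_ACCESS_DELEGATION)
    # The header carries a comma-separated list; trim whitespace around each entry.
    mechanisms = [m.strip() for m in value.split(",") if m.strip()]
    unknown = [m for m in mechanisms if m not in KNOWN_MECHANISMS]
    if not mechanisms or unknown:
        raise ValueError(f"invalid access delegation value: {value!r}")
    headers[ACCESS_DELEGATION_HEADER] = ",".join(mechanisms)
```

Since `requests` exposes `session.headers` as a mutable mapping, the same function could be called on it directly. Rejecting unknown values on the client is a design choice; the server may still ignore what the client asks for.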

Fokko (Contributor) left a review comment:


This looks good, @c-thiel, thanks for working on this 👍

Fokko (Contributor) commented May 31, 2024

@c-thiel The idea is to support both. The problem is that with Arrow it is hard to hook into the signing process: most of the code is pushed down to the native C++ layer, and there are no hooks for a custom signer. Remote signing is something we definitely want to keep supporting.
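To illustrate what such a hook has to do: the client serializes each unsigned S3 request, sends it to the signer, and copies the returned headers onto the outgoing request. fsspec goes through botocore, which lets Python code hook into request signing; Arrow builds the request in native code, so there is no place to call functions like these. A minimal sketch of the two halves, with field names taken from our reading of Iceberg's S3 signer OpenAPI spec (treat them as an assumption); the HTTP call to the signer is omitted:

```python
import json
from typing import Dict, List


def build_sign_request(method: str, region: str, uri: str, headers: Dict[str, str]) -> str:
    """Serialize an unsigned S3 request into a JSON body for a remote signer.

    Assumed field names: method, region, uri, and headers, where every
    header maps to a list of values.
    """
    body = {
        "method": method,
        "region": region,
        "uri": uri,
        "headers": {name: [value] for name, value in headers.items()},
    }
    return json.dumps(body)


def apply_sign_response(headers: Dict[str, str], response_json: str) -> Dict[str, str]:
    """Return a copy of the request headers with the signer's headers merged in.

    The response's `uri` field is ignored here for brevity.
    """
    signed_headers: Dict[str, List[str]] = json.loads(response_json)["headers"]
    merged = dict(headers)
    for name, values in signed_headers.items():
        # Multiple values for one header are joined per the usual HTTP convention.
        merged[name] = ", ".join(values)
    return merged
```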

Fokko merged commit 20f6afd into apache:main on May 31, 2024
guitcastro (Contributor)


Unfortunately, with Nessie, requesting "X-Iceberg-Access-Delegation: vended-credentials" does not work: the endpoint does not return "s3.signer.uri". When the header is set to "remote-signing", it does return the correct value.

Successfully merging this pull request may close this issue: Iceberg Rest Catalog does not honor "s3.signer.uri" property
3 participants