[pyiceberg_core] Expose IcebergTableProvider to python #865

Closed
kevinjqliu opened this issue Jan 2, 2025 · 3 comments · Fixed by #1324
Comments

@kevinjqliu
Contributor

kevinjqliu commented Jan 2, 2025

Inspired by #650 and Delta Lake's DataFusion integration.

I want to expose `IcebergTableProvider` to DataFusion as a Python binding, using DataFusion's custom table provider interface.

Integration with Python might look something like this:

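```python
from datafusion import SessionContext
from pyiceberg_core import table_provider  # proposed API in this issue, not an existing module

# Path to the Iceberg table's metadata JSON file (hypothetical placeholder).
metadata_location = "path/to/metadata.json"

ctx = SessionContext()
iceberg_table_provider = table_provider.create_table_provider(
    metadata_location=metadata_location
)
# Register the provider under the name "test" and query it as a DataFrame.
ctx.register_table_provider("test", iceberg_table_provider)
table = ctx.table("test")
table.show()
```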
kevinjqliu self-assigned this Jan 2, 2025
@kevinjqliu
Contributor Author

Possibly blocked by apache/datafusion#13851

@kevinjqliu
Contributor Author

kevinjqliu commented Jan 8, 2025

Got an example working by building the latest `datafusion*` libraries locally.
It requires apache/datafusion#13937 and new versions of the `datafusion*` libraries that include that PR, possibly in the next DataFusion release, v45.

@Xuanwo
Member

Xuanwo commented Jan 8, 2025

That's nice! Thank you @kevinjqliu for pushing forward on this.

Xuanwo pushed a commit that referenced this issue May 15, 2025
## Which issue does this PR close?

- Closes #865

## What changes are included in this PR?
This PR creates a new `IcebergDataFusionTable` Python class and exposes
it through the new `pyiceberg_core.datafusion` module:
```python
from pyiceberg_core.datafusion import IcebergDataFusionTable
```

Exposing `IcebergDataFusionTable` makes it possible to register the
Iceberg table provider with datafusion-python through the
`register_table_provider` API.
See the usage example in
`bindings/python/tests/test_datafusion_table_provider.py`.

The integration relies on the `FFI_TableProvider` API as described in
https://datafusion.apache.org/python/user-guide/io/table_provider.html

Note that this integration only works with `datafusion >= 45` because of
apache/datafusion#13851.
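
A minimal sketch of guarding on that version requirement, assuming datafusion-python is installed under the distribution name `datafusion` and uses `MAJOR.MINOR.PATCH`-style version strings. The helper names are hypothetical and not part of either project.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum datafusion-python major version stated in this PR description.
MIN_DATAFUSION_MAJOR = 45


def supports_ffi_table_provider(datafusion_version: str) -> bool:
    """Return True if the version string's major component is >= 45."""
    # Only the leading major component is compared, e.g. "45.2.0" -> 45.
    major = int(datafusion_version.split(".")[0])
    return major >= MIN_DATAFUSION_MAJOR


def installed_datafusion_supported() -> bool:
    """Check the locally installed `datafusion` distribution, if present."""
    try:
        return supports_ffi_table_provider(version("datafusion"))
    except PackageNotFoundError:
        # datafusion-python is not installed at all.
        return False
```

Comparing only the major version keeps the check simple, since the requirement is stated as a major-release boundary.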

## Are these changes tested?

Yes, unit tests.

To build and test locally:
```shell
cd bindings/python
hatch run dev:develop
hatch run dev:test
```