Design: Iceberg support #5825

Conversation
Looks very promising! Some questions, nothing blocking.
design/open/iceberg-branching.md
Outdated
Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
At first stage: support iceberg without being catalog agnostic - meaning work with only lakeFS catalog and not be compatible with other catalogs.
At second stage: being catalog agnostic. Users will be able to configure both lakeFS catalog and other catalog together.
- The catalogs that will be supported are Glue, Snoflake and Tabular.
- The catalogs that will be supported are Glue, Snoflake and Tabular.
- The catalogs that will be supported are Glue, Snowflake and Tabular.
@itaiad200, this is a product decision but it is open to discussion and subject to change as we proceed. So maybe:
- The catalogs that will be supported are Glue, Snoflake and Tabular.
- We will start by supporting a limited set of catalogs, for example Glue, Snowflake and Tabular.
design/open/iceberg-branching.md
Outdated
Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
At first stage: support iceberg without being catalog agnostic - meaning work with only lakeFS catalog and not be compatible with other catalogs.
At second stage: being catalog agnostic. Users will be able to configure both lakeFS catalog and other catalog together.
- The catalogs that will be supported are Glue, Snoflake and Tabular.
Why these? Also - why not start with 1?
Yoni is investigating this option. We decided to start with the current implementation in order to fail fast, and we'll consider this at the next phase.
design/open/iceberg-branching.md
Outdated
Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.
How will the repo and branch be passed to the lakeFSIO?
design/open/iceberg-branching.md
Outdated
## Usage

Iceberg operations:
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
Can you share how?
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
The user will need to set the catalog implementation to be the lakeFS catalog. For example, if working in Spark:
`spark.sql.catalog.lakefs.catalog-impl=io.lakefs.iceberg.LakeFSCatalog`
The user will also need to configure a Hadoop FileSystem that can interact with objects on lakeFS, like the S3AFileSystem or LakeFSFileSystem.
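To make the suggested configuration concrete, here is a sketch of what a full set of Spark properties might look like. Only `io.lakefs.iceberg.LakeFSCatalog` comes from the suggestion above; the `SparkCatalog` wrapper follows Iceberg's usual custom-catalog convention, and the endpoint, credentials, and S3A settings are hypothetical placeholders, not the final design.

```python
# Hypothetical Spark configuration for the proposed lakeFS Iceberg catalog.
# Only the LakeFSCatalog class name comes from the suggestion above; the
# endpoint, credential keys, and S3A settings are illustrative assumptions.
spark_conf = {
    # Register an Iceberg catalog named "lakefs", delegating to a custom impl.
    "spark.sql.catalog.lakefs": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lakefs.catalog-impl": "io.lakefs.iceberg.LakeFSCatalog",
    # A Hadoop FileSystem that can read/write lakeFS objects, e.g. the
    # S3AFileSystem pointed at a (hypothetical) lakeFS S3 gateway endpoint.
    "spark.hadoop.fs.s3a.endpoint": "https://lakefs.example.com",
    "spark.hadoop.fs.s3a.access.key": "<lakeFS access key>",
    "spark.hadoop.fs.s3a.secret.key": "<lakeFS secret key>",
    "spark.hadoop.fs.s3a.path.style.access": "true",
}
```

These properties would be passed to the Spark session at startup (e.g. via `--conf` flags); the exact keys may change as the integration evolves.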
- Can this catalog become part of the Iceberg source code?
- Migration: how to use move my existing tables to lakeFS?
- Iceberg diff: how to view table changes in lakeFS.
- Merge: how to allow merging between branches?
Or meanwhile block merges that are going to make a table inconsistent.
I think probably conflicting merges will be blocked by our regular rules of merges :-/
Thanks, very nice work!
Requesting more detail about:
- What our FileIO will do.
- Concurrency
- Could you add a list of references that we can read to get more context of what all these Icebergish words mean?
Both at a high level of course, but I would prefer to specify our plans.
- Support merge operation.
- Enable lakeFS operations using a dedicated SQL syntax.
Will these be possible somehow? Don't care about a detailed design, but do prefer solutions that are steps along the way to having these -- particularly merges!
Yes, enabling SQL syntax isn't related to our solution, and we do believe that our solution will fit the merge operation in the future.
Can you add a high-level explanation of how those might happen, please?
design/open/iceberg-branching.md
Outdated
With lakeFS, this pointer needs to be versioned as well.
In order to prevent from adding versioning capabilities to existing catalogs, we'll extend a catalog that can utilize lakeFS to keep the pointer.
Therefor, we'll extend hadoop catalog to work with lakeFS
- Every operation of the catalog happens with the scope of a reference in lakeFS.
Can you explain how we know the scope? Is this by looking at the ref in the path?
design/open/iceberg-branching.md
Outdated
- The catalog will know to use the metadata files from the relevant references

Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
Can you add a link to FileIO, please?
design/open/iceberg-branching.md
Outdated
- Write location on metadata files to be relative: the repo and brach will not be written.
- The catalog will know to use the metadata files from the relevant references

Implement lakeFSIO:
Suggestion:
Implement lakeFSIO:
Implement LakeFSFileIO:
design/open/iceberg-branching.md
Outdated
## Alternatives Considered

### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
Our documentation style guide requests gender-neutral phrasing:
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
In order to enable the user to choose their preferred catalog, we considered implementing only lakeFS FileIO.
design/open/iceberg-branching.md
Outdated
Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Meaning, the user will need to specify the full path of the table, containing the repo and reference in lakeFS.
I am not sure if this says we add some limitation, or states that our FileIO will anyway see a full path and therefore everything will just work. Obviously I prefer the second :-)
It is the second one. :)
I'll add it
For example:

`SELECT * FROM example-repo.main.table1;`
Thanks for the example!
- Does Iceberg indeed support all allowed characters in repo names? For instance, PostgreSQL would require the tablename identifier to be doublequoted:
  `SELECT * FROM "example-repo.main.table1";`
  It is usually very precise about SQL rules. If Iceberg does not allow quoting names, this introduces a limitation to users about repository and branch names.
- Where and how do I define `table1`? (Is it in lakeFS, per-ref?)
- This will work for any commit ref, right? That would make me very happy.
- I tried to run a query with quotes but it didn't work. I believe we'll be able to solve it when we face the issue.
- With Iceberg
- Yes

(added 2 and 3 to the design)
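Since the design says the repo and reference are extracted from the table path, the three-part identifier in the example above could be split by the catalog roughly as follows. This is a hypothetical helper for illustration only, assuming the `<repo>.<ref>.<table>` form shown in the example; it does not handle quoted names containing dots, which is exactly the open question raised above.

```python
def parse_table_identifier(identifier: str) -> tuple[str, str, str]:
    """Split a table identifier of the assumed form `<repo>.<ref>.<table>`
    into its lakeFS components. Hypothetical sketch, not a committed API."""
    parts = identifier.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected <repo>.<ref>.<table>, got {identifier!r}")
    repo, ref, table = parts
    return repo, ref, table

# e.g. parse_table_identifier("example-repo.main.table1")
# → ("example-repo", "main", "table1")
```

Because the second component is any ref, the same mechanism would cover branches, tags, and commit IDs alike.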
Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.
I know very little about Iceberg and FileIOs, only what I remember from about a year ago, so I am probably asking silly questions. But we might want to answer (some of) them in this design doc.
- What is the behaviour of the lakeFSIO? Does it affect manifests, or something more? Does it change the path to the manifests?
  To pick an example at random of something I don't understand: the manifests spec says `data_file.file_path` is the "full URI for the file with FS scheme". Is lakeFSIO responsible for returning it in this manner? If so, what gets stored in the manifest object itself and how does lakeFSIO process it? Does lakeFSIO understand Avro?
- How do we support concurrency? This is the point of using Iceberg, "safely... at the same time" appears in the 2-line description at the top. The spec goes into atomic rename. Do we need to do that? lakeFS does support atomic rename by branching out, copying and deleting, and merging back. Does lakeFSIO need to do that, or will we be fine without it?
  Or, Iceberg seems to want to do this to implement some isolation between reads and writes. Is the plan to use lakeFS directly for this concurrency?
- I had a quick look and it seems that Iceberg may allow importing external files: for instance this might be an `add_files` operation. How does this work? What is the intended behaviour? If it is not supported, can we block it if we only implement a FileIO?
- The fileIO (with sync to the catalog) is responsible to write and read the files; it will write the relative locations in the metadata files, and will have the scope of a lakeFS repo and ref. Beside that it should work the same.
- At first stage lakeFS won't support atomicity. We believe it's OK since we are always at the scope of a branch.
  We'll require a single writer per branch. It's a limitation but we think it's reasonable for now.
- I think that for now it's out of scope. I added it to the open questions to keep that in mind.
Please explain (1) in the design. I am still not sure how we propose to translate to relative paths:
- Does Iceberg already use relative paths
- Or, does our FileIO perform this translation...
- when writing?
- when reading?
About (2) thanks for adding text! In order of preference, we need to:
- Enforce single-writer (e.g., all writers but the first fail)
- Report multiple writers (e.g., all writers or all writers but the first report a multiple writers error, and the state is indeterminate)
- "Just" document it.
But we need to do at least the last in the list.
About (3), again: can we block it, or if not can we report it, or do we "just" document it?
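For discussion of point (1), here is one way the path translation in question could work, sketched as plain functions: strip the repo/ref prefix on write so metadata stays ref-agnostic, and re-attach the current scope's prefix on read. The `lakefs://` URI shape and the function names are assumptions for illustration, not the proposed implementation.

```python
# Hypothetical path translation for a lakeFS-aware FileIO.
# Assumption: absolute paths look like lakefs://<repo>/<ref>/<key>.
PREFIX = "lakefs://"

def to_relative(path: str, repo: str, ref: str) -> str:
    """On write: drop the repo/ref prefix so stored metadata has no ref baked in."""
    scope = f"{PREFIX}{repo}/{ref}/"
    if path.startswith(scope):
        return path[len(scope):]
    return path  # already relative

def to_absolute(path: str, repo: str, ref: str) -> str:
    """On read: resolve a relative path against the repo/ref the FileIO is scoped to."""
    if path.startswith(PREFIX):
        return path  # already absolute
    return f"{PREFIX}{repo}/{ref}/{path}"
```

With this scheme the same stored metadata resolves correctly on any branch, because the ref is supplied by the reader's scope rather than recorded in the file.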
design/open/iceberg-branching.md
Outdated
## Goals

Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
Enable working in isolation with an Iceberg table. Create a branch in lakeFS, and operate on the table on the branch.
design/open/iceberg-branching.md
Outdated
## Proposed Design

Implement lakeFS catalog:
## Proposed Design
Implement lakeFS catalog:
## Proposed Design: Dedicated lakeFS catalog
design/open/iceberg-branching.md
Outdated
The catalog stores current metadata pointer for Iceberg tables.
With lakeFS, this pointer needs to be versioned as well.
In order to prevent from adding versioning capabilities to existing catalogs, we'll extend a catalog that can utilize lakeFS to keep the pointer.
Therefor, we'll extend hadoop catalog to work with lakeFS
Therefor, we'll extend hadoop catalog to work with lakeFS
Therefore, we'll extend the Hadoop catalog to work with lakeFS
design/open/iceberg-branching.md
Outdated
### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
This attemp failed since, as mentioned above, it will require the catalog to be able to version metadata pointers.
This attemp failed since, as mentioned above, it will require the catalog to be able to version metadata pointers.
As mentioned above, this will require the catalog to be able to version metadata pointers.
design/open/iceberg-branching.md
Outdated
## Usage

Iceberg operations:
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
The user will need to set the catalog implementation to be the lakeFS catalog. For example, if working in Spark:
`spark.sql.catalog.lakefs.catalog-impl=io.lakefs.iceberg.LakeFSCatalog`
The user will also need to configure a Hadoop FileSystem that can interact with objects on lakeFS, like the S3AFileSystem or LakeFSFileSystem.
It seems like a great way of implementing our Iceberg integration!
Thanks!
design/open/iceberg-branching.md
Outdated
## Problem Description

Iceberg stores metadata files that represent a snapshot of a given Iceberg table.
Those metadata files save the locataion of the data hard-coded.
Those metadata files save the locataion of the data hard-coded.
Those metadata files save the location of the data hard-coded.
design/open/iceberg-branching.md
Outdated
## Alternatives Considered

### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
In order to enable the users to choose their preferred catalog, we considered implementing only lakeFS FileIO.
To start using Iceberg with lakeFS, the user will need to add the lakeFS catalog as a dependency.

## Open Questions
- Can this catalog become part of the Iceberg source code?
It would be extremely beneficial to us if we can do that, as most users find out about available catalogs through Iceberg's documentation page. #fail_fast
(I would actually develop it in the context of the Iceberg repo and not ours)
Absolutely. I'm checking this :)
design/open/iceberg-branching.md
Outdated
With this design the catalog won't support atomicity. We believe it's Ok at first stage since we are always at scope of a branch.
For that, we'll require a single-writer per branch.
With this design the catalog won't support atomicity. We believe it's Ok at first stage since we are always at scope of a branch.
For that, we'll require a single-writer per branch.
This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS. We think it's a reasonable limitation to start with. We may be able to overcome this limitation in the future, using the `IfNoneMatch` flag in the lakeFS API.
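To illustrate the semantics that an `IfNoneMatch`-style conditional write could provide here, a toy in-memory model (not the lakeFS API) shows how only the first writer of a given metadata file would succeed, which is what would lift the single-writer limitation:

```python
class ConditionalStore:
    """Toy in-memory model of an object store with conditional writes.
    `put_if_absent` emulates the effect of an HTTP `If-None-Match: *`
    upload: the write succeeds only if the key does not exist yet, so
    among concurrent committers the first writer wins and the rest fail."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_if_absent(self, key: str, value: bytes) -> bool:
        if key in self._objects:
            return False  # analogous to HTTP 412 Precondition Failed
        self._objects[key] = value
        return True
```

Two writers racing to commit the same hypothetical `metadata/v2.json` would observe `True` and `False` respectively, turning a silent overwrite into a detectable commit conflict.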
Thanks!
I am not sure how our FileIO and Catalog work. In previous designs we have had some success by describing some flows. I believe that would be very beneficial here. For instance, it would really help me -- and possibly others -- if you could write up the sequence of steps during 3 sequences: `SELECT ...`, `INSERT ...`, `UPDATE ...`. Each starts in Iceberg code, which then accesses our Catalog and our FileIO with various parameters, which end up being read from lakeFS or written to it. During the flow we could say what these parameters are and how our components alter them.
As with the catalog, the reference in lakeFS will be extracted from the table path.

This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS.
We think it's a reasonable limitation to start with. We may be able to overcome this limitation in the future, using the `IfNoneMatch` flag in the lakeFS API.
We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.
As with the catalog, the reference in lakeFS will be extracted from the table path.

This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS.
We should document this. Can we enforce this?
But... how does Iceberg do it on S3?
`spark.sql.catalog.lakefs.catalog-impl=io.lakefs.iceberg.LakeFSCatalog`
The user will also need to configure a Hadoop FileSystem that can interact with objects on lakeFS, like the S3AFileSystem or LakeFSFileSystem.

Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Every operation on the Iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
@lynnro314 can we merge this?
@lynnro314 I think the design is ready :) Can we merge it?
Sorry missed it, my bad.
close #5841
Iceberg support: branching & isolation design