Design: Iceberg support #5825

Merged: 2 commits into master from iceberg-design, Jun 22, 2023

Conversation

Contributor

@lynnro314 commented May 7, 2023

close #5841
Iceberg support: branching & isolation design

@lynnro314 added the exclude-changelog label May 7, 2023
@itaiad200 requested a review from Isan-Rivkin May 7, 2023 15:35
Contributor

@itaiad200 left a comment

Looks very promising! Some questions, nothing blocking.

Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
At first stage: support iceberg without being catalog agnostic - meaning work with only lakeFS catalog and not be compatible with other catalogs.
At second stage: being catalog agnostic. Users will be able to configure both lakeFS catalog and other catalog together.
- The catalogs that will be supported are Glue, Snoflake and Tabular.
Contributor

Suggested change
- The catalogs that will be supported are Glue, Snoflake and Tabular.
- The catalogs that will be supported are Glue, Snowflake and Tabular.

Contributor

@itaiad200, this is a product decision, but it is open to discussion and subject to change as we proceed. So maybe:

Suggested change
- The catalogs that will be supported are Glue, Snoflake and Tabular.
- We will start by supporting a limited set of catalogs, for example Glue, Snowflake and Tabular.

Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
At first stage: support iceberg without being catalog agnostic - meaning work with only lakeFS catalog and not be compatible with other catalogs.
At second stage: being catalog agnostic. Users will be able to configure both lakeFS catalog and other catalog together.
- The catalogs that will be supported are Glue, Snoflake and Tabular.
Contributor

Why these? Also - why not start with 1?

Contributor Author

Yoni is investigating this option. We decided to start with the current implementation in order to fail fast, and we'll consider this at the next phase.


Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.
Contributor

How will the repo and branch be passed to the lakeFSIO?

## Usage

Iceberg operations:
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
Contributor

Can you share how?

Contributor

Suggested change
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
The user will need to set the catalog implementation to be the lakeFS catalog. For example, if working in Spark:
`spark.sql.catalog.lakefs.catalog-impl=io.lakefs.iceberg.LakeFSCatalog`
The user will also need to configure a Hadoop FileSystem that can interact with objects on lakeFS, like the S3AFileSystem or LakeFSFileSystem.
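To make the configuration concrete, a minimal Spark setup along these lines might look like the sketch below. Only the `catalog-impl` line comes from this thread; the catalog name, the `SparkCatalog` wrapper, and the warehouse URI are illustrative assumptions:

```java
import org.apache.spark.sql.SparkSession;

public class LakeFSIcebergSetup {
    public static void main(String[] args) {
        // Register a Spark catalog named "lakefs" whose underlying Iceberg
        // catalog is the proposed LakeFSCatalog. The warehouse URI (an
        // assumption) points at a lakeFS repository through the S3 gateway.
        SparkSession spark = SparkSession.builder()
                .appName("lakefs-iceberg")
                .config("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog")
                .config("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog")
                .config("spark.sql.catalog.lakefs.warehouse", "s3a://example-repo")
                .getOrCreate();

        // Table paths are scoped by repo and ref, as described below.
        spark.sql("SELECT * FROM lakefs.`example-repo`.main.table1").show();
    }
}
```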

- Can this catalog become part of the Iceberg source code?
- Migration: how to move my existing tables to lakeFS?
- Iceberg diff: how to view table changes in lakeFS.
- Merge: how to allow merging between branches?
Contributor

Or meanwhile block merges that are going to make a table inconsistent.

Contributor

I think probably conflicting merges will be blocked by our regular rules of merges :-/

Contributor

@arielshaqed left a comment

Thanks, very nice work!

Requesting more detail about:

  • What our FileIO will do.
  • Concurrency
  • Could you add a list of references that we can read to get more context of what all these Icebergish words mean?

Both at a high level of course, but I would prefer to specify our plans.

Comment on lines +24 to +25
- Support merge operation.
- Enable lakeFS operations using a dedicated SQL syntax.
Contributor

Will these be possible somehow? Don't care about a detailed design, but do prefer solutions that are steps along the way to having these -- particularly merges!

Contributor Author

Yes, enabling SQL syntax isn't related to our solution, and we do believe that our solution will fit the merge operation in the future.

Contributor

Can you add a high-level explanation of how those might happen, please?

With lakeFS, this pointer needs to be versioned as well.
In order to prevent from adding versioning capabilities to existing catalogs, we'll extend a catalog that can utilize lakeFS to keep the pointer.
Therefor, we'll extend hadoop catalog to work with lakeFS
- Every operation of the catalog happens with the scope of a reference in lakeFS.
Contributor

Can you explain how we know the scope? Is this by looking at the ref in the path?

- The catalog will know to use the metadata files from the relevant references

Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
Contributor

Can you add a link to FileIO, please?
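For readers without the link handy, the interface being discussed is `org.apache.iceberg.io.FileIO`; its core is roughly the following (paraphrased here as a sketch, not the authoritative definition):

```java
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

// Paraphrase of the core of org.apache.iceberg.io.FileIO: open a file for
// read, open a file for write, delete a file. Everything else Iceberg does
// with storage is built on top of these operations.
interface FileIOCore {
    InputFile newInputFile(String path);   // open for read
    OutputFile newOutputFile(String path); // open for write
    void deleteFile(String path);
}
```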

- Write location on metadata files to be relative: the repo and branch will not be written.
- The catalog will know to use the metadata files from the relevant references

Implement lakeFSIO:
Contributor

Suggestion:

Suggested change
Implement lakeFSIO:
Implement LakeFSFileIO:
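Picking up the suggested name, here is a minimal sketch of such a class, assuming it extends Iceberg's `HadoopFileIO` and resolves the relative locations stored in metadata against a repo/ref scope. The constructor, fields, and prefixing scheme are all hypothetical, not the final design:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopFileIO;
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

// Hypothetical sketch: scope all I/O to a lakeFS repo and ref by turning
// the relative locations stored in metadata into absolute lakeFS paths.
public class LakeFSFileIO extends HadoopFileIO {
    private final String scheme; // e.g. "s3a" via the lakeFS S3 gateway
    private final String repo;
    private final String ref;

    public LakeFSFileIO(Configuration conf, String scheme, String repo, String ref) {
        super(conf);
        this.scheme = scheme;
        this.repo = repo;
        this.ref = ref;
    }

    // Prepend the repo/ref scope to relative paths; pass absolute URIs through.
    private String resolve(String path) {
        if (path.contains("://")) {
            return path; // already absolute
        }
        return String.format("%s://%s/%s/%s", scheme, repo, ref, path);
    }

    @Override
    public InputFile newInputFile(String path) {
        return super.newInputFile(resolve(path));
    }

    @Override
    public OutputFile newOutputFile(String path) {
        return super.newOutputFile(resolve(path));
    }

    @Override
    public void deleteFile(String path) {
        super.deleteFile(resolve(path));
    }
}
```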

## Alternatives Considered

### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
Contributor

Our documentation style guide requests gender-neutral phrasing:

Suggested change
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
In order to enable the user to choose their preferred catalog, we considered implementing only lakeFS FileIO.

Comment on lines 57 to 58
Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Meaning, the user will need to specify the full path of the table, containing the repo and reference in lakeFS.
Contributor

I am not sure if this says we add some limitation, or states that our FileIO will anyway see a full path and therefore everything will just work. Obviously I prefer the second :-)

Contributor Author

It is the second one. :)
I'll add it


For example:

`SELECT * FROM example-repo.main.table1;`
Contributor

Thanks for the example!

  1. Does Iceberg indeed support all allowed characters in repo names? For instance, PostgreSQL would require the table name identifier to be double-quoted:
    SELECT * FROM "example-repo.main.table1";
    It is usually very precise about SQL rules. If Iceberg does not allow quoting names, this introduces a limitation to users about repository and branch names.
  2. Where and how do I define table1? (Is it in lakeFS, per-ref?)
  3. This will work for any commit ref, right? That would make me very happy.

Contributor Author

  1. I tried to run a query with quotes but it didn't work. I believe we'll be able to solve it when we face the issue.
  2. With Iceberg.
  3. Yes.
    (added 2 and 3 to the design)
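To make point 3 concrete: continuing the Spark session sketch from the configuration example above, only the ref segment of the table path changes between a branch head and a commit. The backtick quoting and the commit-id placeholder are illustrative assumptions:

```java
// Read the table at the head of the "main" branch.
spark.sql("SELECT * FROM lakefs.`example-repo`.main.table1").show();

// Read the same table as of an arbitrary commit ref (placeholder id).
spark.sql("SELECT * FROM lakefs.`example-repo`.`64c1a2b`.table1").show();
```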

Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.

Contributor

I know very little about Iceberg and FileIOs, only what I remember from about a year ago, so I am probably asking silly questions. But we might want to answer (some of) them in this design doc.

  1. What is the behaviour of the lakeFSIO? Does it affect manifests, or something more? Does it change the path to the manifests?

    To pick an example at random of something I don't understand: The manifests spec says data_file.file_path is the "full URI for the file with FS scheme". Is lakeFSIO responsible for returning it in this manner? If so, what gets stored in the manifest object itself and how does lakeFSIO process it? Does lakeFSIO understand Avro?

  2. How do we support concurrency? This is the point of using Iceberg, "safely... at the same time" appears in the 2-line description at the top. The spec goes into atomic rename. Do we need to do that? lakeFS does support atomic rename by branching out, copying and deleting, and merging back. Does lakeFSIO need to do that, or will we be fine without it?

    Or, Iceberg seems to want to do this to implement some isolation between reads and writes. Is the plan to use lakeFS directly for this concurrency?

  3. I had a quick look and it seems that Iceberg may allow importing external files: for instance this might be an add_files operation. How does this work? What is the intended behaviour? If it is not supported, can we block it if we only implement a FileIO?

Contributor Author

  1. The FileIO (in sync with the catalog) is responsible for writing and reading the files. It will write relative locations in the metadata files and will operate within the scope of a lakeFS repo and ref. Besides that, it should work the same.

  2. At the first stage lakeFS won't support atomicity. We believe it's OK since we are always within the scope of a branch.
    We'll require a single writer per branch. It's a limitation, but we think it's reasonable for now.

  3. I think that for now it's out of scope. I added it to the open questions to keep that in mind.

Contributor

Please explain (1) in the design. I am still not sure how we propose to translate to relative paths:

  • Does Iceberg already use relative paths
  • Or, does our FileIO perform this translation...
    • when writing?
    • when reading?

About (2) thanks for adding text! In order of preference, we need to:

  • Enforce single-writer (e.g., all writers but the first fail)
  • Report multiple writers (e.g., all writers or all writers but the first report a multiple writers error, and the state is indeterminate)
  • "Just" document it.
    But we need to do at least the last in the list.

About (3), again: can we block it, or if not can we report it, or do we "just" document it?


## Goals

Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
Contributor

Suggested change
Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
Enable working in isolation with an Iceberg table. Create a branch in lakeFS, and operate on the table on the branch.

Comment on lines 28 to 30
## Proposed Design

Implement lakeFS catalog:
Contributor

Suggested change
## Proposed Design
Implement lakeFS catalog:
## Proposed Design: Dedicated lakeFS catalog

The catalog stores current metadata pointer for Iceberg tables.
With lakeFS, this pointer needs to be versioned as well.
In order to prevent from adding versioning capabilities to existing catalogs, we'll extend a catalog that can utilize lakeFS to keep the pointer.
Therefor, we'll extend hadoop catalog to work with lakeFS
Contributor

Suggested change
Therefor, we'll extend hadoop catalog to work with lakeFS
Therefore, we'll extend the Hadoop catalog to work with lakeFS
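As a rough illustration of that direction, a sketch of a catalog extending Iceberg's `HadoopCatalog`, where the first two levels of the table identifier select the lakeFS repo and ref. The namespace-to-repo/ref mapping and the path layout are assumptions, not the final design:

```java
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

// Hypothetical sketch: extend HadoopCatalog so that the first two levels of
// a table identifier ("example-repo.main.table1") select the lakeFS repo and
// ref, and the metadata pointer lives under that versioned path.
public class LakeFSCatalogSketch extends HadoopCatalog {

    @Override
    protected String defaultWarehouseLocation(TableIdentifier id) {
        // Assumed layout: <scheme>://<repo>/<ref>/<table>, served through a
        // Hadoop FileSystem that can talk to lakeFS (e.g. S3AFileSystem).
        String repo = id.namespace().level(0);
        String ref = id.namespace().level(1);
        return String.format("s3a://%s/%s/%s", repo, ref, id.name());
    }
}
```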


### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
This attemp failed since, as mentioned above, it will require the catalog to be able to version metadata pointers.
Contributor

Suggested change
This attemp failed since, as mentioned above, it will require the catalog to be able to version metadata pointers.
As mentioned above, this will require the catalog to be able to version metadata pointers.

Contributor

@Jonathan-Rosenberg left a comment

It seems like a great way of implementing our Iceberg integration!
Thanks!

## Problem Description

Iceberg stores metadata files that represent a snapshot of a given Iceberg table.
Those metadata files save the locataion of the data hard-coded.
Contributor

Suggested change
Those metadata files save the locataion of the data hard-coded.
Those metadata files save the location of the data hard-coded.

## Alternatives Considered

### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
Contributor

Suggested change
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
In order to enable the users to choose their preferred catalog, we considered implementing only lakeFS FileIO.

To start using Iceberg with lakeFS, the user will need to add the lakeFS catalog as a dependency.

## Open Questions
- Can this catalog become part of the Iceberg source code?
Contributor

@Jonathan-Rosenberg May 9, 2023

It would be extremely beneficial to us if we can do that, as most users find out about available catalogs through Iceberg's documentation page. #fail_fast.
(I would actually develop it in the context of the Iceberg repo and not ours)

Contributor Author

Absolutely. I'm checking this :)

@lynnro314 requested a review from arielshaqed May 9, 2023 22:14
Comment on lines 43 to 45
With this design the catalog won't support atomicity. We believe it's Ok at first stage since we are always at scope of a branch.
For that, we'll require a single-writer per branch.

Contributor

@johnnyaug May 10, 2023

Suggested change
With this design the catalog won't support atomicity. We believe it's Ok at first stage since we are always at scope of a branch.
For that, we'll require a single-writer per branch.
This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS. We think it's a reasonable limitation to start with. We may be able to overcome this limitation in the future, using the `IfNoneMatch` flag in the lakeFS API.

Contributor

@arielshaqed left a comment

Thanks!

I am not sure how our FileIO and Catalog work. In previous designs we have had some success by describing some flows. I believe that would be very beneficial here. For instance, it would really help me -- and possibly others -- if you could write up the sequence of steps for 3 flows: SELECT ..., INSERT ..., UPDATE .... Each starts in Iceberg code, which then accesses our Catalog and our FileIO with various parameters, which end up being read from lakeFS or written to it. During the flow we could say what these parameters are and how our components alter them.

As with the catalog, the reference in lakeFS will be extracted from the table path.

This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS.
We think it's a reasonable limitation to start with. We may be able to overcome this limitation in the future, using the `IfNoneMatch` flag in the lakeFS API.
Contributor

AFAIK that flag no longer works ever since KV. The only lakeFS synchronization primitive I know of is merging. @N-o-Z / @nopcoder amirite?

We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.
As with the catalog, the reference in lakeFS will be extracted from the table path.

This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS.
Contributor

We should document this. Can we enforce this?

But... how does Iceberg do it on S3?

`spark.sql.catalog.lakefs.catalog-impl=io.lakefs.iceberg.LakeFSCatalog`
The user will also need to configure a Hadoop FileSystem that can interact with objects on lakeFS, like the S3AFileSystem or LakeFSFileSystem.

Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Contributor

Suggested change
Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Every operation on the Iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.

@itaiad200 (Contributor)

@lynnro314 can we merge this?

@itaiad200 (Contributor)

@lynnro314 I think the design is ready :) Can we merge it?

@lynnro314 (Contributor Author)

Sorry, missed it, my bad.
Merging

@lynnro314 merged commit bcd20cb into master Jun 22, 2023
@lynnro314 deleted the iceberg-design branch June 22, 2023 10:01
Labels: exclude-changelog (PR description should not be included in next release changelog)