Design: Iceberg support #5825

Merged: 2 commits into master from iceberg-design, Jun 22, 2023

Conversation

Contributor

@lynnro314 commented May 7, 2023

close #5841
Iceberg support: branching & isolation design

@lynnro314 added the exclude-changelog label May 7, 2023
@itaiad200 requested a review from Isan-Rivkin May 7, 2023 15:35
Contributor

@itaiad200 left a comment

Looks very promising! Some questions, nothing blocking.

Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
At first stage: support iceberg without being catalog agnostic - meaning work with only lakeFS catalog and not be compatible with other catalogs.
At second stage: being catalog agnostic. Users will be able to configure both lakeFS catalog and other catalog together.
- The catalogs that will be supported are Glue, Snoflake and Tabular.
Contributor

Suggested change
- The catalogs that will be supported are Glue, Snoflake and Tabular.
- The catalogs that will be supported are Glue, Snowflake and Tabular.

Contributor

@itaiad200, this is a product decision, but it is open to discussion and subject to change as we proceed. So maybe:

Suggested change
- The catalogs that will be supported are Glue, Snoflake and Tabular.
- We will start by supporting a limited set of catalogs, for example Glue, Snowflake and Tabular.

Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
At first stage: support iceberg without being catalog agnostic - meaning work with only lakeFS catalog and not be compatible with other catalogs.
At second stage: being catalog agnostic. Users will be able to configure both lakeFS catalog and other catalog together.
- The catalogs that will be supported are Glue, Snoflake and Tabular.
Contributor

Why these? Also - why not start with 1?

Contributor Author

Yoni is investigating this option. We decided to start with the current implementation in order to fail fast, and we'll consider this at the next phase.


Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.
Contributor

How will the repo and branch be passed to the lakeFSIO?

## Usage

Iceberg operations:
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
Contributor

Can you share how?

Contributor

Suggested change
Users will need to configure lakeFS catalog, with the relevant hadoop configuration.
The user will need to set the catalog implementation to be the lakeFS catalog. For example, if working in Spark:
`spark.sql.catalog.lakefs.catalog-impl=io.lakefs.iceberg.LakeFSCatalog`
The user will also need to configure a Hadoop FileSystem that can interact with objects on lakeFS, like the S3AFileSystem or LakeFSFileSystem.
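To make the configuration concrete, a minimal Spark setup along these lines might look like the sketch below. Only the `catalog-impl` line comes from this thread; the catalog name, the `SparkCatalog` wrapper, and the warehouse URI are illustrative assumptions:

```java
import org.apache.spark.sql.SparkSession;

public class LakeFSIcebergSetup {
    public static void main(String[] args) {
        // Register a Spark catalog named "lakefs" whose underlying Iceberg
        // catalog is the proposed LakeFSCatalog. The warehouse URI (an
        // assumption) points at a lakeFS repository through the S3 gateway.
        SparkSession spark = SparkSession.builder()
                .appName("lakefs-iceberg")
                .config("spark.sql.catalog.lakefs", "org.apache.iceberg.spark.SparkCatalog")
                .config("spark.sql.catalog.lakefs.catalog-impl", "io.lakefs.iceberg.LakeFSCatalog")
                .config("spark.sql.catalog.lakefs.warehouse", "s3a://example-repo")
                .getOrCreate();

        // Table paths are scoped by repo and ref, as described below.
        spark.sql("SELECT * FROM lakefs.`example-repo`.main.table1").show();
    }
}
```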

- Can this catalog become part of the Iceberg source code?
- Migration: how to move my existing tables to lakeFS?
- Iceberg diff: how to view table changes in lakeFS.
- Merge: how to allow merging between branches?
Contributor

Or meanwhile block merges that are going to make a table inconsistent.

Contributor

I think probably conflicting merges will be blocked by our regular rules of merges :-/

Contributor

@arielshaqed left a comment

Thanks, very nice work!

Requesting more detail about:

  • What our FileIO will do.
  • Concurrency
  • Could you add a list of references that we can read to get more context of what all these Icebergish words mean?

Both at a high level of course, but I would prefer to specify our plans.

Comment on lines +24 to +25
- Support merge operation.
- Enable lakeFS operations using a dedicated SQL syntax.
Contributor

Will these be possible somehow? Don't care about a detailed design, but do prefer solutions that are steps along the way to having these -- particularly merges!

Contributor Author

Yes, enabling SQL syntax isn't related to our solution, and we do believe that our solution will fit the merge operation in the future.

Contributor

Can you add a high-level explanation of how those might happen, please?

With lakeFS, this pointer needs to be versioned as well.
In order to prevent from adding versioning capabilities to existing catalogs, we'll extend a catalog that can utilize lakeFS to keep the pointer.
Therefor, we'll extend hadoop catalog to work with lakeFS
- Every operation of the catalog happens with the scope of a reference in lakeFS.
Contributor

Can you explain how we know the scope? Is this by looking at the ref in the path?

- The catalog will know to use the metadata files from the relevant references

Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
Contributor

Can you add a link to FileIO, please?
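For readers without the link handy, the interface being discussed is `org.apache.iceberg.io.FileIO`; its core is roughly the following (paraphrased here as a sketch, not the authoritative definition):

```java
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

// Paraphrase of the core of org.apache.iceberg.io.FileIO: open a file for
// read, open a file for write, delete a file. Everything else Iceberg does
// with storage is built on top of these operations.
interface FileIOCore {
    InputFile newInputFile(String path);   // open for read
    OutputFile newOutputFile(String path); // open for write
    void deleteFile(String path);
}
```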

- Write location on metadata files to be relative: the repo and branch will not be written.
- The catalog will know to use the metadata files from the relevant references

Implement lakeFSIO:
Contributor

Suggestion:

Suggested change
Implement lakeFSIO:
Implement LakeFSFileIO:
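Picking up the suggested name, here is a minimal sketch of such a class, assuming it extends Iceberg's `HadoopFileIO` and resolves the relative locations stored in metadata against a repo/ref scope. The constructor, fields, and prefixing scheme are all hypothetical, not the final design:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopFileIO;
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

// Hypothetical sketch: scope all I/O to a lakeFS repo and ref by turning
// the relative locations stored in metadata into absolute lakeFS paths.
public class LakeFSFileIO extends HadoopFileIO {
    private final String scheme; // e.g. "s3a" via the lakeFS S3 gateway
    private final String repo;
    private final String ref;

    public LakeFSFileIO(Configuration conf, String scheme, String repo, String ref) {
        super(conf);
        this.scheme = scheme;
        this.repo = repo;
        this.ref = ref;
    }

    // Prepend the repo/ref scope to relative paths; pass absolute URIs through.
    private String resolve(String path) {
        if (path.contains("://")) {
            return path; // already absolute
        }
        return String.format("%s://%s/%s/%s", scheme, repo, ref, path);
    }

    @Override
    public InputFile newInputFile(String path) {
        return super.newInputFile(resolve(path));
    }

    @Override
    public OutputFile newOutputFile(String path) {
        return super.newOutputFile(resolve(path));
    }

    @Override
    public void deleteFile(String path) {
        super.deleteFile(resolve(path));
    }
}
```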

## Alternatives Considered

### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
Contributor

Our documentation style guide requests gender-neutral phrasing:

Suggested change
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
In order to enable the user to choose their preferred catalog, we considered implementing only lakeFS FileIO.

Comment on lines 57 to 58
Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Meaning, the user will need to specify the full path of the table, containing the repo and reference in lakeFS.
Contributor

I am not sure if this says we add some limitation, or states that our FileIO will anyway see a full path and therefore everything will just work. Obviously I prefer the second :-)

Contributor Author

It is the second one. :)
I'll add it


For example:

`SELECT * FROM example-repo.main.table1;`
Contributor

Thanks for the example!

  1. Does Iceberg indeed support all allowed characters in repo names? For instance, PostgreSQL would require the table name identifier to be double-quoted:
    SELECT * FROM "example-repo.main.table1";
    It is usually very precise about SQL rules. If Iceberg does not allow quoting names, this introduces a limitation to users about repository and branch names.
  2. Where and how do I define table1? (Is it in lakeFS, per-ref?)
  3. This will work for any commit ref, right? That would make me very happy.

Contributor Author

  1. I tried to run a query with quotes but it didn't work. I believe we'll be able to solve it when we face the issue.
  2. With Iceberg.
  3. Yes.
    (added 2 and 3 to the design)
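To make point 3 concrete: continuing the Spark session sketch from the configuration example above, only the ref segment of the table path changes between a branch head and a commit. The backtick quoting and the commit-id placeholder are illustrative assumptions:

```java
// Read the table at the head of the "main" branch.
spark.sql("SELECT * FROM lakefs.`example-repo`.main.table1").show();

// Read the same table as of an arbitrary commit ref (placeholder id).
spark.sql("SELECT * FROM lakefs.`example-repo`.`64c1a2b`.table1").show();
```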

Implement lakeFSIO:
FileIO is the primary interface between the core Iceberg library and underlying storage. In practice, it is just two operations: open for read and open for write.
We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.

Contributor

I know very little about Iceberg and FileIOs, only what I remember from about a year ago, so I am probably asking silly questions. But we might want to answer (some of) them in this design doc.

  1. What is the behaviour of the lakeFSIO? Does it affect manifests, or something more? Does it change the path to the manifests?

    To pick an example at random of something I don't understand: The manifests spec says data_file.file_path is the "full URI for the file with FS scheme". Is lakeFSIO responsible for returning it in this manner? If so, what gets stored in the manifest object itself and how does lakeFSIO process it? Does lakeFSIO understand Avro?

  2. How do we support concurrency? This is the point of using Iceberg, "safely... at the same time" appears in the 2-line description at the top. The spec goes into atomic rename. Do we need to do that? lakeFS does support atomic rename by branching out, copying and deleting, and merging back. Does lakeFSIO need to do that, or will we be fine without it?

    Or, Iceberg seems to want to do this to implement some isolation between reads and writes. Is the plan to use lakeFS directly for this concurrency?

  3. I had a quick look and it seems that Iceberg may allow importing external files: for instance this might be an add_files operation. How does this work? What is the intended behaviour? If it is not supported, can we block it if we only implement a FileIO?

Contributor Author

  1. The FileIO (in sync with the catalog) is responsible for writing and reading the files. It will write relative locations in the metadata files and will operate within the scope of a lakeFS repo and ref. Besides that, it should work the same.

  2. At the first stage lakeFS won't support atomicity. We believe it's OK since we are always within the scope of a branch.
    We'll require a single writer per branch. It's a limitation, but we think it's reasonable for now.

  3. I think that for now it's out of scope. I added it to the open questions to keep that in mind.

Contributor

Please explain (1) in the design. I am still not sure how we propose to translate to relative paths:

  • Does Iceberg already use relative paths
  • Or, does our FileIO perform this translation...
    • when writing?
    • when reading?

About (2) thanks for adding text! In order of preference, we need to:

  • Enforce single-writer (e.g., all writers but the first fail)
  • Report multiple writers (e.g., all writers or all writers but the first report a multiple writers error, and the state is indeterminate)
  • "Just" document it.
    But we need to do at least the last in the list.

About (3), again: can we block it, or if not can we report it, or do we "just" document it?


## Goals

Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
Contributor

Suggested change
Enable isolation with iceberg table. Create branch from existing one, and writing the data and metadata to the relevant branch.
Enable working in isolation with an Iceberg table. Create a branch in lakeFS, and operate on the table on the branch.

Comment on lines 28 to 30
## Proposed Design

Implement lakeFS catalog:
Contributor

Suggested change
## Proposed Design
Implement lakeFS catalog:
## Proposed Design: Dedicated lakeFS catalog

The catalog stores current metadata pointer for Iceberg tables.
With lakeFS, this pointer needs to be versioned as well.
In order to prevent from adding versioning capabilities to existing catalogs, we'll extend a catalog that can utilize lakeFS to keep the pointer.
Therefor, we'll extend hadoop catalog to work with lakeFS
Contributor

Suggested change
Therefor, we'll extend hadoop catalog to work with lakeFS
Therefore, we'll extend the Hadoop catalog to work with lakeFS
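As a rough illustration of that direction, a sketch of a catalog extending Iceberg's `HadoopCatalog`, where the first two levels of the table identifier select the lakeFS repo and ref. The namespace-to-repo/ref mapping and the path layout are assumptions, not the final design:

```java
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

// Hypothetical sketch: extend HadoopCatalog so that the first two levels of
// a table identifier ("example-repo.main.table1") select the lakeFS repo and
// ref, and the metadata pointer lives under that versioned path.
public class LakeFSCatalogSketch extends HadoopCatalog {

    @Override
    protected String defaultWarehouseLocation(TableIdentifier id) {
        // Assumed layout: <scheme>://<repo>/<ref>/<table>, served through a
        // Hadoop FileSystem that can talk to lakeFS (e.g. S3AFileSystem).
        String repo = id.namespace().level(0);
        String ref = id.namespace().level(1);
        return String.format("s3a://%s/%s/%s", repo, ref, id.name());
    }
}
```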


### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
This attemp failed since, as mentioned above, it will require the catalog to be able to version metadata pointers.
Contributor

Suggested change
This attemp failed since, as mentioned above, it will require the catalog to be able to version metadata pointers.
As mentioned above, this will require the catalog to be able to version metadata pointers.

Contributor

@Jonathan-Rosenberg left a comment

It seems like a great way of implementing our Iceberg integration!
Thanks!

## Problem Description

Iceberg stores metadata files that represent a snapshot of a given Iceberg table.
Those metadata files save the locataion of the data hard-coded.
Contributor

Suggested change
Those metadata files save the locataion of the data hard-coded.
Those metadata files save the location of the data hard-coded.

## Alternatives Considered

### Implementation of lakeFS FileIO alone
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
Contributor

Suggested change
In order to enable the user to choose his preferred catalog, we considered implementing only lakeFS FileIO.
In order to enable the users to choose their preferred catalog, we considered implementing only lakeFS FileIO.

To start using Iceberg with lakeFS, the user will need to add the lakeFS catalog as a dependency.

## Open Questions
- Can this catalog become part of the Iceberg source code?
Contributor

@Jonathan-Rosenberg May 9, 2023

It would be extremely beneficial to us if we can do that, as most users find out about available catalogs through Iceberg's documentation page. #fail_fast.
(I would actually develop it in the context of the Iceberg repo and not ours)

Contributor Author

Absolutely. I'm checking this :)

@lynnro314 requested a review from arielshaqed May 9, 2023 22:14
Comment on lines 43 to 45
With this design the catalog won't support atomicity. We believe it's Ok at first stage since we are always at scope of a branch.
For that, we'll require a single-writer per branch.

Contributor

@johnnyaug May 10, 2023

Suggested change
With this design the catalog won't support atomicity. We believe it's Ok at first stage since we are always at scope of a branch.
For that, we'll require a single-writer per branch.
This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS. We think it's a reasonable limitation to start with. We may be able to overcome this limitation in the future, using the `IfNoneMatch` flag in the lakeFS API.

Contributor

@arielshaqed left a comment

Thanks!

I am not sure how our FileIO and Catalog work. In previous designs we have had some success by describing some flows. I believe that would be very beneficial here. For instance, it would really help me -- and possibly others -- if you could write up the sequence of steps for 3 flows: SELECT ..., INSERT ..., UPDATE .... Each starts in Iceberg code, which then accesses our Catalog and our FileIO with various parameters, which end up being read from lakeFS or written to it. During the flow we could say what these parameters are and how our components alter them.

As with the catalog, the reference in lakeFS will be extracted from the table path.

This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS.
We think it's a reasonable limitation to start with. We may be able to overcome this limitation in the future, using the `IfNoneMatch` flag in the lakeFS API.
Contributor

AFAIK that flag no longer works ever since KV. The only lakeFS synchronization primitive I know of is merging. @N-o-Z / @nopcoder amirite?

We'll Extend hadoop fileIO to work with lakeFS, and to be compatible with lakeFS catalog.
As with the catalog, the reference in lakeFS will be extracted from the table path.

This design will require users to only have a single writer per branch. This is because the Hadoop catalog uses a rename operation which is not atomic in lakeFS.
Contributor

We should document this. Can we enforce this?

But... how does Iceberg do it on S3?

`spark.sql.catalog.lakefs.catalog-impl=io.lakefs.iceberg.LakeFSCatalog`
The user will also need to configure a Hadoop FileSystem that can interact with objects on lakeFS, like the S3AFileSystem or LakeFSFileSystem.

Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Contributor

Suggested change
Every operation on iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.
Every operation on the Iceberg table (writing, reading, compaction, etc.) will be performed on the table in lakeFS.

@itaiad200 (Contributor)

@lynnro314 can we merge this?

@itaiad200 (Contributor)

@lynnro314 I think the design is ready :) Can we merge it?

@lynnro314 (Contributor Author)

Sorry, missed it, my bad.
Merging

@lynnro314 merged commit bcd20cb into master Jun 22, 2023
@lynnro314 deleted the iceberg-design branch June 22, 2023 10:01
Labels: exclude-changelog (PR description should not be included in next release changelog)