[feat request] Make Table JSON serializable #535

Closed
kevinjqliu opened this issue Mar 20, 2024 · 9 comments
Labels: good first issue (Good for newcomers)

@kevinjqliu
Contributor

Feature Request / Improvement

The REST Catalog exposes Table and TableMetadata information as HTTP endpoints in JSON format (link). This information is similar to the internal state of Table and TableMetadata objects in Python.

It would be great to make these JSON serializable.

Example

from pyiceberg.catalog import load_catalog
import json
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
json.dumps(vars(tbl))

Error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Table is not JSON serializable
>>> json.dumps(vars(tbl))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type TableMetadataV1 is not JSON serializable
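
The second traceback above arises because vars(tbl) returns a dict whose values are still custom objects, which the standard encoder cannot handle. A minimal sketch of that behavior, using hypothetical stand-in classes (Metadata, Table, and encode_unknown are illustrative names, not pyiceberg's API) and the `default=` hook of json.dumps:

```python
import json


class Metadata:
    """Hypothetical stand-in for a metadata object (not pyiceberg's class)."""

    def __init__(self, location: str, format_version: int) -> None:
        self.location = location
        self.format_version = format_version


class Table:
    """Hypothetical stand-in for a table that holds a nested object."""

    def __init__(self, name: str, metadata: Metadata) -> None:
        self.name = name
        self.metadata = metadata


def encode_unknown(obj: object) -> dict:
    # json.dumps calls this hook only for objects it cannot encode itself.
    # vars() raises TypeError for objects without a __dict__, which is the
    # error json.dumps expects from a hook that cannot handle a value.
    return vars(obj)


# The location value is made up for illustration.
tbl = Table("default.taxi_dataset", Metadata("s3://bucket/path", 1))

# vars(tbl) is a dict, but its "metadata" value is still a custom object,
# so the default encoder raises TypeError.
try:
    json.dumps(vars(tbl))
    failed = False
except TypeError:
    failed = True

# With the hook, nested attribute containers are encoded recursively.
encoded = json.dumps(tbl, default=encode_unknown)
```

This only works for plain attribute containers; objects holding connections or file handles would still need a deliberate choice about what to emit.
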
@Fokko
Contributor

Fokko commented Mar 20, 2024

We should be able to (de)serialize it using Pydantic. That's probably also faster.

@kevinjqliu
Contributor Author

kevinjqliu commented Mar 20, 2024

Oh, thanks for the hint. It looks like using the model_dump_json function works.

from pyiceberg.catalog import load_catalog
import json
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
tbl.metadata.model_dump_json()

But this only works on tbl.metadata, not on tbl.
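
For readers unfamiliar with the Pydantic round trip, here is a minimal sketch using Pydantic v2's model_dump_json and model_validate_json. ExampleMetadata is a hypothetical stand-in with made-up fields; the real TableMetadata models are considerably larger.

```python
import json
from typing import Dict

from pydantic import BaseModel


class ExampleMetadata(BaseModel):
    """Hypothetical stand-in; not the real pyiceberg TableMetadata."""

    location: str
    format_version: int
    properties: Dict[str, str] = {}


# All values here are invented for illustration.
meta = ExampleMetadata(
    location="s3://bucket/path",
    format_version=2,
    properties={"owner": "me"},
)

# Serialize to a JSON string, then parse and validate it back into a model.
payload = meta.model_dump_json()
restored = ExampleMetadata.model_validate_json(payload)

# The payload is ordinary JSON, so the standard library can read it too.
as_dict = json.loads(payload)
```

Because the model both emits and validates the JSON, the same class definition covers serialization and deserialization.
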

@kevinjqliu
Contributor Author

There's already a __repr__ function defined for the Table object. @Fokko what do you think about adding another function for Table which will output the JSON representation?

@db-trin-life

@kevinjqliu if no one is working on this, I can take it on.

@kevinjqliu
Contributor Author

@db-trin-life yep assigned to you!

@guptaakashdeep
Contributor

@kevinjqliu @Fokko Is this still being worked on? I followed the conversation. As I understand it, the issue is that Table is currently not JSON serializable, while TableMetadata is because it extends IcebergBaseModel.

Do we want to create a Pydantic TableModel that extends IcebergBaseModel, has the same properties as the Table class, and is then used to serialize and deserialize it?

Sample Implementation:

# __init__.py

class CatalogModel(IcebergBaseModel):
    name: str
    properties: Dict[str, Any]


class TableModel(IcebergBaseModel):
    # Named without a leading underscore: Pydantic treats underscore-prefixed
    # attributes as private, so they would be left out of the serialized output.
    identifier: Identifier
    metadata: TableMetadata
    metadata_location: str
    catalog: CatalogModel
    config: Dict[str, str]
    # Excluded IO for now, as that class is not serializable. Do we want to
    # just keep the class name here, or make that serializable too?


class Table:
    """An Iceberg table."""

    _identifier: Identifier = Field()
    metadata: TableMetadata
    metadata_location: str = Field()
    io: FileIO
    catalog: Catalog
    config: Dict[str, str]

    # .....

    def serialize(self) -> str:
        """Return a JSON representation of this table (IO is not included)."""
        model = TableModel(
            identifier=self._identifier,
            metadata=self.metadata,
            metadata_location=self.metadata_location,
            catalog=CatalogModel(
                name=self.catalog.name,
                properties=self.catalog.properties,
            ),
            config=self.config,
        )
        return model.model_dump_json()
 

Something of this sort?

Fokko changed the title from "[feat request] Make Table / TableMetadata JSON serializable" to "[feat request] Make Table JSON serializable" on Apr 18, 2025
@Fokko
Contributor

Fokko commented Apr 18, 2025

TableMetadata can already be serialized to JSON.

Table is a bit more tricky since I don't think we want to serialize the whole Catalog and FileIO. I think the config contains everything we need to re-create the FileIO and Catalog, so that would be one option. @kevinjqliu did you have any specific use-case in mind?
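
One way to read this option is to serialize only a lightweight handle (identifier, metadata location, and config) and leave the Catalog and FileIO to be rebuilt from config on load. A rough sketch under that assumption follows. TableReference is a hypothetical class, not part of pyiceberg, the values are invented, and the step that rebuilds the Catalog and FileIO is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class TableReference:
    """Hypothetical lightweight handle; not part of pyiceberg."""

    identifier: Tuple[str, ...]
    metadata_location: str
    config: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # Tuples are emitted as JSON arrays.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "TableReference":
        raw = json.loads(payload)
        # JSON has no tuple type, so the identifier comes back as a list
        # and is converted back to a tuple here.
        return cls(
            identifier=tuple(raw["identifier"]),
            metadata_location=raw["metadata_location"],
            config=dict(raw["config"]),
        )


# Example values are made up for illustration.
ref = TableReference(
    identifier=("default", "taxi_dataset"),
    metadata_location="s3://bucket/path/metadata/v1.metadata.json",
    config={"uri": "http://localhost:8181"},
)
restored = TableReference.from_json(ref.to_json())
```

Whether config really carries enough to reconstruct both objects would need to be confirmed against the catalog implementation in use.
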

@kevinjqliu
Contributor Author

I think using model_dump_json is sufficient. I don't think I knew about model_dump_json when I wrote this up.

@kevinjqliu
Contributor Author

Closing for now :)
