-
Notifications
You must be signed in to change notification settings - Fork 289
Support Appends with TimeTransform Partitions #703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Bumps [mkdocs-section-index](https://github.com/oprypin/mkdocs-section-index) from 0.3.8 to 0.3.9. - [Release notes](https://github.com/oprypin/mkdocs-section-index/releases) - [Commits](oprypin/mkdocs-section-index@v0.3.8...v0.3.9) --- updated-dependencies: - dependency-name: mkdocs-section-index dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [cython](https://github.com/cython/cython) from 3.0.8 to 3.0.10. - [Release notes](https://github.com/cython/cython/releases) - [Changelog](https://github.com/cython/cython/blob/master/CHANGES.rst) - [Commits](cython/cython@3.0.8...3.0.10) --- updated-dependencies: - dependency-name: cython dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Great! Thanks @syun64
def pyarrow_transform(self, source: IcebergType) -> "Callable[[pa.Array], pa.Array]": ... | ||
|
||
@property | ||
def supports_pyarrow_transform(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding a public method, how about we maintaining a private list of transform that supports pyarrow transform for the check at the beginning of append
. For transforms that does not yet support pyarrow transform, we could throw NotImplementedError
in pyarrow_transform
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the existing convention for can_transform
and preserves_order
that are specified as class properties. I'm in favor of keeping it consistent with these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing the context. My initial concern was that unlike can_transform
and preserve_oder
which behave differently across transforms, in the end the support_pyarrow_transform
would return True
for all the transforms, making it a little bit redundant. But on second thought, since we will likely not support pyarrow transform for other transforms in 0.7.0 release, this support_pyarrow_transform
can be a useful property in the upcoming release.
We could deprecate this property after we support pyarrow for all transforms later. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm on the same page! To add to that - I'm not yet sure if we will be able to support the other Transform classes in support_pyarrow_transform
. The key idea here is that we want to use native pyarrow compute functions to apply the equivalent transform without converting the values back and forth between Arrow and Python data types. I'm not yet sure if we'll be able to do the same for BucketTransform for instance, with the existing range of pyarrow compute functions.
Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.2 to 4.66.3. - [Release notes](https://github.com/tqdm/tqdm/releases) - [Commits](tqdm/tqdm@v4.66.2...v4.66.3) --- updated-dependencies: - dependency-name: tqdm dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to 3.0.3. - [Release notes](https://github.com/pallets/werkzeug/releases) - [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst) - [Commits](pallets/werkzeug@3.0.1...3.0.3) --- updated-dependencies: - dependency-name: werkzeug dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](pallets/jinja@3.1.3...3.1.4) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
pyiceberg/transforms.py
Outdated
elif isinstance(source, TimestamptzType): | ||
epoch = datetime.EPOCH_TIMESTAMPTZ | ||
else: | ||
raise ValueError(f"Cannot apply month transform for type: {source}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cannot apply month transform for type" should be referring to 'hour', not 'month'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the check - appreciate it!
Bumps [tenacity](https://github.com/jd/tenacity) from 8.2.3 to 8.3.0. - [Release notes](https://github.com/jd/tenacity/releases) - [Commits](jd/tenacity@8.2.3...8.3.0) --- updated-dependencies: - dependency-name: tenacity dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mkdocstrings](https://github.com/mkdocstrings/mkdocstrings) from 0.25.0 to 0.25.1. - [Release notes](https://github.com/mkdocstrings/mkdocstrings/releases) - [Changelog](https://github.com/mkdocstrings/mkdocstrings/blob/main/CHANGELOG.md) - [Commits](mkdocstrings/mkdocstrings@0.25.0...0.25.1) --- updated-dependencies: - dependency-name: mkdocstrings dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.5.0 to 7.5.1. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](nedbat/coveragepy@7.5.0...7.5.1) --- updated-dependencies: - dependency-name: coverage dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.29 to 2.0.30. - [Release notes](https://github.com/sqlalchemy/sqlalchemy/releases) - [Changelog](https://github.com/sqlalchemy/sqlalchemy/blob/main/CHANGES.rst) - [Commits](https://github.com/sqlalchemy/sqlalchemy/commits) --- updated-dependencies: - dependency-name: sqlalchemy dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [flask-cors](https://github.com/corydolphin/flask-cors) from 4.0.0 to 4.0.1. - [Release notes](https://github.com/corydolphin/flask-cors/releases) - [Changelog](https://github.com/corydolphin/flask-cors/blob/main/CHANGELOG.md) - [Commits](corydolphin/flask-cors@4.0.0...4.0.1) --- updated-dependencies: - dependency-name: flask-cors dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [coverage](https://github.com/nedbat/coveragepy) from 7.5.2 to 7.5.3. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](nedbat/coveragepy@7.5.2...7.5.3) --- updated-dependencies: - dependency-name: coverage dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [pydantic](https://github.com/pydantic/pydantic) from 2.7.1 to 2.7.2. - [Release notes](https://github.com/pydantic/pydantic/releases) - [Changelog](https://github.com/pydantic/pydantic/blob/main/HISTORY.md) - [Commits](pydantic/pydantic@v2.7.1...v2.7.2) --- updated-dependencies: - dependency-name: pydantic dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [requests](https://github.com/psf/requests) from 2.32.2 to 2.32.3. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.2...v2.32.3) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.2.24 to 0.2.25. - [Release notes](https://github.com/Eventual-Inc/Daft/releases) - [Commits](Eventual-Inc/Daft@v0.2.24...v0.2.25) --- updated-dependencies: - dependency-name: getdaft dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
First step towards apache#579
Bumps [moto](https://github.com/getmoto/moto) from 5.0.8 to 5.0.9. - [Release notes](https://github.com/getmoto/moto/releases) - [Changelog](https://github.com/getmoto/moto/blob/master/CHANGELOG.md) - [Commits](getmoto/moto@5.0.8...5.0.9) --- updated-dependencies: - dependency-name: moto dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good @syun64 Thanks for working on this, could you rebase to resolve the merge conflicts?
* Update fsspec.py to respect s3.signer.uri property * Add S3_SIGNER_URI constant, add docs --------- Co-authored-by: Fokko Driesprong <[email protected]>
…4/iceberg-python into transform-partition-writes
@syun64 It looks like something is off. I see a lot of unrelated changes 🤔 |
@Fokko hmmm yeah I've seen this happen some times and I don't quite understand it - when you do a diff between the two branches, it shows the right changes: main...syun64:iceberg-python:transform-partition-writes |
I'll close this PR and reopen it and see if that helps |
Support writes with TimeTransforms.
TimeTransforms are can be supported natively within pyarrow as pyarrow compute functions, without requiring conversions back and forth between Arrow and Python data types.