Support Appends with TimeTransform Partitions #703

Closed
wants to merge 81 commits into from
Commits
0cad231
checkpoint
sungwy May 4, 2024
96e5533
checkpoint2
sungwy May 5, 2024
ddfa9ac
todo: sort with pyarrow_transform vals
sungwy May 5, 2024
1a5327a
checkpoint
sungwy May 6, 2024
e067a28
checkpoint
sungwy May 6, 2024
069f3bd
fix
sungwy May 6, 2024
615d5e3
tests
sungwy May 6, 2024
c0a0f32
more tests
sungwy May 6, 2024
d872245
Remove trailing slash from table location when creating a table (#702)
felixscherz May 6, 2024
a1f4ba8
Build: Bump mkdocs-section-index from 0.3.8 to 0.3.9 (#696)
dependabot[bot] May 7, 2024
e2f547d
Build: Bump cython from 3.0.8 to 3.0.10 (#697)
dependabot[bot] May 7, 2024
29beaf8
Build: Bump tqdm from 4.66.2 to 4.66.3 (#699)
dependabot[bot] May 7, 2024
70a45f6
Build: Bump werkzeug from 3.0.1 to 3.0.3 (#706)
dependabot[bot] May 7, 2024
0eb0c1c
Build: Bump jinja2 from 3.1.3 to 3.1.4 in /mkdocs (#707)
dependabot[bot] May 7, 2024
6a39eda
adopt review feedback
sungwy May 7, 2024
990ce80
Make `add_files` to support `snapshot_properties` argument (#695)
enkidulan May 7, 2024
0508667
Add support for categorical type (#693)
sungwy May 7, 2024
1f39b59
Build: Bump tenacity from 8.2.3 to 8.3.0 (#714)
dependabot[bot] May 8, 2024
50a65e5
Build: Bump mkdocstrings from 0.25.0 to 0.25.1 (#715)
dependabot[bot] May 8, 2024
3461305
Build: Bump coverage from 7.5.0 to 7.5.1 (#713)
dependabot[bot] May 8, 2024
399a9be
Build: Bump sqlalchemy from 2.0.29 to 2.0.30 (#712)
dependabot[bot] May 8, 2024
6f72e30
Build: Bump flask-cors from 4.0.0 to 4.0.1 (#718)
dependabot[bot] May 8, 2024
d14e137
comment
sungwy May 8, 2024
4de207d
Build: Bump mkdocs-material from 9.5.20 to 9.5.21 (#719)
dependabot[bot] May 9, 2024
d02d7a1
Build: Bump getdaft from 0.2.23 to 0.2.24 (#721)
dependabot[bot] May 9, 2024
aa361d1
Test, write subset of schema (#704)
kevinjqliu May 9, 2024
b41c98c
Remove pylintrc file (#724)
ndrluis May 13, 2024
444dca7
Add kevinjqliu to collaborators (#729)
Fokko May 13, 2024
7904fe5
Build: Bump moto from 5.0.6 to 5.0.7 (#733)
dependabot[bot] May 14, 2024
0d98ec8
Build: Bump mkdocs-material from 9.5.21 to 9.5.22 (#732)
dependabot[bot] May 14, 2024
6c2ba34
Build: Bump griffe from 0.44.0 to 0.45.0 (#731)
dependabot[bot] May 14, 2024
20b7b53
Build: Bump pypa/cibuildwheel from 2.17.0 to 2.18.0 (#730)
dependabot[bot] May 14, 2024
6d52325
Hive catalog: Add retry logic for hive locking (#701)
frankliee May 15, 2024
a268e5b
Add create_namespace_if_not_exists method (#725)
ndrluis May 15, 2024
b40378b
Remove NoSuchNamespaceError on namespace creation (#726)
ndrluis May 15, 2024
ac84bd5
Build: Bump pyarrow from 16.0.0 to 16.1.0 (#743)
dependabot[bot] May 15, 2024
20c2731
Build: Bump mkdocstrings-python from 1.10.0 to 1.10.1 (#744)
dependabot[bot] May 15, 2024
4fddcbe
Build: Bump mkdocstrings-python from 1.10.1 to 1.10.2 (#746)
dependabot[bot] May 21, 2024
0a58636
Build: Bump boto3 from 1.34.69 to 1.34.106 (#749)
dependabot[bot] May 21, 2024
c764d6a
--- (#754)
dependabot[bot] May 21, 2024
245ab87
--- (#755)
dependabot[bot] May 21, 2024
82df57e
--- (#756)
dependabot[bot] May 21, 2024
aa5a136
[FEAT]register table using iceberg metadata file via pyiceberg (#711)
MehulBatra May 22, 2024
5537cb4
modify doc(backward compatibility) typo (#757)
SeungyeopShin May 23, 2024
e917660
Bump requests from 2.32.1 to 2.32.2 (#759)
dependabot[bot] May 23, 2024
7083b2e
Bump griffe from 0.45.0 to 0.45.1 (#760)
dependabot[bot] May 23, 2024
03a0d65
Bump mypy-boto3-glue from 1.34.88 to 1.34.110 (#761)
dependabot[bot] May 23, 2024
996afd0
Bump mkdocstrings-python from 1.10.2 to 1.10.3 (#762)
dependabot[bot] May 23, 2024
eba4bee
Initial implementation of the manifest table (#717)
geruh May 23, 2024
42afc43
Fix: Table-Exists if Server returns 204 (#739)
c-thiel May 23, 2024
959718a
Bump duckdb from 0.10.2 to 0.10.3 (#764)
dependabot[bot] May 25, 2024
ed83e84
Bump griffe from 0.45.1 to 0.45.2 (#765)
dependabot[bot] May 25, 2024
b8023d2
Bump typing-extensions from 4.11.0 to 4.12.0 (#767)
dependabot[bot] May 25, 2024
a132be1
Bump mkdocs-material from 9.5.24 to 9.5.25 (#770)
dependabot[bot] May 28, 2024
8968996
Add azure configuration variables (#745)
kevinzwang May 28, 2024
ee2a7c5
Bump moto from 5.0.7 to 5.0.8 (#771)
dependabot[bot] May 28, 2024
54aacb4
Bump coverage from 7.5.1 to 7.5.2 (#772)
dependabot[bot] May 28, 2024
756ae62
Introduce hierarchical namespaces into SqlCatalog (#591)
cccs-eric May 28, 2024
4fb8ba2
Bump coverage from 7.5.2 to 7.5.3 (#776)
dependabot[bot] May 29, 2024
ec8d7dc
Bump pydantic from 2.7.1 to 2.7.2 (#775)
dependabot[bot] May 29, 2024
7552e03
Bump requests from 2.32.2 to 2.32.3 (#778)
dependabot[bot] May 30, 2024
e08cc9d
Bump getdaft from 0.2.24 to 0.2.25 (#779)
dependabot[bot] May 30, 2024
d3ad61c
Remove `record_fields` from the `Record` class (#580)
Fokko May 30, 2024
cf3bf8a
Unify to double quotes using Ruff (#781)
HonahX May 30, 2024
91973f2
Bump moto from 5.0.8 to 5.0.9 (#783)
dependabot[bot] May 31, 2024
0339e7f
Support CreateTableTransaction for SqlCatalog (#684)
HonahX May 31, 2024
84a2c04
Support CreateTableTransaction for HiveCatalog (#683)
HonahX May 31, 2024
8d79664
Support viewfs scheme along side with hdfs (#777)
yothinix May 31, 2024
20f6afd
Update `fsspec.py`to respect `s3.signer.uri property` (#741)
c-thiel May 31, 2024
5dd846d
checkpoint
sungwy May 4, 2024
6357193
checkpoint2
sungwy May 5, 2024
c30a57c
todo: sort with pyarrow_transform vals
sungwy May 5, 2024
541655f
checkpoint
sungwy May 6, 2024
afe83b1
checkpoint
sungwy May 6, 2024
00ca5f0
fix
sungwy May 6, 2024
511e988
tests
sungwy May 6, 2024
3b784ab
more tests
sungwy May 6, 2024
3711b1b
adopt review feedback
sungwy May 7, 2024
f16d778
comment
sungwy May 8, 2024
80d5064
Merge branch 'transform-partition-writes' of https://github.com/syun6…
sungwy May 31, 2024
9f0a92b
rebase
sungwy May 31, 2024
1 change: 1 addition & 0 deletions .asf.yaml
@@ -45,6 +45,7 @@ github:
collaborators: # Note: the number of collaborators is limited to 10
- ajantha-bhat
- syun64
- kevinjqliu
ghp_branch: gh-pages
ghp_path: /

2 changes: 1 addition & 1 deletion .github/workflows/python-release.yml
@@ -59,7 +59,7 @@ jobs:
if: startsWith(matrix.os, 'ubuntu')

- name: Build wheels
uses: pypa/cibuildwheel@v2.17.0
uses: pypa/cibuildwheel@v2.18.1
with:
output-dir: wheelhouse
config-file: "pyproject.toml"
50 changes: 50 additions & 0 deletions mkdocs/docs/api.md
@@ -606,6 +606,56 @@ min_snapshots_to_keep: [[null,10]]
max_snapshot_age_in_ms: [[null,604800000]]
```

### Manifests

To show a table's current file manifests:

```python
table.inspect.manifests()
```

```
pyarrow.Table
content: int8 not null
path: string not null
length: int64 not null
partition_spec_id: int32 not null
added_snapshot_id: int64 not null
added_data_files_count: int32 not null
existing_data_files_count: int32 not null
deleted_data_files_count: int32 not null
added_delete_files_count: int32 not null
existing_delete_files_count: int32 not null
deleted_delete_files_count: int32 not null
partition_summaries: list<item: struct<contains_null: bool not null, contains_nan: bool, lower_bound: string, upper_bound: string>> not null
child 0, item: struct<contains_null: bool not null, contains_nan: bool, lower_bound: string, upper_bound: string>
child 0, contains_null: bool not null
child 1, contains_nan: bool
child 2, lower_bound: string
child 3, upper_bound: string
----
content: [[0]]
path: [["s3://warehouse/default/table_metadata_manifests/metadata/3bf5b4c6-a7a4-4b43-a6ce-ca2b4887945a-m0.avro"]]
length: [[6886]]
partition_spec_id: [[0]]
added_snapshot_id: [[3815834705531553721]]
added_data_files_count: [[1]]
existing_data_files_count: [[0]]
deleted_data_files_count: [[0]]
added_delete_files_count: [[0]]
existing_delete_files_count: [[0]]
deleted_delete_files_count: [[0]]
partition_summaries: [[ -- is_valid: all not null
-- child 0 type: bool
[false]
-- child 1 type: bool
[false]
-- child 2 type: string
["test"]
-- child 3 type: string
["test"]]]
```
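
As a rough sketch of how this output might be consumed, the rows can be pulled into plain Python with `pyarrow.Table.to_pylist()` and totalled. The example below does not call PyIceberg: it uses a hand-written row copied from the sample output above, and `summarize_manifests` is a hypothetical helper, not part of the library.

```python
# Rows shaped like `table.inspect.manifests().to_pylist()` would return.
# Values are copied from the sample output above; only the columns used
# by the helper are kept.
manifest_rows = [
    {
        "content": 0,
        "path": "s3://warehouse/default/table_metadata_manifests/metadata/3bf5b4c6-a7a4-4b43-a6ce-ca2b4887945a-m0.avro",
        "length": 6886,
        "added_data_files_count": 1,
        "existing_data_files_count": 0,
        "deleted_data_files_count": 0,
    }
]


def summarize_manifests(rows):
    """Hypothetical helper: total the data file counters across manifests."""
    totals = {"added": 0, "existing": 0, "deleted": 0, "bytes": 0}
    for row in rows:
        totals["added"] += row["added_data_files_count"]
        totals["existing"] += row["existing_data_files_count"]
        totals["deleted"] += row["deleted_data_files_count"]
        totals["bytes"] += row["length"]  # size of the manifest file itself
    return totals


summary = summarize_manifests(manifest_rows)
print(summary)
```

With a real table, `manifest_rows` would instead come from `table.inspect.manifests().to_pylist()`.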

## Add Files

Expert Iceberg users may choose to commit existing parquet files to the Iceberg table as data files, without rewriting them.
3 changes: 2 additions & 1 deletion mkdocs/docs/configuration.md
@@ -89,6 +89,7 @@ For the FileIO there are several configuration options available:
| s3.access-key-id | admin | Configure the static access key ID used to access the FileIO. |
| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
| s3.signer | bearer | Configure the signature version of the FileIO. |
| s3.signer.uri | http://my.signer:8080/s3 | Configure the remote signing uri if it differs from the catalog uri. Remote signing is only implemented for `FsspecFileIO`. The final request is sent to `<s3.signer.uri>/v1/aws/s3/sign`. |
| s3.region | us-west-2 | Sets the region of the bucket |
| s3.proxy-uri | http://my.proxy.com:8080 | Configure the proxy server to be used by the FileIO. |
| s3.connect-timeout | 60.0 | Configure socket connection timeout, in seconds. |
@@ -298,4 +299,4 @@ PyIceberg uses multiple threads to parallelize operations. The number of workers

# Backward Compatibility

-Previous versions of Java (`<1.4.0`) implementations incorrectly assume the optional attribute `current-snapshot-id` to be a required attribute in TableMetadata. This means that if `current-snapshot-id` is missing in the metadata file (e.g. on table creation), the application will throw an exception without being able to load the table. This assumption has been corrected in more recent Iceberg versions. However, it is possible to force PyIceberg to create a table with a metadata file that will be compatible with previous versions. This can be configured by setting the `legacy-current-snapshot-id` entry as "True" in the configuration file, or by setting the `LEGACY_CURRENT_SNAPSHOT_ID` environment variable. Refer to the [PR discussion](https://github.com/apache/iceberg-python/pull/473) for more details on the issue
+Previous versions of Java (`<1.4.0`) implementations incorrectly assume the optional attribute `current-snapshot-id` to be a required attribute in TableMetadata. This means that if `current-snapshot-id` is missing in the metadata file (e.g. on table creation), the application will throw an exception without being able to load the table. This assumption has been corrected in more recent Iceberg versions. However, it is possible to force PyIceberg to create a table with a metadata file that will be compatible with previous versions. This can be configured by setting the `legacy-current-snapshot-id` entry as "True" in the configuration file, or by setting the `PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID` environment variable. Refer to the [PR discussion](https://github.com/apache/iceberg-python/pull/473) for more details on the issue
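
The corrected variable name pairs `PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID` with the `legacy-current-snapshot-id` config entry. The sketch below shows one way that correspondence could work; the mapping rule (strip the `PYICEBERG_` prefix, lowercase, replace `_` with `-`) is an assumption inferred from that pairing, and `env_to_config_key` is a hypothetical helper rather than PyIceberg's own parser.

```python
import os

ENV_PREFIX = "PYICEBERG_"


def env_to_config_key(name):
    """Hypothetical helper: map a prefixed environment variable to a config key.

    Assumed rule: drop the prefix, lowercase, and turn underscores into dashes.
    """
    if not name.startswith(ENV_PREFIX):
        raise ValueError(f"{name!r} does not start with {ENV_PREFIX!r}")
    return name[len(ENV_PREFIX):].lower().replace("_", "-")


# Opt in to the legacy behaviour for this process, as described above.
os.environ["PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID"] = "True"

key = env_to_config_key("PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID")
print(key, "=", os.environ["PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID"])
```

Under this rule the unprefixed `LEGACY_CURRENT_SNAPSHOT_ID` is rejected, which is consistent with the documentation fix in this hunk.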
12 changes: 6 additions & 6 deletions mkdocs/requirements.txt
@@ -16,13 +16,13 @@
# under the License.

mkdocs==1.6.0
-griffe==0.44.0
-jinja2==3.1.3
-mkdocstrings==0.25.0
-mkdocstrings-python==1.10.0
+griffe==0.45.2
+jinja2==3.1.4
+mkdocstrings==0.25.1
+mkdocstrings-python==1.10.3
 mkdocs-literate-nav==0.6.1
 mkdocs-autorefs==1.0.1
 mkdocs-gen-files==0.5.0
-mkdocs-material==9.5.20
+mkdocs-material==9.5.25
 mkdocs-material-extensions==1.3.1
-mkdocs-section-index==0.3.8
+mkdocs-section-index==0.3.9