Add Hive integration tests #207
Conversation
Thanks @Fokko! It is great to have a folder for integration tests. I have a few comments below.
```python
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
```
I think we do not need `__init__.py` in test folders.
I would agree with you, but it raises an error:

```
______________ ERROR collecting tests/integration/test_hive.py ______________
import file mismatch:
imported module 'test_hive' has this __file__ attribute:
  /Users/fokkodriesprong/Desktop/iceberg-python/tests/catalog/test_hive.py
which is not the same as the test file we want to collect:
  /Users/fokkodriesprong/Desktop/iceberg-python/tests/integration/test_hive.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
______________ ERROR collecting tests/integration/test_rest.py ______________
import file mismatch:
imported module 'test_rest' has this __file__ attribute:
  /Users/fokkodriesprong/Desktop/iceberg-python/tests/catalog/test_rest.py
which is not the same as the test file we want to collect:
  /Users/fokkodriesprong/Desktop/iceberg-python/tests/integration/test_rest.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
```
We have the `__init__.py` files in the other folders as well 🤔
I raised this because we previously removed all `__init__.py` in tests: apache/iceberg#5919. I think we can rename these two files to avoid name collisions with those in `catalog/`. That would enable us to safely delete the `__init__.py` files.

The other `__init__.py` in `tests/` was added in #45. I removed this one and all tests still work.

It would be ideal to maintain consistency by not having `__init__.py` files in test folders. However, I believe this is a minor issue and should not block this PR.
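As a side note: newer pytest versions (6.0+) also support `--import-mode=importlib`, which removes the unique-basename requirement altogether, so neither the `__init__.py` files nor the renames would be needed. A sketch of enabling it, assuming the project configures pytest in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
addopts = "--import-mode=importlib"
```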
tests/integration/test_hive.py (Outdated)

```python
@pytest.fixture()
def table_test_limit(catalog: Catalog) -> Table:
```
I wonder if we could use `pytest_lazy_fixture` to avoid duplicate tests in `test_rest.py` and `test_hive.py`, like what we did in `iceberg-python/tests/catalog/test_sql.py` (lines 113 to 120 in 2bd8cf2):

```python
@pytest.mark.parametrize(
    'catalog',
    [
        lazy_fixture('catalog_memory'),
        lazy_fixture('catalog_sqlite'),
    ],
)
def test_create_table_default_sort_order(catalog: SqlCatalog, table_schema_nested: Schema, random_identifier: Identifier) -> None:
```
Here it can be something like:

```python
@pytest.fixture(params=[
    pytest.lazy_fixture('catalog_hive'),
    pytest.lazy_fixture('catalog_rest')
])
def catalog(request):
    return request.param

@pytest.fixture()
def table_test_limit(catalog: Catalog) -> Table:
```

and `table_test_limit` should load the table from each catalog in turn.
I modified an example from https://pypi.org/project/pytest-lazy-fixture/:

```python
@pytest.fixture(params=[
    pytest.lazy_fixture('one'),
    pytest.lazy_fixture('two')
])
def combined(request):
    return request.param

@pytest.fixture
def one():
    return 1

@pytest.fixture
def two():
    return 2

@pytest.fixture
def combined2(combined):
    return combined

def test_func(combined2):
    assert combined2 in [1, 2]
```

which seems to work.
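For what it's worth, the same parametrization can also be expressed without the plugin, using pytest's built-in `request.getfixturevalue`. This is a hedged sketch with placeholder fixture values (not the real catalogs), wrapped in a small harness that runs pytest on a throwaway file:

```python
import pathlib
import tempfile
import textwrap

import pytest

# Sketch: a `catalog` fixture parametrized over two backend fixtures,
# resolved at run time with request.getfixturevalue instead of
# pytest-lazy-fixture. The fixture return values here are placeholders.
SKETCH = textwrap.dedent(
    """
    import pytest

    @pytest.fixture
    def catalog_hive():
        return "hive"

    @pytest.fixture
    def catalog_rest():
        return "rest"

    @pytest.fixture(params=["catalog_hive", "catalog_rest"])
    def catalog(request):
        # Resolve the fixture named by the current param at run time
        return request.getfixturevalue(request.param)

    def test_catalog(catalog):
        assert catalog in ("hive", "rest")
    """
)

def run_sketch() -> int:
    # Write the sketch to a temp file and run pytest on it;
    # an exit code of 0 means both parametrized cases passed.
    with tempfile.TemporaryDirectory() as tmp:
        test_file = pathlib.Path(tmp) / "test_catalog_sketch.py"
        test_file.write_text(SKETCH)
        return int(pytest.main(["-q", str(test_file)]))
```

This would keep the two catalog suites on one test body without adding the `pytest-lazy-fixture` dependency.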
I will try to test this in our integration tests later. If it works, I think it can be worth implementing. What do you think?
I added the lazy fixtures in #178 and I waited for that PR to be merged because I wanted to add them here as well. The reason I didn't do it is that `test_upgrade_table_version` and `test_table_properties` are not implemented in Hive (yet). This would mean that we need to keep them in parity, which might slow down development.
Thanks for the context! For `test_upgrade_table_version` and `test_table_properties`, I think we might skip them in `test_hive` with:

```python
@pytest.fixture()
def table(catalog: Catalog) -> Table:
    if catalog.name == "hive":
        pytest.skip("Not Implemented: ...")
    ...

@pytest.fixture()
def table_test_table_version(catalog: Catalog) -> Table:
    if catalog.name == "hive":
        pytest.skip("Not Implemented: ...")
    ...
```

(We might also find a way to put the `skip` into the decorator.)

This works because the two fixtures are used solely by those two tests, respectively.
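One way to move the `skip` into the decorator is `pytest.param(..., marks=pytest.mark.skip(...))`. A hedged sketch with assumed names (the real tests take a fixture-provided table, not a string), again verified by running pytest on a throwaway file:

```python
import pathlib
import tempfile
import textwrap

import pytest

# Sketch: mark the Hive case as skipped at the parametrization site,
# instead of branching inside each fixture body. Names are illustrative.
SKETCH = textwrap.dedent(
    """
    import pytest

    @pytest.mark.parametrize(
        "catalog_name",
        [
            pytest.param("rest"),
            pytest.param(
                "hive",
                marks=pytest.mark.skip(reason="Not Implemented in Hive yet"),
            ),
        ],
    )
    def test_table_properties(catalog_name):
        # The hive param never runs, so only the rest case is exercised
        assert catalog_name == "rest"
    """
)

def run_sketch() -> int:
    # Exit code 0 means the rest case passed and the hive case was skipped.
    with tempfile.TemporaryDirectory() as tmp:
        test_file = pathlib.Path(tmp) / "test_skip_sketch.py"
        test_file.write_text(SKETCH)
        return int(pytest.main(["-q", str(test_file)]))
```

The skip reason then shows up in the pytest summary, which makes the Hive gaps visible without branching inside fixtures.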
I think we can discuss later whether this can work for future tests related to updating tables. The current PR has already done many things, and it is important for the `update_table` tests. This topic should not block this PR.
I actually like this suggestion a lot; let me add it to the PR.
dev/Dockerfile (Outdated)

```diff
@@ -62,7 +60,7 @@ RUN chmod u+x /opt/spark/sbin/* && \
 RUN pip3 install -q ipython

-RUN pip3 install "pyiceberg[s3fs]==${PYICEBERG_VERSION}"
+RUN pip3 install "pyiceberg[s3fs,pyarrow,hive]==${PYICEBERG_VERSION}"
```
Do we need `pyarrow` in the Docker image? I think in `provision.py` we only need pyiceberg to create some empty tables with a UUID column. Since we use MinIO for storage, `s3fs` can help us write the metadata.
Thanks! I got an error before, and therefore I added it, but I've removed it again and it seems to work fine 👍
LGTM!
Thanks @HonahX for the review 👍
This makes it easier to test round trips of updating tables. Currently, we have integration tests for the REST catalog, but there the REST catalog takes care of updating the metadata.
This includes a refactor of moving all the integration tests into their own module.
By having this setup, it will also be easier for others to pick up work, for example: #205 or #206.
cc @HonahX