Skip to content

Schould tests in this repo run on a cron job? #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rgommers opened this issue Sep 13, 2022 · 6 comments · Fixed by #18
Closed

Schould tests in this repo run on a cron job? #12

rgommers opened this issue Sep 13, 2022 · 6 comments · Fixed by #18

Comments

@rgommers
Copy link
Member

It would probably be nice to have these tests run once a week, in case a new release of a dataframe library happens. Not urgent, but perhaps useful. It should be a matter of adding something like this to the existing .github/workflows/test.yml:

on:
  schedule:
  #        ┌───────────── minute (0 - 59)
  #        │  ┌───────────── hour (0 - 23)
  #        │  │ ┌───────────── day of the month (1 - 31)
  #        │  │ │ ┌───────────── month (1 - 12 or JAN-DEC)
  #        │  │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
  #        │  │ │ │ │
  - cron: "9  9 * * 6"
@honno
Copy link
Member

honno commented Sep 13, 2022

Yep I can have a go at some point, after I merge #11*.

* was going to wait on vaex but actually now that we have up-to-date implementations in pandas and modin (+cudf, but can't run that on CI), I can get #11 merged now after xfail tweaks.

@jorisvandenbossche
Copy link
Member

This would be good to have (also to allow you to see recent test logs)

@jorisvandenbossche
Copy link
Member

I find it also suspicious that the tests actually passed in September. Anything converting a dataframe with column of strings to pandas (eg vaex to pandas) should have failed until recently (pandas-dev/pandas#50565). And it seems that string column should normally be covered in the roundtrip tests.

@honno
Copy link
Member

honno commented Jan 10, 2023

This would be good to have (also to allow you to see recent test logs)

My biggest de-motivator here was that these nightly pandas builds suddenly weren't actually nightly at the end of last year. I spent some time exploring this to no avail, will have another go though.

Just didn't want these long build times affecting normal CI, so maybe I can leave a cron job to use a different workflow which builds everything, given vaex doesn't have nightly builds anyway.

Anything converting a dataframe with column of strings to pandas (eg vaex to pandas) should have failed until recently (pandas-dev/pandas#50565). And it seems that string column should normally be covered in the roundtrip tests.

There are quite a few xfails (and skips when flaky), including vaex-to-pandas. I could probably highlight this in the README. Do need to update this now.

ci_xfail_ids = [
# https://github.com/vaexio/vaex/pull/2150
"tests/test_signatures.py::test_column_method[vaex-size]",
# https://github.com/rapidsai/cudf/issues/11320
"test_signatures.py::test_buffer_method[cudf-__dlpack__]",
"test_signatures.py::test_buffer_method[cudf-__dlpack_device__]",
# https://github.com/vaexio/vaex/issues/2083
# https://github.com/vaexio/vaex/issues/2093
# https://github.com/vaexio/vaex/issues/2113
# https://github.com/vaexio/vaex/pull/2150
"test_from_dataframe.py::test_from_dataframe_roundtrip[pandas-vaex]",
"test_from_dataframe.py::test_from_dataframe_roundtrip[vaex-modin]",
"test_from_dataframe.py::test_from_dataframe_roundtrip[modin-vaex]",
"test_from_dataframe.py::test_from_dataframe_roundtrip[vaex-pandas]",
# https://github.com/vaexio/vaex/pull/2150
"test_column_object.py::test_size[vaex]",
# https://github.com/rapidsai/cudf/issues/11389
"test_column_object.py::test_dtype[cudf]",
# https://github.com/vaexio/vaex/pull/2150
"test_column_object.py::test_describe_categorical_on_categorical[vaex]",
# Raises RuntimeError, which is technically correct, but the spec will
# require TypeError soon.
# See https://github.com/data-apis/dataframe-api/pull/74
"test_column_object.py::test_describe_categorical[modin]",
# https://github.com/vaexio/vaex/issues/2113
"test_column_object.py::test_describe_categorical[vaex]",
# https://github.com/modin-project/modin/issues/4687
"test_column_object.py::test_null_count[modin]",
# https://github.com/vaexio/vaex/issues/2121
"test_column_object.py::test_get_chunks[vaex]",
]
ci_skip_ids = [
# https://github.com/rapidsai/cudf/issues/11332
"test_column_object.py::test_describe_categorical[cudf]",
# https://github.com/vaexio/vaex/issues/2118
# https://github.com/vaexio/vaex/issues/2139
"test_column_object.py::test_dtype[vaex]",
]

Also generally some dtypes are disabled as they're known to be erroneous in wrrapers.py, e.g.
datatimes and strings were disabled for cuDF

supported_dtypes=set(NominalDtype)
^ {
NominalDtype.DATETIME64NS,
# https://github.com/rapidsai/cudf/issues/11308
NominalDtype.UTF8,
},

@jorisvandenbossche
Copy link
Member

There are quite a few xfails (and skips when flaky), including vaex-to-pandas.

Hmm, so basically since there is only a single roundtrip test, the full roundtrip tests are skipped for many combinations. I suppose ideally we would only skip certain data types in the roundtrip, given the exact orig/dest library, but still run the test itself?
I suppose that is already the idea of supported_dtypes, and ideally the vaex tests would be unskipped but maybe their supported_dtypes limited to get it passing?

@honno
Copy link
Member

honno commented Jan 10, 2023

There are quite a few xfails (and skips when flaky), including vaex-to-pandas.

I suppose ideally we would only skip certain data types in the roundtrip, given the exact orig/dest library, but still run the test itself? I suppose that is already the idea of supported_dtypes

Yep!

and ideally the vaex tests would be unskipped but maybe their supported_dtypes limited to get it passing?

vaex has a few bugs which prevents tests (inc. the roundtrips) from passing beyond just a lack of dtype support. I think that goes the same for the other adopters, but yeah definitely need to revisit this + add a cron job.

@honno honno mentioned this issue Feb 28, 2023
@honno honno closed this as completed in #18 Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants