
DSGN: collect tests metainfo during execution #5640


Open

BeforeFlight opened this issue Jul 22, 2019 · 3 comments
Labels

topic: parametrize (related to @pytest.mark.parametrize)
type: performance (performance or memory problem/improvement)
type: proposal (proposal for a new feature, often to gather opinions or design the API around the new feature)

Comments

@BeforeFlight

BeforeFlight commented Jul 22, 2019

As stated here, collection of all metadata happens before the tests are actually executed. This leads to performance issues that make pytest parametrization hard to use, for at least the following two reasons:

  1. Iterators (again). Exactly as in the already mentioned issue and in others (not only on this tracker), iterators need to be consumed lazily; otherwise pytest is of no help here.

Consider an example: we want to parametrize a test over an array of, say, 10 parameters. Pytest easily deals with this, creating all combinations of the 10. But if we now need all combinations across 2 levels, we need 10 times more tests and 10 times more RAM. What if we need even more? (This is not an artificially constructed test case, it comes from a real-life problem: if we test a function with 2-3 nested conditions and maybe 2-3 nested function calls inside, we already need about 4-9 levels of parametrization to hit every possible situation.) In practice, even 4-5 levels is already not feasible with 20 GB of RAM (that figure is from my concrete case, of course). See the sketch after this list for how the collected test count multiplies.

  2. Even if we manage to fit in RAM, before our tests actually run we need to wait for the metadata to be collected. But what if we don't need ALL of this data? What if we pass -x to pytest? The answer is that we may wait 3-5 minutes for the metadata to be collected and then fail right on the first test. And even if we don't, one almost never needs this whole bunch of metainfo, only that of the failed tests (whose count is usually much smaller than the total number of input cases). In other words, we don't need to collect the metainfo up front; most of it never needs to be collected at all (lazily or otherwise).
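To make the blow-up concrete, here is a minimal sketch (sizes and names are made up for illustration) of how stacked parametrize decorators multiply the number of collected test items; pytest builds every ID and callspec during collection, before a single test runs:

import pytest

LEVEL_1 = range(10)
LEVEL_2 = range(10)
LEVEL_3 = range(10)

# Collection creates 10 * 10 * 10 = 1000 test items (IDs, callspecs, ...)
# up front, even if -x would stop execution after the first failure.
@pytest.mark.parametrize("a", LEVEL_1)
@pytest.mark.parametrize("b", LEVEL_2)
@pytest.mark.parametrize("c", LEVEL_3)
def test_combinations(a, b, c):
    assert a + b + c >= 0  # placeholder assertion

Each additional parametrization level multiplies the number of collected items (and their metadata) by the size of that level's parameter list.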

And yes, I saw the comment in the mentioned issues about the design problem, etc. This is just another ping as a reminder that, without lazy metadata collection and lazy treatment of iterators, pytest does not scale.

To bring some thoughts here (because a redesign is always really hard, and maybe there are workarounds as well), perhaps the follow-ups below could be reimplemented internally for iterators without major rewriting.
For now I am using a workaround like this:

import pytest

datas = gen_data()  # lazy data generator

# Parametrize over indexes only, so pytest knows how many times to rerun
# the test without materializing the data during collection.
@pytest.fixture(scope='module', params=range(len_of_datas_if_known))
def fix(request):
    return next(datas)  # pull the next huge chunk lazily


@pytest.mark.parametrize('other_param', ['aaa', 'bbb'])
def test_one(fix, other_param):
    data = fix
    ...

So I'm still parametrizing the fixture, but over "indexes", just to tell pytest how many times to re-run the test with new data. It's ugly, but for now it works. It also requires somehow knowing the count of generated chunks in advance (and if we exceed that count, we just get test failures with a StopIteration error, which could be treated as passed).

Another approach is to use fixtures as factories, but as I understand it we would then iterate over the data inside the test, so for pytest it would all be a single test, which is not good.
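For reference, a minimal sketch of that factory-fixture approach (the generator and sizes are illustrative): the fixture hands the generator itself to the test, and all iteration happens inside one collected test item:

import pytest

def gen_data():
    # Hypothetical lazy generator standing in for the real huge chunks.
    for i in range(5):
        yield [i] * 3

@pytest.fixture(scope='module')
def data_chunks():
    # Return the generator itself; nothing is materialized at collection time.
    return gen_data()

def test_all_chunks(data_chunks):
    # All iteration happens inside a single collected test item, so pytest
    # reports only one pass/fail for the whole loop.
    for chunk in data_chunks:
        assert len(chunk) == 3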

And the last thing I've devised for now is to call pytest.main() in a loop. Something like this:

for chunk in gen_data():      # generate the next data chunk
    set_up_test(chunk)        # set up the test for this chunk
    pytest.main(['test'])     # run only the tests for this chunk
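A slightly more concrete sketch of that driver loop, under the assumption that each chunk is handed to the test module out of band through an environment variable (the module path, env-var name, and generator here are made up for illustration):

import json
import os
import pytest

def gen_data():
    # Hypothetical lazy generator of big parameter chunks.
    for i in range(100):
        yield {"chunk": i}

for chunk in gen_data():
    # Pass the chunk to the (hypothetical) test module out of band; pytest
    # only ever collects the tests for one chunk at a time.
    os.environ["CHUNK_JSON"] = json.dumps(chunk)
    exit_code = pytest.main(["test_chunk.py", "-x"])
    if exit_code != 0:
        break  # stop on the first failing chunk, similar to -x across runs

The trade-off is that each pytest.main() invocation is a separate session, so results are reported per run rather than in one consolidated report.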
@nicoddemus
Member

Hi @BeforeFlight,

Thanks for the thoughtful write-up. Just to mention: have you tried pytest-subtests yet? There are some issues with it (I should get back to it at some point), but it might be an alternative for some use cases where parametrization is not desirable.
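For context, a minimal sketch of how pytest-subtests can replace one level of parametrization (the data here is illustrative); each iteration is reported individually even though only one test item is collected:

# Requires the pytest-subtests plugin, which provides the `subtests` fixture.
def test_many_values(subtests):
    for i, value in enumerate(range(10)):  # stand-in for a lazy data source
        with subtests.test(msg="value case", i=i):
            # Each iteration is reported as its own subtest, but only one
            # test item is collected, so there is no up-front metadata blow-up.
            assert value >= 0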

@BeforeFlight
Author

@nicoddemus No, I haven't. I will try it, thanks. Maybe I'll find something to add here.

@Zac-HD added the topic: parametrize, type: performance, and type: proposal labels on Jul 25, 2019
@BeforeFlight
Author

I've found another way to save more RAM. Strangely, I had not done it before, because it is the simplest one: move some of the parametrization inside the tests. Example:

@pytest.mark.parametrize("one", list_1)
@pytest.mark.parametrize("two", list_2)
def test_maybe_convert_objects(self, one, two):
    ...

Change to:

@pytest.mark.parametrize("one", list_1)
def test_maybe_convert_objects(self, one):
    for two in list_2:
        ...

It's similar to factories but even easier to implement. It not only reduces RAM usage several times over, but also the time spent collecting metainfo. The drawback is that, for pytest, this is a single test covering all values of `two`. It works smoothly for "simple" tests; if one has special marks (e.g. xfail) on individual cases or something similar inside, there might be problems.
