Skip to content

JUnit parallel executor running too many tests with a fixed policy. #3108

Open
@OPeyrusse

Description

@OPeyrusse

JUnit only parallel executor service relies on a ForkJoinPool. Unfortunately, this does not play well with code also using ForkJoinPools.

The observed issue is that, when activating parallel tests, JUnit uses ForkJoinPoolHierarchicalTestExecutorService. However, our tests are also using ForkJoinPools and ForkJoinTasks. The orchestration of the test awaits for the completion of those tasks before moving on in the tests.
But the issue is that ForkJoinTask and ForkJoinWorkerThread are capable of detecting the use of the FJP framework (here for example) and react to it. As JUnit tasks and the project tasks are in different ForkJoinPools, they cannot help each other. This only results in more tests being started by already running and incomplete tests.

This can be an issue when tests are resource sensitive. For example, it may not be possible to open too many connections to a Database.
Tough not explicitly illustrated in this project, we also faced StackOverflowError because of recursive test executions in a single worker. In the following logs, produced by the reproducing project, we can see that two workers are recursively starting tests before finishing any:

ForkJoinPool-1-worker-1 starting a new test. Running: 2
ForkJoinPool-1-worker-2 starting a new test. Running: 1
ForkJoinPool-1-worker-1 starting a new test. Running: 3
ForkJoinPool-1-worker-1 starting a new test. Running: 4
ForkJoinPool-1-worker-1 starting a new test. Running: 5
ForkJoinPool-1-worker-2 starting a new test. Running: 6
ForkJoinPool-1-worker-1 starting a new test. Running: 7
ForkJoinPool-1-worker-1 starting a new test. Running: 8
ForkJoinPool-1-worker-1 starting a new test. Running: 9
ForkJoinPool-1-worker-2 starting a new test. Running: 10
ForkJoinPool-1-worker-1 starting a new test. Running: 11
ForkJoinPool-1-worker-1 starting a new test. Running: 12
ForkJoinPool-1-worker-1 starting a new test. Running: 14
ForkJoinPool-1-worker-2 starting a new test. Running: 13
ForkJoinPool-1-worker-1 starting a new test. Running: 15
ForkJoinPool-1-worker-1 starting a new test. Running: 16
ForkJoinPool-1-worker-1 starting a new test. Running: 17
ForkJoinPool-1-worker-1 starting a new test. Running: 18
ForkJoinPool-1-worker-2 starting a new test. Running: 19
ForkJoinPool-1-worker-1 starting a new test. Running: 20

worker-1 started 15 tests, worker-2 started 5 tests

In actual code, given the location of the point of respawn, this can generate very large stacks.

An alternative implementation of the executor service, as shown in this PR, using a standard Thread Executor, would not show similar issues, at the expense of not ideally orchestrating multiple executions.

Let's note that this is a very sneaky issue. Even in this project, we can see that the call to ForkJoinTask#get is not visible in the JUnit method. And it can be as deep as we want in the stack, even hidden to some users as it is happening in a used framework or tool. It may not be always possible for a user to detect that pattern.

Steps to reproduce

See this project https://github.com/OPeyrusse/junitforkjoinpool
After building the project, run the test class TestCalculator (or look at the README if changed since opening this issue).

Context

  • Used versions (Jupiter/Vintage/Platform): 5.8.2, with the Jupiter runner
  • Build Tool/IDE: Build with Intellij, failing in Intellij and when running it with Maven
  • Maven Surefire version: 3.0.0-M5

Deliverables

  • At least one standard test executor not relying on the ForkJoinPool

Activity

OPeyrusse

OPeyrusse commented on Dec 15, 2022

@OPeyrusse
Author

Some additional thoughts:
Looking at the code and being used to work with ForkJoinPool, I can guess some benefits of using the ForkJoin framework. It would naturally focus on sub-tasks to complete the execution of a given class or parameterized tests before moving to other classes.
However, IMO, the mistake is that a ForkJoinTask should only run CPU intensive tasks. JUnit cannot know the content of tests. Some may be real unit tests, only depending on the CPU. Some may be real unit tests but reading the disk. Some may be more integration tests, making network calls using whatever framework. In those cases, for defensive measures, users of the ForkJoin framework have to rely on ManagedBlock. The drawback of Managed blocker is that they start new threads.

Yet another solution could be to use a ForkJoinPool for scheduling of the tests and a standard executor for the execution of the tests.

OPeyrusse

OPeyrusse commented on Dec 15, 2022

@OPeyrusse
Author

There is also a short snippet reproducing this behavior here.
However, the current choice was to say that the doc will be update to say the configuration limits the number of threads, not the number of tests. This issue focuses on the number of tests being run, as it is the actual problem at stake, leading to StackOverflowErrors.

OPeyrusse

OPeyrusse commented on Dec 15, 2022

@OPeyrusse
Author

To add to this, it is not currently possible to have parallel tests without being tricked by JUnit current implementation.
It is either sequential tests or parallel tests that must take care of what they do.

marcphilipp

marcphilipp commented on Apr 22, 2023

@marcphilipp
Member
OPeyrusse

OPeyrusse commented on Apr 24, 2023

@OPeyrusse
Author

Hello @marcphilipp , nope it will not help. The feature advertised was already one we looked at, when it was at the stage of PR.
If you look again, you will see that only two threads from the ForkJoinPool are being used.
Tough not tested, I would even say that with only one thread, you can reproduce the problem above. As stated in my description, it comes from the FJ framework detecting the use of ForkJoinPool in a ForkJoinThread, leading it to attempt some work instead of waiting.
So limiting the number of threads in the pool or its saturation will not resolve the issue. Here, the issue is that too many tests are started.

I can give you an analogy with the following story:
There is a factory that can manufacture cars. It takes time, more or less depending on the car models. Building a car is the analogy to running a single test. The factory has enough workers to build several cars concurrently.
There are also clerks coming to the factory with orders to manufacture cars. Those are JUnit ForkJoinTasks, produced after discovering that tests must be run.
When a clerk arrives at the factory, it delivers the order to the factory, that immediately starts working on it.
The factory has chosen, for company reasons, to operate using the FJP methodology (they are smart). The weird thing is that the clerk sees that, and because the clerk company is also using the FJP methodology, the clerk decides not to stay idle and do something while waiting for the car to be made. Obviously, the clerk cannot join the factory workers - it is not its job - so to do something, the clerk picks the next task available to him, which is go back to the office to pick another order.
Because clerks make faster trips to the office than the factory can produce cars, the factory becomes overflowed with orders - which, in the computer world, translates to memory starvation, stack overflow, etc

I hope this helps understand the logic triggering the issue 😃

61 remaining items

mufasa1

mufasa1 commented on Mar 4, 2024

@mufasa1

@Vampire

This is for Spock. I described above how you probably can do the same for Jupiter.

Hi, I tried the description your wrote, but it didn't work. I would greatly appreciate it if you had a validated solution..

Vampire

Vampire commented on Mar 4, 2024

@Vampire

I provided a full-blown example above, what more do you need?

mufasa1

mufasa1 commented on Mar 4, 2024

@mufasa1

I provided a full-blown example above, what more do you need?

I mean for Junit, not for Spock. I tried your description like this
'@OverRide
public void interceptTestMethod(Invocation invocation,
ReflectiveInvocationContext invocationContext,
ExtensionContext extensionContext) throws Throwable {
threadPool.submit(invocation::proceed).get();
}'

but it didn't work. What I need is a solution that has been validated.

Vampire

Vampire commented on Mar 4, 2024

@Vampire

Works perfectly fine for me, except that in Jupiter proceed throws an exception, so you have to handle it.
But once you made the code compilable and ensured it is used, it does exactly what is intended.

added a commit that references this issue on Dec 24, 2024
added 2 commits that reference this issue on Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Type

No type

Projects

Status

Todo

Milestone

No milestone

Relationships

None yet

    Development

    Participants

    @marcphilipp@Vampire@asardaes@igogoka@mufasa1

    Issue actions

      JUnit parallel executor running too many tests with a fixed policy. · Issue #3108 · junit-team/junit-framework