scheduler: Stop scheduling from the imperative queue when asked to quit #2005

Open: wants to merge 1 commit into base: master

Conversation

abderrahim
Contributor

For instance, we shouldn't schedule new build jobs when asked to quit. However, not considering the build queue at all would miss pushing newly built elements.

What this commit does is:

  • Pull elements forward through all queues (including the ones we no longer need)
  • Only schedule jobs after the imperative queue

This means that elements in the build queue (or any imperative queue) will still get passed to the following queues, but no new build jobs get scheduled.

Fixes #1787

@@ -371,7 +371,7 @@ def _sched_queue_jobs(self):

# Pull elements forward through queues
elements = []
-for queue in queues:
+for queue in self.queues:
Contributor

This line looks very dubious.

The first part of the function's job is to select the queues on which to process in the latter half of the function, that seems to make sense. And except for this line you've changed, the rest of the function does only operate on the selected queues.

I wonder if this change is really needed, or if only the previous change is needed.

Given this is such a sensitive code block, I think this change needs further clarity.

Contributor Author

> The first part of the function's job is to select the queues on which to process in the latter half of the function, that seems to make sense. And except for this line you've changed, the rest of the function does only operate on the selected queues.

Correct, and that's the root cause of the issue I'm trying to fix.

There are two operations in this function ("pull elements forward through queues", and "kickoff whatever processes can be processed at this time") and they shouldn't operate on the same list of queues.

Let's consider a simple example where we have three queues: fetch, build and push. Once we receive the quit signal, we only want to:

  • pull elements forward from build to push
  • kickoff new jobs in push

What the current code does is:

  • pull elements forward from build to push
  • kickoff new jobs in build (<- not desired, and is the issue I'm trying to fix)
  • kickoff new jobs in push

What the new code does is:

  • pull elements forward from fetch to build (<- undesired but harmless, this is the issue you're pointing out)
  • pull elements forward from build to push
  • kickoff new jobs in push

I hope this explains it. So while this line is indeed "dubious", it needs to iterate over a different list of queues than the loop that kicks off jobs. And since it's harmless to pull elements into the next queues even when asked to quit, I think it's fine to do that unconditionally.
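The difference between the two behaviours can be sketched with a toy model. This is only an illustration, not BuildStream's scheduler: `ToyQueue`, `sched_after_quit`, `scenario` and the `pull_through_all` flag are hypothetical names, and the slicing used to select queues is an assumption about how "the imperative queue and later" versus "after the imperative queue" might be expressed.

```python
class ToyQueue:
    """Hypothetical stand-in for a scheduler queue (not BuildStream's Queue)."""

    def __init__(self, name, ready=(), done=()):
        self.name = name
        self.ready = list(ready)  # elements waiting for a job in this queue
        self.done = list(done)    # elements finished here, ready to move on
        self.started = []         # elements for which a job was kicked off

    def enqueue(self, elements):
        self.ready.extend(elements)

    def dequeue(self):
        finished, self.done = self.done, []
        return finished

    def start_jobs(self):
        self.started.extend(self.ready)
        self.ready.clear()


def sched_after_quit(queues, imperative_index, pull_through_all):
    """One scheduling pass after a quit request (toy model)."""
    if pull_through_all:
        # Behaviour proposed in this PR: two different lists of queues.
        queues_to_pull = queues
        queues_to_process = queues[imperative_index + 1:]
    else:
        # Behaviour described as "current": one list used for both steps.
        queues_to_pull = queues_to_process = queues[imperative_index:]

    # Pull elements forward through queues.
    elements = []
    for queue in queues_to_pull:
        queue.enqueue(elements)
        elements = queue.dequeue()

    # Kick off whatever can be processed at this time.
    for queue in queues_to_process:
        queue.start_jobs()


def scenario(pull_through_all):
    fetch = ToyQueue("fetch", done=["c"])               # "c" finished fetching
    build = ToyQueue("build", ready=["b"], done=["a"])  # "a" built, "b" waiting
    push = ToyQueue("push")
    sched_after_quit([fetch, build, push], 1, pull_through_all)
    return {q.name: q for q in (fetch, build, push)}


old = scenario(pull_through_all=False)
new = scenario(pull_through_all=True)
print("old: build jobs", old["build"].started, "push jobs", old["push"].started)
print("new: build jobs", new["build"].started, "push jobs", new["push"].started)
```

In this model both variants push the already built element "a", but only the old variant starts a build job for "b". The new variant moves "c" from fetch into the build queue's waiting list, where it sits without a job, which is the "undesired but harmless" step.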

Contributor

Ok... I've done some mental gymnastics and I agree with your statement :)

What I would like to see here is:

  • Rename the local queues variable to something more explicit, perhaps queues_to_process
  • At least slightly elaborate on the comment "Pull elements forward through queues"
    • Perhaps something like "Pull elements forward through all queues, regardless of whether we are processing those queues"
    • Maybe even a little explanation, like "We need to propagate elements through all queues even if we are not processing post-imperative queues, so that we do end up processing the jobs we need to"

@gtristan
Contributor

Can we add a test here?

I think that we can potentially test correct behavior, as described in the comments of #1787, by constructing a test which:

  • Has an artifact share
  • Has a build which fails
  • Tests that the failed build exists in the artifact share after the failed bst build call
    • And/or use the cli.get_pushed_elements() checker to also verify the build log

I think we already have a similar test in tests/integration/cachedfail.py::test_push_cached_fail(). Perhaps this requires constructing a dependency chain which fails the existing test but passes with this change applied?

@abderrahim
Contributor Author

> I think we already have a similar test in tests/integration/cachedfail.py::test_push_cached_fail(), perhaps this requires constructing a dependency chain which fails the existing test but passes with this change applied ?

I think there is a misunderstanding here. What this test does (and what makes me confident that my changes don't break anything) is test for correctness: "buildstream pushes the artifacts it has built before quitting". My change, on the other hand, is about interactivity: "buildstream quits ASAP when asked to".

I don't think I can reliably test for the interactivity part as buildstream isn't deterministic (or at least it isn't guaranteed to be deterministic) in the order in which it schedules ready jobs. How I would test this manually is to have two build jobs that would run in parallel, set --builders to 1, then interrupt the build and request quit during the build of the first element that gets scheduled.

Talking this through, I think we might be able to do the same thing with failed builds: have two elements that fail, launch the build with --on-error quit, and assert that only one of two elements gets built and pushed. I'll give it a go.

@gtristan
Contributor

gtristan commented May 18, 2025

> Talking this through, I think we might be able to do the same thing with failed builds: have two elements that fail, launch the build with --on-error quit, and assert that only one of two elements gets built and pushed. I'll give it a go.

That's what I was thinking (at least the --on-error quit part). Also, to control the behavior, we probably want an explicit --builders (possibly --builders 1?) to limit how many elements buildstream builds in parallel.

Sorry, I read the first and last part of what you wrote, and then reading the middle part I can see you thought of all of that already ;-)
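The reasoning behind the proposed test (two failing elements, --on-error quit, a single builder) can be checked on a toy model before writing the real test. This is a rough sketch under stated assumptions, not a BuildStream test: `simulate` and its parameters are hypothetical, every build is assumed to fail, and failed builds are assumed to be pushed before exit as described above.

```python
def simulate(elements, builders, schedule_builds_after_quit):
    """Toy model: every build fails, and the first failure requests a quit."""
    pending = list(elements)
    built, pushed = [], []
    quitting = False
    while pending:
        if quitting and not schedule_builds_after_quit:
            break  # no new build jobs once asked to quit
        # Up to `builders` elements are built in parallel.
        batch, pending = pending[:builders], pending[builders:]
        built.extend(batch)   # each build in the batch runs and fails
        quitting = True       # models --on-error quit
        pushed.extend(batch)  # failed builds are still pushed before exit
    return built, pushed


# With one builder, the two behaviours are distinguishable.
fixed = simulate(["a", "b"], builders=1, schedule_builds_after_quit=False)
unfixed = simulate(["a", "b"], builders=1, schedule_builds_after_quit=True)

# With two builders, both elements are already building when the quit arrives.
parallel = simulate(["a", "b"], builders=2, schedule_builds_after_quit=False)
```

In this model, only the single-builder run separates the two behaviours, which is why the test would need an explicit --builders 1. Since the scheduling order of ready jobs is not guaranteed, a real test would likely assert on the number of built and pushed elements rather than on which one was picked.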


Successfully merging this pull request may close these issues.

BuildStream doesn't quit on ctrl+c then quit
2 participants