Skip to content

JobLauncherCommandLineRunner picks a wrong job instance. #14484

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
josemalonsom opened this issue Sep 16, 2018 · 8 comments
Closed

JobLauncherCommandLineRunner picks a wrong job instance. #14484

josemalonsom opened this issue Sep 16, 2018 · 8 comments
Labels
for: external-project For an external project and not something we can fix

Comments

@josemalonsom
Copy link

josemalonsom commented Sep 16, 2018

In a project in which I am working now we are using an specific parameter as identifier and without incrementer in a restartable job. The problem that we are facing now is that when we try to launch a new job with a new identifier argument a previous job instance is retrieved, let me explain it:

  • we have a job restartable with a specific identifier, let’s say for example a date.
  • we launch the job for the date 2018-09-15 and it ends with a failed status.
  • we launch the job for a new date, for example 2018-09-16, and we expect that a new job instance for 2018-09-16 would be created, but instead the job instance identified by 2018-09-15 is picked up and the identifier argument is changed to 2018-09-15.

I am not sure if I am missing something about the expected behavior but I would expect that the failed instance 2018-09-15 would be picked up only if we launch the batch with that date as argument but not that by default if a failed job instance exists that job instance is the one that is to be running in subsequent executions independently of the new arguments passed to the job.

I have tracked the problem to here: JobLauncherCommandLineRunner, it seems that the JobLauncherCommandLineRunner uses the JobParametersBuilder from Spring Batch to pick the job parameters, but the problem is that JobParametersBuilder.getNextJobParameters() doesn’t care about the identifier arguments of the job that it’s being launched, instead, in case of a previous failed job it picks this job instance assuming that you want to continue the previous job instance and without taking into account that you may want to launch a different instance, you can see it here: JobParametersBuilder.

I am not sure if the problem is in Spring Batch’s JobParametersBuilder or Spring Boot’s JobLauncherCommandLineRunner, but I think that the behavior of JobParametersBuilder.getNextJobParameters() makes sense if you want to act given the previous job state but maybe JobLauncherCommandLineRunner shouldn’t assume this because it is not always the case.

As I said maybe I am misunderstanding something but I think there is a bug here.

Spring Boot version: 2.0.3.RELEASE.
Spring Batch core version: 4.0.1.RELEASE.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Sep 16, 2018
@philwebb
Copy link
Member

I'm not all that familiar with the internals of Spring Batch, Perhaps @benas or @mminella can comment?

@fmbenhassine
Copy link
Contributor

There is an open issue in batch for this problem: https://jira.spring.io/browse/BATCH-2741. I was able to reproduce the issue but I'm not sure yet if the fix should go in batch or boot. I'll get back to you asap in regards to this.

@philwebb
Copy link
Member

Thanks very much @benas. I'll close this one for now but please add a comment here if we need to fix it in Boot.

@philwebb philwebb added for: external-project For an external project and not something we can fix and removed status: waiting-for-triage An issue we've not yet triaged labels Sep 17, 2018
@fmbenhassine
Copy link
Contributor

Hi,

This is a batch issue and we are planning to fix it in Spring Batch v4.1 GA. I was about to suggest to close this issue against boot.

@josemalonsom Please follow updates about this problem in https://jira.spring.io/browse/BATCH-2741 and https://jira.spring.io/browse/BATCH-2711.

Kind regards,
Mahmoud

@josemalonsom
Copy link
Author

josemalonsom commented Sep 18, 2018

Ok, thank you.

@fmbenhassine
Copy link
Contributor

fmbenhassine commented Oct 23, 2018

Hi,

After further investigations, we will need to apply a fix on both sides. We found that the code that moved to batch in #10135 has a couple of issues:

Issue 1: Checking the status of the last instance

The code checks the status of the last instance regardless of the provided parameters. It should instead check the status of the last instance for the given parameters. Here is a failing test (should be added in JobLauncherCommandLineRunnerTests) that reproduces the reported issue here with the current master (as of 45a0bfd):

@Test
public void runDifferentInstances() throws Exception {
	this.job = this.jobs.get("job")
			.start(this.steps.get("step").tasklet(throwingTasklet()).build()).build();
	// start a job instance
	JobParameters jobParameters = new JobParametersBuilder().addString("name", "foo")
			.toJobParameters();
	this.runner.execute(this.job, jobParameters);
	assertThat(this.jobExplorer.getJobInstances("job", 0, 100)).hasSize(1);
	// start a different job instance
	JobParameters otherJobParameters = new JobParametersBuilder()
			.addString("name", "bar").toJobParameters();
	this.runner.execute(this.job, otherJobParameters);
	assertThat(this.jobExplorer.getJobInstances("job", 0, 100)).hasSize(2);
}

Issue 2: Handling non-identifying parameters

When the last execution of a job instance fails, non-identifying parameters are removed from the current parameters while they should be removed from the last execution’s parameters. The idea is to make it possible to modify non-identifying parameters when we want to restart a failed execution. If we remove non-identifying parameters from the newly provided ones, we cannot achieve the desired feature. Here is a failing test with the current master:

@Test
public void retryFailedExecutionWithDifferentNonIdentifyingParametersFromPreviousExecution()
		throws Exception {
	this.job = this.jobs.get("job")
			.start(this.steps.get("step").tasklet(throwingTasklet()).build())
			.incrementer(new RunIdIncrementer()).build();
	JobParameters jobParameters = new JobParametersBuilder().addLong("id", 1L, false)
			.addLong("foo", 2L, false).toJobParameters();
	this.runner.execute(this.job, jobParameters);
	assertThat(this.jobExplorer.getJobInstances("job", 0, 100)).hasSize(1);
	// try to re-run a failed execution with non identifying parameters
	this.runner.execute(this.job, new JobParametersBuilder().addLong("run.id", 1L)
			.addLong("id", 2L, false).addLong("foo", 3L, false).toJobParameters());
	assertThat(this.jobExplorer.getJobInstances("job", 0, 100)).hasSize(1);
	JobInstance jobInstance = this.jobExplorer.getJobInstance(0L);
	assertThat(this.jobExplorer.getJobExecutions(jobInstance)).hasSize(2);
	// first execution
	JobExecution firstJobExecution = this.jobExplorer.getJobExecution(0L);
	JobParameters parameters = firstJobExecution.getJobParameters();
	assertThat(parameters.getLong("run.id")).isEqualTo(1L);
	assertThat(parameters.getLong("id")).isEqualTo(1L);
	assertThat(parameters.getLong("foo")).isEqualTo(2L);
	// second execution
	JobExecution secondJobExecution = this.jobExplorer.getJobExecution(1L);
	parameters = secondJobExecution.getJobParameters();
	// identifying parameters should be the same as previous execution
	assertThat(parameters.getLong("run.id")).isEqualTo(1L);
	// non-identifying parameters should be the newly specified ones
	assertThat(parameters.getLong("id")).isEqualTo(2L);
	assertThat(parameters.getLong("foo")).isEqualTo(3L);
}

Now we could fix these two issues in batch (and that’s what we did the first time), but this will introduce an inconsistency in batch due to a difference in the behaviour between boot and batch in regards to the JobParametersIncrementer: Boot automatically calls the incrementer while batch does not. In batch, in order to start the next instance of a job (and call the incrementer), we need to:

  • explicitly specify the “-next” option when using the CommandLineJobRunner
  • explicitly call the startNextInstance method when using the JobOperator
  • manually increment the parameters (explicit action) and pass them to the JobLauncher

So if the fix goes in batch only, the method JobParametersBuilder#getNextJobParameters will automatically call the incrementer and will be inconsistent with the CommandLineJobRunner and JobOperator (which do not automatically call the incrementer).

In sum, to correctly fix the issue reported here and after discussing these details with @mminella:

  • I opened a PR for batch to make the JobParametersBuilder#getNextJobParameters consistent with the CommandLineJobRunner and JobOperator.
  • I opened a PR for Boot to fix the couple of problems mentioned above

Obviously these two PRs should be merged together. We planned the batch fix to be included in 4.1 GA, so I hope it will be possible to merge the boot PR for 2.1 GA.

Kr,
Mahmoud

@philwebb
Copy link
Member

Thanks @benas! If the Batch PR gets merged soon we can apply the Boot fix before GA

@fmbenhassine
Copy link
Contributor

Great! Thank you @philwebb ! I will let you know once the batch PR is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: external-project For an external project and not something we can fix
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants