Existing failed job is restarted, even if new job contains different job parameters [BATCH-2711] #882
Tonie commented I traced this back to Spring Boot. Version 2.0.0.M5 used the getNextJobParameters method in the JobLauncherCommandLineRunner class. Version 2.0.0.M6 changed JobLauncherCommandLineRunner to use the getNextJobParameters method in the JobParametersBuilder class. Now, the question is: which one of these is the correct design, and how do you achieve the same result as in version 2.0.0.M5 in the later versions?
Michael Minella commented Can you please provide the command you're using to launch your job? The difference you're describing was just moving the code from one place to another. The code itself should be the same.
Tonie commented I am using the following command to start the job:

java -jar myjar.jar parm=0 fileName=error.csv

And this second command, which restarts the first job instead of starting a new job instance:

java -jar myjar.jar parm=1 fileName=new.csv

The code was moved from one place to another, but it also changed.

Code in version 2.0.0.M5, JobLauncherCommandLineRunner execute method:

JobParameters nextParameters = getNextJobParameters(job, jobParameters);
JobExecution execution = this.jobLauncher.run(job, nextParameters);

JobLauncherCommandLineRunner getNextJobParameters method:

    else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
        // Retry a failed or stopped execution
        parameters = previousExecution.getJobParameters();
        // Non-identifying additional parameters can be removed to a retry
        removeNonIdentifying(additionals);
    }
    else if (incrementer != null) {
        // New instance so increment the parameters if we can
        parameters = incrementer.getNext(previousExecution.getJobParameters());
    }
}
return merge(parameters, additionals);

JobLauncherCommandLineRunner merge method:

private JobParameters merge(JobParameters parameters,
        Map<String, JobParameter> additionals) {
    Map<String, JobParameter> merged = new HashMap<String, JobParameter>();
    merged.putAll(parameters.getParameters());
    merged.putAll(additionals);
    return new JobParameters(merged);
}

Code in version 2.0.0.M6, JobLauncherCommandLineRunner execute method:

JobParameters nextParameters = new JobParametersBuilder(jobParameters,
        this.jobExplorer).getNextJobParameters(job).toJobParameters();
JobExecution execution = this.jobLauncher.run(job, nextParameters);

JobParametersBuilder getNextJobParameters method:

    else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
        // Retry a failed or stopped execution
        nextParameters = previousExecution.getJobParameters();
        // Non-identifying additional parameters can be removed to a retry
        removeNonIdentifying(this.parameterMap);
    }
    else if (incrementer != null) {
        // New instance so increment the parameters if we can
        nextParameters = incrementer.getNext(previousExecution.getJobParameters());
    }
}
this.parameterMap = addJobParameters(nextParameters).toJobParameters().getParameters();

JobParametersBuilder addJobParameters method:

public JobParametersBuilder addJobParameters(JobParameters jobParameters) {
    Assert.notNull(jobParameters, "jobParameters must not be null");
    this.parameterMap.putAll(jobParameters.getParameters());
    return this;
}

As you can see, the old code will use the new job parameters (as they are different from the previous job's), while the new code will use the previous job's parameters. I debugged this to confirm it is indeed happening like this.
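The behavioral difference described above comes down to map merge order: in both versions the last putAll() call wins, but the two versions reverse which side goes last. A standalone sketch (hypothetical names; plain java.util maps standing in for JobParameters) illustrates this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeOrderDemo {
    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>(); // stored params of the failed run
        previous.put("parm", "0");
        previous.put("fileName", "error.csv");

        Map<String, String> commandLine = new LinkedHashMap<>(); // freshly supplied params
        commandLine.put("parm", "1");
        commandLine.put("fileName", "new.csv");

        // 2.0.0.M5 merge(): previous params first, command-line params last, so they win.
        Map<String, String> m5 = new LinkedHashMap<>(previous);
        m5.putAll(commandLine);
        System.out.println(m5.get("parm")); // 1 -> a new job instance is created

        // 2.0.0.M6 addJobParameters(): command-line params first, previous params last, so they win.
        Map<String, String> m6 = new LinkedHashMap<>(commandLine);
        m6.putAll(previous);
        System.out.println(m6.get("parm")); // 0 -> the failed instance is restarted
    }
}
```

This is why Tonie's 2.0.1 workaround reorders the puts in addJobParameters: it restores command-line precedence.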
Tonie commented I changed the addJobParameters method of the JobParametersBuilder class in the latest Spring Boot 2.0.1 with Spring Batch core 4.0.1 as follows:

public JobParametersBuilder addJobParameters(JobParameters jobParameters) {
    Assert.notNull(jobParameters, "jobParameters must not be null");
    Map<String, JobParameter> currentParameterMap = new LinkedHashMap<>();
    currentParameterMap.putAll(this.parameterMap);
    this.parameterMap.clear();
    this.parameterMap.putAll(jobParameters.getParameters());
    this.parameterMap.putAll(currentParameterMap);
    return this;
}

I first ran a job as follows:

java -jar myjar.jar parm=0 fileName=error.csv

This resulted in an error (as intended) and the job in the Spring meta-data tables showed a status of FAILED. I then ran a second job as follows:

java -jar myjar.jar parm=1 fileName=new.csv

This started the second job with a new job instance and it completed successfully. I then fixed the data file with the error for the first job and ran it again with:

java -jar myjar.jar parm=0 fileName=error.csv

This restarted the first job and it completed successfully. With this change the behaviour is the same as in version 2.0.0.M5. Please advise on next steps for this.
Tonie commented While "playing" with different scenarios in order to understand what is happening with this issue, I also found the following 2 side effects of this issue:
I have decided not to use the default Spring Boot logic, but to write my own custom job launcher and job parameters processing logic for now. I am willing to help resolve this issue if more information is required.
Michael Minella commented I'm looking at this and while the original move was just to simplify the boot code and provide this for any user, I can't help but feel that this is wrong anyways. This code assumes the run after a failed |
Tonie commented I totally agree with your overall analysis and statement:
If I look at the documentation for Spring batch, I see the following playing a role during the job start logic:
I feel that Spring Boot should implement the use cases as follows: If the new job contains exactly the same job parameters as a previous failed job:
If the new job contains different job parameters from a previous failed job:
If the new job contains the same job parameters as a previous failed job, but also specifies an incrementer:
If the new job contains the same job parameters as a previous failed job, but also contains an isRestartable indicator set to false:
Does this make sense?
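The use cases proposed above can be condensed into a tiny decision helper. This is purely illustrative, not Spring Batch code; the names decide and LaunchDecision are made up, and the rules are one plausible reading of the proposal: restart only when the identifying parameters match a failed or stopped instance that is restartable, otherwise start a new instance.

```java
// Hypothetical sketch of the launch decision discussed above; not Spring Batch code.
public class LaunchDecisionDemo {

    enum LaunchDecision { RESTART, NEW_INSTANCE, REJECT }

    // sameIdentifyingParams: the new launch uses the same identifying parameters
    // as a previous instance; lastFailedOrStopped: that instance's last execution
    // ended in FAILED or STOPPED; restartable: the job allows restarts.
    static LaunchDecision decide(boolean sameIdentifyingParams,
                                 boolean lastFailedOrStopped,
                                 boolean restartable) {
        if (sameIdentifyingParams && lastFailedOrStopped) {
            // Same identifying parameters as a failed job: restart it,
            // unless the job was declared non-restartable.
            return restartable ? LaunchDecision.RESTART : LaunchDecision.REJECT;
        }
        // Different parameters, or the previous run completed: new instance.
        return LaunchDecision.NEW_INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, true));   // RESTART
        System.out.println(decide(false, true, true));  // NEW_INSTANCE
        System.out.println(decide(true, true, false));  // REJECT
    }
}
```

Under this reading, the incrementer would only ever run in the NEW_INSTANCE branch, which matches the behavior observed in 2.0.0.M5.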
Minhyeok Jeong commented The root cause of this issue is the change to the merge method. In the previous version, 3.0.9.RELEASE, existing parameters and new parameters were merged as below:
In the current version, 4.0.1.RELEASE:
It happens when the previous job either failed or was stopped.
But was this change intended? Isn't it misbehavior? I think the addition of a new public method, addJobParameters(JobParameters), to the builder was a good thing, but it needs more test cases to verify that the builder still keeps the previous behavior without any side effects. I made a PR for this issue: #630. Please review the new test case in the PR:

@Test
public void testAddingExistingJobParametersIfAbsent() {
    JobParameters params1 = new JobParametersBuilder()
            .addString("foo", "bar")
            .addString("bar", "baz")
            .toJobParameters();
    JobParameters params2 = new JobParametersBuilder()
            .addString("foo", "baz")
            .toJobParameters();
    JobParameters finalParams = new JobParametersBuilder()
            .addString("baz", "quix")
            .addJobParametersIfAbsent(params1)
            .addJobParametersIfAbsent(params2)
            .toJobParameters();
    assertEquals("bar", finalParams.getString("foo"));
    assertEquals("baz", finalParams.getString("bar"));
    assertEquals("quix", finalParams.getString("baz"));
}

I hope this issue will be resolved as soon as possible.
Aritz Bastida commented Hello! Does the PR really solve the issue? In my opinion, it just hides it a bit deeper. Note that, when the last execution fails, the non-identifying parameters are removed from the input map (probably to relaunch the previous execution as it was). Now the resulting map will contain a mix of externally provided parameters (identifying ones) and previous execution parameters (non-identifying ones).

    else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
        // Retry a failed or stopped execution
        nextParameters = previousExecution.getJobParameters();
        // Non-identifying additional parameters can be removed to a retry
        removeNonIdentifying(this.parameterMap);
    }
    else if (incrementer != null) {
        // New instance so increment the parameters if we can
        nextParameters = incrementer.getNext(previousExecution.getJobParameters());
    }
}
this.parameterMap = addJobParametersIfAbsent(nextParameters)
        .toJobParameters()
        .getParameters();

On the other hand, there might as well be a design problem in JobLauncherCommandLineRunner (spring-boot-autoconfigure), which systematically obtains the next parameters, no matter what. As reported in this issue, these two classes (JobParametersBuilder and JobLauncherCommandLineRunner) in tandem make it impossible to launch a new instance without first completing the previous job execution. For the same reasons, a previous failed execution can't be restarted either.

protected void execute(Job job, JobParameters jobParameters) {
    JobParameters nextParameters = new JobParametersBuilder(jobParameters,
            this.jobExplorer).getNextJobParameters(job).toJobParameters();
    JobExecution execution = this.jobLauncher.run(job, nextParameters);
}

Compare this to the old semantics from Spring Batch 3 + Spring Batch Admin. Back then, the externally provided job parameters were used to find a matching job instance. If it existed and it was restartable, the call to getNext() was omitted. Otherwise, the incrementer was called, but using the externally provided job parameters as input. In Spring Batch 4, an empty JobParameters instance will be passed to the first execution, and the previous execution's parameters otherwise. The job parameters specified at launch time are not propagated as getNextJobParameters() arguments. Somehow, the semantics have changed: for better? Is it intentional?

public JobExecution launch(final String jobName, JobParameters jobParameters) {
    final Job job = this.jobLocator.getJob(jobName);
    final JobExecution lastJobExecution = this.jobRepository.getLastJobExecution(jobName, jobParameters);
    boolean restart = false;
    if (lastJobExecution != null) {
        final BatchStatus status = lastJobExecution.getStatus();
        if (status.isUnsuccessful() && status != BatchStatus.ABANDONED) {
            restart = true;
        }
    }
    if (job.getJobParametersIncrementer() != null && !restart) {
        jobParameters = job.getJobParametersIncrementer().getNext(jobParameters);
    }
    JobExecution jobExecution = jobLauncher.run(job, jobParameters);

To summarize, I'd say that the job launcher should allow the following use cases:
Aritz Bastida commented In order to satisfy the use cases I mentioned above, I've revised the code for both JobLauncherCommandLineRunner and JobParametersBuilder, and implemented my own "custom" versions. Now you can do any of the following:
The call to getNextJobParameters() will only be made in the latter case, which is the only situation where it makes sense. The incrementer uses the latest job execution to calculate the next parameters (similar to a database sequence). Note that the parameters modified in this method override the provided ones (typically given via command-line arguments). The addJobParametersIfAbsent() method is no longer necessary, because it was only needed for a restart. Any attempt to launch a previously completed job instance or a non-restartable one is detected by JobLauncher (with JobInstanceAlreadyCompleteException and JobRestartException, respectively), so it need not be checked in JobLauncherCommandLineRunner or JobParametersBuilder. All in all, I think the separation of concerns is clearer with this design, although, if accepted, the change would impact at least three projects (spring-batch, spring-boot and spring-cloud-task). Feedback is welcome! By the way, I'm assuming that JobLauncherCommandLineRunner is the preferred option instead of the older CommandLineJobRunner. Am I right?

JobLauncherCommandLineRunner:

protected void execute(Job job, JobParameters jobParameters)
        throws JobExecutionAlreadyRunningException, JobRestartException,
        JobInstanceAlreadyCompleteException, JobParametersInvalidException,
        JobParametersNotFoundException {
    JobExecution matchingExecution = jobRepository.getLastJobExecution(job.getName(), jobParameters);
    if (matchingExecution == null) { // New job instance, so get next parameters
        jobParameters = new JobParametersBuilder(jobParameters, this.jobExplorer)
                .getNextJobParameters(job).toJobParameters();
    }
    JobExecution execution = this.jobLauncher.run(job, jobParameters);
    if (this.publisher != null) {
        this.publisher.publishEvent(new JobExecutionEvent(execution));
    }
}

JobParametersBuilder:

public CustomJobParametersBuilder getNextJobParameters(Job job) {
    if (this.jobExplorer == null) {
        throw new IllegalStateException("A JobExplorer is required to get next job parameters");
    }
    JobParametersIncrementer incrementer = job.getJobParametersIncrementer();
    if (incrementer == null) {
        return this;
    }
    JobExecution prevExec = getPreviousJobExecution(job);
    JobParameters prevParams = (prevExec != null) ? prevExec.getJobParameters() : new JobParameters();
    JobParameters nextParams = incrementer.getNext(prevParams);
    this.parameterMap = addJobParameters(nextParams).toJobParameters().getParameters();
    return this;
}

private JobExecution getPreviousJobExecution(Job job) {
    List<JobInstance> lastInstances = this.jobExplorer.getJobInstances(job.getName(), 0, 1);
    if (lastInstances.isEmpty()) {
        return null;
    }
    List<JobExecution> previousExecutions = this.jobExplorer.getJobExecutions(lastInstances.get(0));
    if (previousExecutions.isEmpty()) {
        return null;
    }
    return previousExecutions.get(0);
}
Minhyeok Jeong commented Aritz Bastida, thanks for your long explanation. I've been following your many edits. As you mentioned, this PR does not take care of non-identifying parameters, but I agree that the loss of non-identifying parameters is another problem. Although you suggest some customized code for JobLauncherCommandLineRunner and JobParametersBuilder from a different point of view, I don't think it is such a complicated issue. If the problem is the loss of non-identifying parameters, then we can simply fix it by just removing one line:

public JobParametersBuilder getNextJobParameters(Job job) {
    if (lastInstances.isEmpty()) {
        ...
    }
    else {
        ...
        else if (isStoppedOrFailed(previousExecution) && job.isRestartable()) {
            // Retry a failed or stopped execution
            nextParameters = previousExecution.getJobParameters();
            ////////// DELETE THE LINE BELOW
            // Non-identifying additional parameters can be removed to a retry
            removeNonIdentifying(this.parameterMap);
            //////////
        }
    }
    ...
}

Then you can restart a failed job with new non-identifying parameters. But I'm not sure whether this is the intended behavior, since the Spring team intentionally added this method (removeNonIdentifying()). The documentation about non-identifying parameters on the official site is not sufficient, so I would like them to add more explanation about non-identifying parameters and their behavior, especially when a batch restarts. Another point on which I think differently from you is the modification of
Anyway, Mahmoud Ben Hassine has read your comments and he will take a look at it. Hope the issue gets resolved soon.
Aritz Bastida commented Hi Minhyeok Jeong! Thanks for your feedback. But I'm sorry, I still think that the PR tries to "hide" the design problem with getNextJobParameters(), and the fix you are suggesting is an indication of why. The incrementer should (only) be called to calculate the "next" parameters for new job instances, as stated in the Spring Batch documentation (JobParametersIncrementer). Now, let's take a job with the same incrementer (i.e. an auto-generated "run.id" parameter) as in the Spring Batch example, and suppose the following two situations:
Actually, there could be quite another side effect. In case "run.id" was provided as a command-line parameter, it would take precedence over the value calculated by the incrementer, something we probably don't want. I take the point that Spring Boot should not be concerned too much with Spring Batch internals, but, well, a bean called JobLauncherCommandLineRunner must at least know how to launch a job, and, as it currently stands, that includes calculating the next parameters. The problem is that it always calls getNextJobParameters(), no matter whether it's a restart or not, even though this is only needed for new instances (as stated in the Spring Batch reference guide). It is also quite strange that the next parameters are sometimes the previous parameters (i.e. when the last execution failed). To conclude, the code to find out whether it's a restart or not should be outside getNextJobParameters(), either in JobLauncherCommandLineRunner or, if we want to keep it internal to Spring Batch, within the JobLauncher itself. Something like:

JobExecution execution = this.jobLauncher.run(job, jobParameters, true);

...the third parameter being a boolean called "useIncrementer". Internally, that would use the job repository to find out whether it's a restart or not, and call JobParametersBuilder.getNextJobParameters() only when necessary. In fact, SimpleJobLauncher already does the first thing, so the change would be simple enough. If you'd like, I can contribute a PR to prove my point. :)
Minhyeok Jeong commented Aritz Bastida, it looks like we have different points of view on this issue. You may still argue about the name, "next", but I think it is reasonable: as you noted for JobParametersIncrementer, getNextJobParameters() has a similar name, and since you know the purpose of getNext(), I believe you can also accept getNextJobParameters(). (But who knows? Maybe the Spring team will change its name or design.) Anyway, I respect your opinion and it is good to propose your solution here. However, since we think in different ways, it would be better to create another PR and leave the link in this JIRA ticket. With your approach, you would have to open PRs on both the spring-batch and spring-boot projects.
Aritz Bastida commented Hi Minhyeok Jeong. That's right, it seems we have different opinions about what "next" means in Spring Batch. Historically, when using launchers like CommandLineJobRunner or JobService, the incrementer's getNext() method was only invoked for new instances, never for a restart. And there was a clear distinction: for example, CommandLineJobRunner has an explicit command-line argument "-next". It seems that in Spring Boot this meaning has somehow changed. Anyway, we are deviating from the main subject. As I pointed out in my last comment, there is an additional problem with the fix you are suggesting for getNextJobParameters(), namely: if the last execution failed and you want to run a new instance, then the incrementer will never be called. Conversely, if you want to restart a previously failed execution but the last execution completed, the incrementer will be called (even if not necessary).
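The two failure modes just described follow from keying the decision on the latest execution overall. A minimal in-memory sketch (hypothetical names; a plain map stands in for the job repository) shows the alternative: looking up by the provided identifying parameters, as jobRepository.getLastJobExecution(jobName, jobParameters) does, distinguishes a restart from a new instance regardless of what ran last.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the job repository; all names are hypothetical.
public class RestartDetectionDemo {
    // Maps a job instance's identifying parameters to its last execution status.
    static final Map<String, String> lastStatusByParams = new HashMap<>();

    // A launch is a restart iff an instance with exactly these identifying
    // parameters exists and its last execution failed.
    static boolean isRestart(String identifyingParams) {
        return "FAILED".equals(lastStatusByParams.get(identifyingParams));
    }

    public static void main(String[] args) {
        lastStatusByParams.put("parm=0", "FAILED");    // first run failed
        lastStatusByParams.put("parm=1", "COMPLETED"); // latest run completed

        // Keying the lookup on the provided parameters distinguishes the
        // cases even though the *latest* execution overall succeeded.
        System.out.println(isRestart("parm=0")); // true  -> restart, skip the incrementer
        System.out.println(isRestart("parm=2")); // false -> new instance, call the incrementer
    }
}
```

A lookup keyed on the last execution alone cannot make this distinction, which is exactly the conflation at issue in this thread.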
Aritz Bastida commented BTW, I will be glad to share a PR when I come back from holidays in 3 weeks ;) As discussed, this issue can be solved in many ways, but somehow the first thing upon a launch request must be to distinguish between a restart and a new instance, and for that we should use some DAO method, such as jobRepository.getLastJobExecution().
Mahmoud Ben Hassine commented Thank you all for this interesting analysis and constructive discussion! I'm currently working on this issue (I wrote a test suite here) and testing both PRs currently open for this problem (#625 and #630). I will share more details soon about how we would approach this issue.
Mahmoud Ben Hassine commented There are a lot of details discussed here so I'll try to summarize things. But first, let me give some important facts that might influence the solution to this issue:
That said, there are two things that need to be addressed:
In regards to the PRs:
The method
This PR basically reverts this commit and fixes the issue, but it changes an assertion on an existing test with which I don't agree. I will continue the review of this PR on GitHub.
Mahmoud Ben Hassine commented After further investigation, this issue should be fixed in both Spring Batch and Spring Boot. The |
Aritz Bastida commented Hi Mahmoud Ben Hassine!! Thanks for considering all the feedback. Glad to know the issue has already been resolved. By the way, although not mentioned in your last comment, I see that this change also affected TaskJobLauncherCommandLineRunner in the spring-cloud-task project, just for the record. :)
Tonie opened BATCH-2711 and commented
Created a Spring Batch job as described in the documentation 'Getting Started - Creating a Batch Service' (https://spring.io/guides/gs/batch-processing/) in order to show the issue. SQL Server is used as the database for the meta-data tables as well as the business data table.
The job reads a .csv file with 6 records, and the ItemProcessor throws an exception on the 4th record. The job then stops and the meta-data tables are updated correctly. The job parameters used for the job are: parm=0 fileName=error.csv
After this, a new job is started with the same job name but different parameters: parm=1 fileName=new.csv. What happens is that Spring Batch restarts the failed job, with the errors, again, even though the new job contains different parameters.
This worked perfectly as expected in version 3.0.8, but the issue occurred after upgrading to version 4.0.1.
Logs from 3.0.8 shows the following:
Logs from 4.0.1 shows the following:
Notice the difference in the parameters used in SimpleJobLauncher, even though JobLauncherCommandLineRunner in both cases shows the same correct parameters.
Affects: 4.0.1
Issue Links:
("is duplicated by")
Referenced from: commits ebf156d, 5328fb2
Backported to: 4.0.2
4 votes, 8 watchers