Skip to content

Slow SQL in JdbcStepExecutionDao on Postgres #3634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mcheban opened this issue Dec 24, 2019 · 7 comments
Closed

Slow SQL in JdbcStepExecutionDao on Postgres #3634

mcheban opened this issue Dec 24, 2019 · 7 comments
Labels
has: backports Legacy label from JIRA. Superseded by "for: backport-to-x.x.x" related-to: performance type: enhancement
Milestone

Comments

@mcheban
Copy link
Contributor

mcheban commented Dec 24, 2019

In SQL described as constant GET_LAST_STEP_EXECUTION

SELECT SE.STEP_EXECUTION_ID,
       SE.STEP_NAME,
       SE.START_TIME,
       SE.END_TIME,
       SE.STATUS,
       SE.COMMIT_COUNT,
       SE.READ_COUNT,
       SE.FILTER_COUNT,
       SE.WRITE_COUNT,
       SE.EXIT_CODE,
       SE.EXIT_MESSAGE,
       SE.READ_SKIP_COUNT,
       SE.WRITE_SKIP_COUNT,
       SE.PROCESS_SKIP_COUNT,
       SE.ROLLBACK_COUNT,
       SE.LAST_UPDATED,
       SE.VERSION,
       JE.JOB_EXECUTION_ID,
       JE.START_TIME,
       JE.END_TIME,
       JE.STATUS,
       JE.EXIT_CODE,
       JE.EXIT_MESSAGE,
       JE.CREATE_TIME,
       JE.LAST_UPDATED,
       JE.VERSION
from BATCH_JOB_EXECUTION JE,
     BATCH_STEP_EXECUTION SE
where SE.JOB_EXECUTION_ID in (SELECT JOB_EXECUTION_ID
                              from BATCH_JOB_EXECUTION
                              where JE.JOB_INSTANCE_ID = ?)
  and SE.JOB_EXECUTION_ID = JE.JOB_EXECUTION_ID
  and SE.STEP_NAME = ?
order by SE.START_TIME desc, SE.STEP_EXECUTION_ID desc;

subquery (SELECT JOB_EXECUTION_ID from BATCH_JOB_EXECUTION where JE.JOB_INSTANCE_ID = ?) filters by JE.JOB_INSTANCE_ID which is outside of this subquery and as a result this subquery will scan the whole table and DB performs filtering by JOB_INSTANCE_ID at the very end.
The issue is only reproducible when you have millions of records in BATCH_JOB_EXECUTION

The fix is simply rewrite subquery and remove JE. – like this where JOB_INSTANCE_ID = ?

@cmsource
Copy link

I just hit the same problem after upgrading from Spring Batch 4.1.3.RELEASE to 4.2.0.RELEASE (the defective query appears to be in the latest 4.2.1.RELEASE too).

Using an embedded HSQLDB Job Repository this query is taking up to 50 seconds to run with 25k rows in BATCH_STEP_EXECUTION.

@fmbenhassine fmbenhassine added this to the 4.3.0 milestone Jan 29, 2020
@fmbenhassine fmbenhassine added the type: holder Issues that hold references to back-ported issues label Jan 29, 2020
@fmbenhassine fmbenhassine added has: backports Legacy label from JIRA. Superseded by "for: backport-to-x.x.x" and removed type: holder Issues that hold references to back-ported issues labels Feb 7, 2020
@simi
Copy link

simi commented Apr 9, 2020

I can confirm this problem as well. The difference is 15s vs 1ms (for right query with index present) on PostgreSQL.

@fmbenhassine
Copy link
Contributor

Thank you all for your feedback! This will be included in the upcoming 4.3.0.M1 which will be aligned with Spring Framework 5.3.0.M1 and Spring Boot 2.4.0.M1. The release dates of those milestones are not fixed yet so I can't give a date for Spring Batch 4.3.0.M1 for now.

For the record, the query GET_LAST_STEP_EXECUTION was introduced in an effort of improving the performance of step partitioning (see #891) which was taking more than 4 minutes (!) on H2 with 5000 partitions.. We managed to improve that by a factor of 10 thanks to this new query.

Now if this query can be optimized even further, then of course that's welcome!

The issue is only reproducible when you have millions of records in BATCH_JOB_EXECUTION

@mcheban Thank you for reporting this issue and for opening a PR! Just curious, can you share some numbers about how many jobs do you have and at which frequency they are launched to end up with millions of records in BATCH_JOB_EXECUTION? Do you have a retention policy / archiving strategy as recommended in the docs?

@fmbenhassine
Copy link
Contributor

Resolved with #3635 .

This was referenced Apr 25, 2020
@kersale-g
Copy link

kersale-g commented May 25, 2020

Hi,
Thanks for you confirmation of the issue.
Could you confirm that until fix is available, the only "easy" solution is to move "back" to 4.1.3.RELEASE.
We identified the same issue two weeks ago with hundreds of thousands of records left in tables (different subject) and I was referencing this thread to our dev team.
Issue is with postgresql :
(SELECT JOB_EXECUTION_ID from BATCH_JOB_EXECUTION where JE.JOB_INSTANCE_ID = ?)
is executed as
(SELECT JE2.JOB_EXECUTION_ID from BATCH_JOB_EXECUTION JE2 where JE.JOB_INSTANCE_ID = ?)
as if running
(SELECT JE2.JOB_EXECUTION_ID from BATCH_JOB_EXECUTION JE2 ) without any filtering

The following sub-SQL (IN condition) looks fine:
(SELECT JE2.JOB_EXECUTION_ID from BATCH_JOB_EXECUTION JE2 where JE2.JOB_INSTANCE_ID = ?)

If the SQL is the one intended (out of my scope), prefixing any table does avoid any Database parsing/optimizer consideration, as it explicitly reference the intended table to use.

Regards,
Geoffrey.

@fmbenhassine
Copy link
Contributor

fmbenhassine commented May 25, 2020

@kersale-g

Could you confirm that until fix is available, the only "easy" solution is to move "back" to 4.1.3.RELEASE.

You can also override the JdbcStepExecutionDao as explained here: #3635 (comment).

@artsgard
Copy link

artsgard commented Oct 5, 2021

I have upgraded my spring-boot-starter-parent from 2.2.2 to 2.4.3 and the query problem is still there: running a simple select query of less than a millisecond that turns into 100/ 200 milliseconds, delaying my batch dramatically! The select query runs at the processor section of the batch, passing through all the chunks of size 500. The DB I am using is a Postgress one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has: backports Legacy label from JIRA. Superseded by "for: backport-to-x.x.x" related-to: performance type: enhancement
Projects
None yet
Development

No branches or pull requests

6 participants