Slow SQL in JdbcStepExecutionDao on Postgres #3634
Comments
I just hit the same problem after upgrading from Spring Batch 4.1.3.RELEASE to 4.2.0.RELEASE (the defective query appears to be in the latest 4.2.1.RELEASE too). Using an embedded HSQLDB job repository, this query takes up to 50 seconds to run with 25k rows in BATCH_STEP_EXECUTION.
I can confirm this problem as well. On PostgreSQL, the difference is 15s versus 1ms (for the corrected query with the index present).
Thank you all for your feedback! This will be included in the upcoming 4.3.0.M1 which will be aligned with Spring Framework 5.3.0.M1 and Spring Boot 2.4.0.M1. The release dates of those milestones are not fixed yet so I can't give a date for Spring Batch 4.3.0.M1 for now. For the record, the query Now if this query can be optimized even further, then of course that's welcome!
@mcheban Thank you for reporting this issue and for opening a PR! Just curious, can you share some numbers about how many jobs you have and how frequently they are launched to end up with millions of records in BATCH_JOB_EXECUTION? Do you have a retention policy / archiving strategy as recommended in the docs?
Resolved with #3635.
Hi, The following sub-SQL (IN condition) looks fine: If the SQL is the one intended (out of my scope), prefixing each column with its table does avoid any database parsing/optimizer consideration, as it explicitly references the intended table to use. Regards,
You can also override the
I have upgraded my spring-boot-starter-parent from 2.2.2 to 2.4.3 and the query problem is still there: a simple select query that should take less than a millisecond now takes 100 to 200 milliseconds, delaying my batch dramatically! The select query runs in the processor section of the batch, across all the chunks of size 500. The database I am using is Postgres.
In the SQL defined as the constant GET_LAST_STEP_EXECUTION, the subquery
(SELECT JOB_EXECUTION_ID from BATCH_JOB_EXECUTION where JE.JOB_INSTANCE_ID = ?)
filters by JE.JOB_INSTANCE_ID, which is outside of this subquery. As a result, the subquery scans the whole table and the DB performs the filtering by JOB_INSTANCE_ID at the very end.
The issue is only reproducible when you have millions of records in BATCH_JOB_EXECUTION.
The fix is simply to rewrite the subquery and remove the JE. prefix, like this:
where JOB_INSTANCE_ID = ?
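To make the proposed change concrete, here is a minimal sketch that puts the subquery quoted above next to the rewritten form. It assumes that JE is an alias declared in the enclosing query, as the report implies. The class name SubqueryFix and the helper removeOuterAlias are hypothetical and exist only for illustration; this is not the code from the actual fix.

```java
public class SubqueryFix {

    // Subquery as quoted in the report. JE is assumed to be an alias from the
    // enclosing query, so this filter refers to a table outside the subquery.
    static final String ORIGINAL_SUBQUERY =
            "(SELECT JOB_EXECUTION_ID from BATCH_JOB_EXECUTION where JE.JOB_INSTANCE_ID = ?)";

    // Subquery after the proposed fix: the filter now applies to the
    // BATCH_JOB_EXECUTION table selected inside the subquery itself.
    static final String FIXED_SUBQUERY =
            "(SELECT JOB_EXECUTION_ID from BATCH_JOB_EXECUTION where JOB_INSTANCE_ID = ?)";

    // Hypothetical helper: drops the "JE." qualifier from the subquery's filter.
    static String removeOuterAlias(String subquery) {
        return subquery.replace("where JE.JOB_INSTANCE_ID", "where JOB_INSTANCE_ID");
    }

    public static void main(String[] args) {
        System.out.println("before: " + ORIGINAL_SUBQUERY);
        System.out.println("after:  " + removeOuterAlias(ORIGINAL_SUBQUERY));
    }
}
```

The only difference between the two strings is the JE. qualifier. Comparing the database's execution plans for both variants is a reasonable way to confirm the effect on a given installation.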