Skip to content

JdbcPagingItemReader with sort key crashes in multithreaded step for empty query #3898

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SebastianGoeb opened this issue Apr 29, 2021 · 3 comments
Labels
for: backport-to-4.2.x Issues that will be back-ported to the 4.2.x line in: infrastructure type: bug
Milestone

Comments

@SebastianGoeb
Copy link

SebastianGoeb commented Apr 29, 2021

Bug description
JdbcPagingItemReader cannot handle queries without results if multithreading and sorting are active together. remainingPagesSql is erroneously executed by other thread, if first thread found no results. Sind startAfterValues was never set by first thread, any sort key parameters is missing.

Environment
Java version: AdoptOpenJDK 11 (11.0.6)
Spring Batch version: spring-boot-starter-batch:2.4.3 (spring-batch-core:4.3.1)
Database: Postgres 13 (helm chart bitnami/postgres with image tags modified to 13.1.0-debian-10-r18)

Steps to reproduce

  1. Setup a chunking step that uses JdbcPagingItemReader with concurrency 2, page size 2, chunk size 2. I am following the driving query pattern in my simplified example, so only querying uuids, but this should work regardless of return type.
  2. Run the step against an empty table
  3. JdbcPagingItemReader (thread 1) executes firstPageSql, but gets no results. RowCallbackHandler is never called, so startAfterValues is never populated.
  4. JdbcPagingItemReader (thread 2) was waiting at synchronization point in AbstractPagingItemReader::doRead and is now allowed to continue. It executes doReadPage, and because page != 0 now it executes remainingPagesSql, but because startAfterValues was never populated, it can't find a value for the sort parameter (uuid in my case), and crashes.

Expected behavior
If thread 1 already got 0 results, I would expect thread 2 to either never enter doPageRead or just return an empty page without performing the actual query. In any case. I expect my step not to crash, but to simply finish with readCount == 0.

I'm not sure which entry condition is the culprit. I suspect it might be the current >= pageSize in AbstractPagingItemReader::doRead, but I'm not sure if that's intentional. Seems strange to me though, to allow another page query if current == pageSize, but there could be some finishing situations that I'm not considering.

Minimal Complete Reproducible example
Simplified snippet from my own code at work. Not quite complete, so will take some massaging to hook it up to real types, tables, etc. but get's the point across. Parameter import_timestamp is found by both firstPageSql and remainingPagesSql, whereas the parameter _uuid that is generated out of the sort condition is not found by remainingPagesSql if firstPageSql returns no results. This isn't an issue if single threaded because remainingPagesSql never runs, but if multithreaded, the other threads are already waiting at the lock and then run the remainingPagesSql anyway.

    @Bean
    public Step myStep(
            JdbcPagingItemReader<UUID> myTableUuidReader,
            Processor<UUID, String> myTableProcessor,
            ItemWriter<String> itemWriter) {
        SimpleAsyncTaskExecutor executor = new SimpleAsyncTaskExecutor();
        executor.setConcurrencyLimit(2);

        return steps.get("myStep")
                .<UUID, String>chunk(2)
                .reader(myTableUuidReader)
                .processor(myTableProcessor)
                .writer(itemWriter)
                .taskExecutor(executor)
                .throttleLimit(2)
                .build();
    }

    @JobScope
    @Bean
    public JdbcPagingItemReader<UUID> myTableUuidReader(DataSource dataSource) {
        return new JdbcPagingItemReaderBuilder<UUID>()
                .dataSource(dataSource)
                .selectClause("uuid")
                .fromClause("mytable")
                .whereClause("import_timestamp = :import_timestamp")
                .sortKeys(Map.of("uuid", ASCENDING))
                .parameterValues(Map.of(
                        "import_timestamp", someSqlTimestamp
                ))
                .rowMapper(new SingleColumnRowMapper<>())
                .pageSize(2)
                .saveState(false)
                .build();
    }
@fmbenhassine
Copy link
Contributor

Thank you for reporting this issue. Indeed, the reader fails with an empty result set in a multi-threaded step.

@fmbenhassine fmbenhassine added for: backport-to-4.2.x Issues that will be back-ported to the 4.2.x line and removed status: waiting-for-triage Issues that we did not analyse yet labels Sep 24, 2021
@fmbenhassine fmbenhassine added this to the 4.3.4 milestone Sep 24, 2021
fmbenhassine pushed a commit that referenced this issue Sep 24, 2021
On empty input, the JdbcPagingItemReader cannot derive
a start value for the sort key to be used in further
queries. For multi-threaded steps, it is thus necessary
to prevent the reader from trying to read further pages
if the first page is empty.

Issue #3898
@fmbenhassine
Copy link
Contributor

Resolved with #3959.

fmbenhassine pushed a commit that referenced this issue Sep 24, 2021
On empty input, the JdbcPagingItemReader cannot derive
a start value for the sort key to be used in further
queries. For multi-threaded steps, it is thus necessary
to prevent the reader from trying to read further pages
if the first page is empty.

Issue #3898
fmbenhassine pushed a commit that referenced this issue Sep 24, 2021
On empty input, the JdbcPagingItemReader cannot derive
a start value for the sort key to be used in further
queries. For multi-threaded steps, it is thus necessary
to prevent the reader from trying to read further pages
if the first page is empty.

Issue #3898
@KiranMohan
Copy link

Resolved with #3959.

The fix is broken now as the "startAfterValues" is never null. I think an additional check on !startAfterValues.isEmpty() is required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: backport-to-4.2.x Issues that will be back-ported to the 4.2.x line in: infrastructure type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants