Create an ItemReader that reads from an InputStream [BATCH-2695] #912

spring-projects-issues · 2018-03-01T18:40:13Z

Michael Minella opened BATCH-2695 and commented

A regular request is to be able to read S3 files without downloading them first. In order to do this a reader would need to be created to read from an InputStream instead of a local file. This is to explore a mechanism to do so.

Affects: 4.0.0

Reference URL: https://stackoverflow.com/questions/30832041/spring-batch-read-files-from-aws-s3

Issue Links:

BATCH-2709 APIItemReader

Backported to: 4.1.0.M3


spring-projects-issues · 2018-03-01T18:48:19Z

Gary Russell commented

Michael Minella spring-integration-aws already has a S3StreamingMessageSource.

It uses the S3RemoteFileTemplate.

cc/ Artem Bilan

spring-projects-issues · 2018-03-01T18:59:31Z

Artem Bilan commented

You just need some simple code like this:

InputStream s3ObjectInputStream = this.amazonS3.getObject(bucketName, key).getObjectContent();

spring-projects-issues · 2018-08-21T10:26:55Z

Mahmoud Ben Hassine commented

Thank you Gary Russell and Artem Bilan.

Michael Minella As discussed, I first checked whether it's possible to read data from a URL without downloading it, using a UrlResource. The answer is yes. To test that, I uploaded a flat file (random data, 1M records, approx. 50 MB) to S3 and wrote the following test:

@Test
public void testReadDataFromS3() throws Exception {
	// given
	UrlResource resource = new UrlResource("https://s3.eu-west-3.amazonaws.com/benas-data/data.csv");
	FlatFileItemReader<String> itemReader = new FlatFileItemReaderBuilder<String>()
			.name("dataReader")
			.resource(resource)
			.lineMapper(new PassThroughLineMapper())
			.build();

	// when
	int itemCount = 0;
	itemReader.open(new ExecutionContext());
	while (itemReader.read() != null) {
		itemCount++;
	}
	itemReader.close();

	// then
	Assert.assertEquals(1000000, itemCount);
}

This test passes: the file is not downloaded locally but is streamed directly from S3.
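For readers who want to see the streaming behaviour without any Spring dependency, here is a minimal JDK-only sketch. It is not Spring's code: the class name `UrlLineCounter` and the method `countLines` are hypothetical, and the sketch only assumes that a URL-backed resource ultimately reads from the stream of the URL's connection. Lines are consumed one at a time from the open connection, so no local copy of the file is created.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Spring Batch).
public class UrlLineCounter {

    /** Counts lines by streaming from the URL; nothing is written to local disk. */
    public static long countLines(URL url) throws IOException {
        try (InputStream in = url.openStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            long count = 0;
            // Each readLine() pulls only as much data as the buffer needs.
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass any URL (https, file, ...) as the first argument.
        System.out.println(countLines(URI.create(args[0]).toURL()));
    }
}
```

Memory use stays bounded by the reader's buffer rather than by the file size, which is the property the test above relies on.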

The good news is that all file readers in Spring Batch (FlatFileItemReader, StaxEventItemReader and JsonItemReader) are based on the (powerful!) Resource abstraction, so it's possible to read not only flat files but also XML and JSON files from a specific URL (our XML and JSON tests pass when reading data directly from GitHub, see here).

One important part of this user story is making sure that the Spring Batch mechanics (skip, restart, etc.) still work when streaming data from a URL. I wrote a test suite for these features here and it passes too.
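To illustrate why restart can work on a stream at all, here is a simplified JDK-only sketch of the general idea: remember how many items were read, then on restart open a fresh stream and discard that many items before continuing. This is an illustration under stated assumptions, not Spring Batch's implementation; `RestartableLineReader` and its methods are hypothetical names. It also shows one constraint worth keeping in mind: a single InputStream can be consumed only once, so the sketch takes a supplier that must return a new stream on every call.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical class, for illustration only (not part of Spring Batch).
public class RestartableLineReader implements AutoCloseable {

    private final Supplier<InputStream> streamSupplier; // must return a fresh stream per call
    private BufferedReader reader;
    private int itemCount; // items consumed so far, including skipped ones

    public RestartableLineReader(Supplier<InputStream> streamSupplier) {
        this.streamSupplier = streamSupplier;
    }

    /** Opens a fresh stream and discards the items that were read before the restart. */
    public void open(int itemsAlreadyRead) throws IOException {
        reader = new BufferedReader(
                new InputStreamReader(streamSupplier.get(), StandardCharsets.UTF_8));
        itemCount = 0;
        for (int i = 0; i < itemsAlreadyRead; i++) {
            if (reader.readLine() == null) {
                break; // source is shorter than the checkpoint
            }
            itemCount++;
        }
    }

    /** Returns the next line, or null when the stream is exhausted. */
    public String read() throws IOException {
        String line = reader.readLine();
        if (line != null) {
            itemCount++;
        }
        return line;
    }

    public int getItemCount() {
        return itemCount;
    }

    @Override
    public void close() throws IOException {
        if (reader != null) {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\nb\nc\nd\n".getBytes(StandardCharsets.UTF_8);
        Supplier<InputStream> source = () -> new ByteArrayInputStream(data);

        int checkpoint;
        try (RestartableLineReader first = new RestartableLineReader(source)) {
            first.open(0);
            first.read();
            first.read();
            checkpoint = first.getItemCount(); // simulate a failure after 2 items
        }
        try (RestartableLineReader second = new RestartableLineReader(source)) {
            second.open(checkpoint);
            System.out.println(second.read()); // prints "c"
        }
    }
}
```

The trade-off is that skipping re-reads the discarded bytes from the remote source, so a restart late in a large file still transfers the earlier part of the data.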

spring-projects-issues · 2018-08-31T17:27:24Z

Mahmoud Ben Hassine commented

Works as designed with a UrlResource.

spring-projects-issues · 2018-08-31T17:30:58Z

Artem Bilan commented

Does this mean that for a plain, in-memory InputStream it would be enough to wrap it in an InputStreamResource and reuse the FlatFileItemReaderBuilder mentioned above?

Is it documented somehow?

Thanks

spring-projects-issues · 2018-08-31T18:01:56Z

Mahmoud Ben Hassine commented

Artem Bilan yes, that should work. The documentation states that the reader expects a Spring Framework Resource and links to the Spring Framework docs, so any Resource implementation should work.

spring-projects-issues closed this as completed Oct 18, 2019

spring-projects-issues added the type: feature, status: declined, has: backports and in: infrastructure labels Dec 16, 2019

spring-projects-issues added this to the 4.1.0 milestone Dec 16, 2019

spring-projects-issues mentioned this issue Dec 17, 2019

4.1.0.M3 Backported Issues #3588

Closed

fmbenhassine mentioned this issue Apr 13, 2021

Add S3 ItemReader/ItemWriter #3818

Closed
