Create an ItemReader that reads from an InputStream [BATCH-2695] #912

Closed
spring-projects-issues opened this issue Mar 1, 2018 · 6 comments
Labels
has: backports Legacy label from JIRA. Superseded by "for: backport-to-x.x.x" in: infrastructure status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details type: feature
Milestone

Comments

@spring-projects-issues
Collaborator

Michael Minella opened BATCH-2695 and commented

A recurring request is to be able to read S3 files without downloading them first. To do this, a reader would need to read from an InputStream instead of a local file. This issue is to explore a mechanism for doing so.


Affects: 4.0.0

Reference URL: https://stackoverflow.com/questions/30832041/spring-batch-read-files-from-aws-s3

Issue Links:

Backported to: 4.1.0.M3

@spring-projects-issues
Collaborator Author

Gary Russell commented

Michael Minella, spring-integration-aws already has an S3StreamingMessageSource.

It uses the S3RemoteFileTemplate.

cc/ Artem Bilan

@spring-projects-issues
Collaborator Author

Artem Bilan commented

You just need simple code like this:

InputStream s3ObjectInputStream = this.amazonS3.getObject(bucketName, key).getObjectContent();

@spring-projects-issues
Collaborator Author

Mahmoud Ben Hassine commented

Thank you Gary Russell and Artem Bilan.

Michael Minella, as discussed, I first checked whether it is possible to read data from a URL without downloading it, using a UrlResource. The answer is yes. To test that, I uploaded a flat file (random data, 1M records, approx. 50 MB) to S3 and wrote the following test:

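import org.junit.Assert;
import org.junit.Test;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.core.io.UrlResource;

@Test
public void testReadDataFromS3() throws Exception {
	// given: a reader pointed at the remote file through a UrlResource
	UrlResource resource = new UrlResource("https://s3.eu-west-3.amazonaws.com/benas-data/data.csv");
	FlatFileItemReader<String> itemReader = new FlatFileItemReaderBuilder<String>()
			.name("dataReader")
			.resource(resource)
			.lineMapper(new PassThroughLineMapper())
			.build();

	// when: every line is read from the stream and counted
	int itemCount = 0;
	itemReader.open(new ExecutionContext());
	while (itemReader.read() != null) {
		itemCount++;
	}
	itemReader.close();

	// then: all 1M records have been read
	Assert.assertEquals(1000000, itemCount);
}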

which is passing. The file is not downloaded locally and is streamed directly from S3.
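To illustrate what "streamed directly" means here, the following is a minimal JDK-only sketch, not the Spring Batch implementation: it opens a stream on a URL and counts lines as they arrive, without writing anything to disk. The class name `UrlLineCounter` and the method `countLines` are hypothetical, and a temporary local file stands in for the S3 object so the sketch runs offline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
public class UrlLineCounter {

    // Reads the URL's content line by line from its stream and counts the lines.
    public static int countLines(URL url) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary file replaces the remote object so no network is needed.
        Path sample = Files.createTempFile("data", ".csv");
        Files.write(sample, Arrays.asList("a,1", "b,2", "c,3"));
        System.out.println(countLines(sample.toUri().toURL()));
        Files.delete(sample);
    }
}
```

Only one line is held in memory at a time, which is why the size of the remote file does not matter much for this style of reading.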

The good news is that all file readers in Spring Batch (FlatFileItemReader, StaxEventItemReader and JsonItemReader) are based on the (powerful!) Resource abstraction, so it's possible to read not only flat files but also XML and JSON files from a specific URL (our XML and JSON tests are passing when reading data directly from GitHub, see here).

One important part of this user story is making sure that the Spring Batch mechanics (skip, restart, etc.) are still valid when streaming data from a URL. I wrote a test suite for these features here and it is passing too.

@spring-projects-issues
Collaborator Author

Mahmoud Ben Hassine commented

Works as designed with a UrlResource.

@spring-projects-issues
Collaborator Author

Artem Bilan commented

Does this mean that for a plain, in-memory InputStream it would be enough to wrap it in an InputStreamResource and reuse the FlatFileItemReaderBuilder mentioned above?

Is it documented somehow?

Thanks

@spring-projects-issues
Collaborator Author

Mahmoud Ben Hassine commented

Artem Bilan, yes, that should work. The documentation states that the reader expects a Spring Framework Resource and links to the Spring Framework docs, so any Resource implementation should work.
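As a rough sketch of the approach discussed in the last two comments, the example below wraps an InputStream in an InputStreamResource and passes it to the FlatFileItemReaderBuilder from the earlier test. It assumes Spring Batch 4.x or later on the classpath. The class name `InputStreamReaderSketch` and the method `readLines` are hypothetical, and a ByteArrayInputStream stands in for a stream obtained elsewhere (for example, from S3 as in the snippet above).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.core.io.InputStreamResource;

// Hypothetical helper, for illustration only.
public class InputStreamReaderSketch {

    // Reads every line of the given stream through a FlatFileItemReader.
    public static List<String> readLines(InputStream in) throws Exception {
        FlatFileItemReader<String> reader = new FlatFileItemReaderBuilder<String>()
                .name("streamReader")
                .resource(new InputStreamResource(in)) // any Resource implementation is accepted
                .lineMapper(new PassThroughLineMapper())
                .build();

        List<String> lines = new ArrayList<>();
        reader.open(new ExecutionContext());
        try {
            String line;
            while ((line = reader.read()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in-memory bytes replace a real remote stream.
        InputStream in = new ByteArrayInputStream(
                "a,1\nb,2\nc,3\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readLines(in));
    }
}
```

One caveat worth keeping in mind: an InputStreamResource can hand out its stream only once, so a restarted step would need a freshly opened stream wrapped in a new resource rather than the original instance.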

@spring-projects-issues spring-projects-issues added type: feature status: declined Features that we don't intend to implement or Bug reports that are invalid or missing enough details has: backports Legacy label from JIRA. Superseded by "for: backport-to-x.x.x" in: infrastructure labels Dec 16, 2019
@spring-projects-issues spring-projects-issues added this to the 4.1.0 milestone Dec 16, 2019