
Allow Support for Uploading Byte Arrays and Strings in S3 TransferManager #964



Closed
pradyuman opened this issue Dec 27, 2016 · 8 comments
Labels
feature-request A feature should be added or improved.

Comments

@pradyuman

Currently, the only way to upload data to S3 via TransferManager is through an InputStream or a File. I'm creating a service that needs to upload data that is in memory. This means I need to write that Array[Byte] to a temp file and then upload the temp file, which introduces other variables that need to be tuned for performance (I now need to make sure the temporary file system is optimized for our use case). It would be great if I could just pass in the byte array (I can also convert it to a string if that's easier). This would let me keep everything in memory and would keep my performance benchmarking and production environment tuning simpler.
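For reference, the temp-file workaround described above can be sketched with the JDK alone. The class name `TempFileStaging` and its `stage` method are hypothetical, and the TransferManager call appears only as a comment (assuming the SDK for Java v1 `upload(bucketName, key, file)` overload) so the sketch runs without the SDK on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileStaging {
    // Hypothetical helper: writes an in-memory payload to a temporary file so it
    // can be handed to an API that only accepts a File.
    public static Path stage(byte[] data) throws IOException {
        Path tmp = Files.createTempFile("s3-upload-", ".bin");
        Files.write(tmp, data);
        // Best-effort cleanup in case the caller never deletes the file.
        tmp.toFile().deleteOnExit();
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "example payload".getBytes(StandardCharsets.UTF_8);
        Path staged = stage(payload);
        try {
            // With the SDK for Java v1 (not on the classpath here), the upload would be e.g.:
            // transferManager.upload(bucketName, key, staged.toFile()).waitForCompletion();
            System.out.println("Staged " + Files.size(staged) + " bytes at " + staged);
        } finally {
            // Delete explicitly once the upload has finished.
            Files.deleteIfExists(staged);
        }
    }
}
```

Every payload takes a round trip through the file system this way, which is exactly the extra tuning surface the request wants to avoid.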

I'm sure this is a use case many other people may run into (or have already run into) so I think it's definitely worth taking a look at.

@pradyuman
Author

To give some context, the application ingests information using Kafka/Spark Streaming so I operate on an RDD to get my byte arrays. I then want to upload those byte arrays to S3. The application would operate completely in memory if I didn't need to create a tempfile (which also means the application would be more portable because I only have to worry about memory on my production environment).

pradyuman referenced this issue in legends-ai/totsuki Dec 27, 2016
…tribution and fix the totsuki implementation to actually work (with better performance)
@dagnir
Contributor

dagnir commented Dec 27, 2016

Hi @pradyuman, would wrapping your byte array in a ByteArrayInputStream and using PutObjectRequest#withInputStream work for you? Note that using an InputStream means the TransferManager won't be able to parallelize the upload, but hopefully these arrays are relatively small since they're kept in memory.
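A minimal sketch of the suggested approach, assuming the SDK for Java v1. The JDK part below wraps the array without copying it and keeps its exact length; the SDK calls are shown as comments so the example runs without the SDK. Supplying the length through `ObjectMetadata#setContentLength` matters because, without it, the v1 client warns that it will buffer the stream contents in memory. `InMemoryPayload` and `StreamWithLength` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class InMemoryPayload {
    // Hypothetical holder pairing a stream with its exact length in bytes.
    public static final class StreamWithLength {
        public final InputStream stream;
        public final long contentLength;

        StreamWithLength(InputStream stream, long contentLength) {
            this.stream = stream;
            this.contentLength = contentLength;
        }
    }

    public static StreamWithLength wrap(byte[] data) {
        // ByteArrayInputStream reads directly from the array; no copy is made.
        return new StreamWithLength(new ByteArrayInputStream(data), data.length);
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        StreamWithLength payload = wrap(data);
        // Sketch of the SDK for Java v1 calls (requires aws-java-sdk-s3):
        // ObjectMetadata metadata = new ObjectMetadata();
        // metadata.setContentLength(payload.contentLength);
        // PutObjectRequest request = new PutObjectRequest(bucketName, key, payload.stream, metadata);
        // transferManager.upload(request).waitForCompletion();
        System.out.println("Content length: " + payload.contentLength);
    }
}
```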

@dagnir
Contributor

dagnir commented Dec 27, 2016

The SDK also has StringInputStream if you want to convert the arrays to strings.
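If the data starts out as a string, a JDK-only equivalent of wrapping it in a stream is to encode it once and wrap the resulting bytes. One detail worth a sketch: the content length to report is the encoded byte count, not `String#length()`, which counts UTF-16 code units. The class and method names are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StringPayload {
    // Encodes the text once; the array length is the content length to report.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the already-encoded bytes without copying them.
    public static InputStream toStream(byte[] encoded) {
        return new ByteArrayInputStream(encoded);
    }

    public static void main(String[] args) {
        String text = "na\u00efve caf\u00e9"; // "naïve café"
        byte[] encoded = encode(text);
        InputStream in = toStream(encoded);
        System.out.println("chars=" + text.length() + ", bytes=" + encoded.length);
    }
}
```

For the sample string in `main`, ten characters encode to twelve bytes, because `ï` and `é` each take two bytes in UTF-8.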

@pradyuman
Author

Those are both good options, but the arrays are 100-200MB in size, so I imagine I would benefit from the multipart optimizations.
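Since the arrays here are 100-200MB, one way to get multipart behavior without a temp file is to slice the array yourself and drive the low-level multipart API. The sketch below only plans the slices with the JDK; it assumes S3's documented rules (every part except the last at least 5 MiB, part numbers starting at 1) and the SDK for Java v1 request classes named in the comments. `MultipartPlanner` and `Part` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class MultipartPlanner {
    // S3 requires every part except the last to be at least 5 MiB.
    static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    // Hypothetical description of one slice of the array.
    public static final class Part {
        public final int partNumber; // 1-based, as S3 expects
        public final int offset;
        public final int length;

        Part(int partNumber, int offset, int length) {
            this.partNumber = partNumber;
            this.offset = offset;
            this.length = length;
        }

        // A view over the shared array; ByteArrayInputStream does not copy the bytes.
        public InputStream stream(byte[] data) {
            return new ByteArrayInputStream(data, offset, length);
        }
    }

    // Splits totalLength bytes into consecutive parts of partSize bytes; the last may be smaller.
    public static List<Part> plan(int totalLength, int partSize) {
        if (partSize < MIN_PART_SIZE) {
            throw new IllegalArgumentException("partSize must be at least 5 MiB");
        }
        List<Part> parts = new ArrayList<>();
        int partNumber = 1;
        // A long offset avoids int overflow near the maximum array size.
        for (long offset = 0; offset < totalLength; offset += partSize) {
            int length = (int) Math.min(partSize, totalLength - offset);
            parts.add(new Part(partNumber++, (int) offset, length));
        }
        return parts;
    }

    public static void main(String[] args) {
        int totalLength = 150 * 1024 * 1024; // a 150 MiB payload, within the range from the issue
        List<Part> parts = plan(totalLength, 16 * 1024 * 1024);
        // With the SDK for Java v1, each Part would become an UploadPartRequest
        // (withUploadId, withPartNumber, withInputStream(part.stream(data)), withPartSize)
        // between InitiateMultipartUploadRequest and CompleteMultipartUploadRequest.
        for (Part p : parts) {
            System.out.println("part " + p.partNumber + ": offset=" + p.offset + ", length=" + p.length);
        }
    }
}
```

Each part's stream is a view over the same array, so the parts could also be uploaded concurrently (for example from an `ExecutorService`) without copying the data.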

kiiadi added the feature-request label Mar 9, 2017
@mzapletal

I would be happy for an in-memory version of TransferManager as well - we have the requirement of downloading multiple (small) files concurrently which we keep in memory. However, it would be nice to use the TransferManager to benefit from its thread management.

@rmilejcz

I would love to see this implemented, has there been any progress?

@shorea
Contributor

shorea commented Sep 13, 2017

No progress has been made. We are considering supporting InputStreams in TransferManager (which would allow for other types like strings and byte arrays easily) but that's quite a ways off. aws/aws-sdk-java-v2#139

For now we recommend using an overload of put or get object (one takes/returns a string) and submitting to an executor if you need parallelization.
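The executor-based approach suggested above can be sketched as follows. `Uploader` is a hypothetical functional interface standing in for a blocking call such as the SDK for Java v1 `AmazonS3#putObject(bucketName, key, content)` overload that takes a `String`; `main` plugs in an in-memory map so the sketch runs without AWS credentials. Downloads could follow the same pattern by submitting `AmazonS3#getObjectAsString(bucketName, key)` calls as `Callable<String>` tasks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPuts {
    // Hypothetical stand-in for a blocking single-object upload call.
    @FunctionalInterface
    public interface Uploader {
        void put(String key, String content) throws Exception;
    }

    // Submits one task per object, then waits for all of them; a failure surfaces as ExecutionException.
    public static void putAll(Map<String, String> objects, Uploader uploader, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Map.Entry<String, String> e : objects.entrySet()) {
                futures.add(pool.submit(() -> {
                    uploader.put(e.getKey(), e.getValue());
                    return null; // Callable form, so checked exceptions propagate
                }));
            }
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> objects = new HashMap<>();
        objects.put("a.txt", "alpha");
        objects.put("b.txt", "beta");
        // In-memory fake uploader so the sketch runs without AWS credentials.
        Map<String, String> store = new ConcurrentHashMap<>();
        putAll(objects, store::put, 4);
        System.out.println(store.keySet());
    }
}
```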

@debora-ito
Member

Hey @pradyuman @mzapletal @rmilejcz,

the SDK team has reviewed the feature request list for V1, and since they're concentrating their efforts on new V2 features, they decided not to implement this one in V1. It's still being considered for the TransferManager refactor in V2; see the referenced issue above. I'll go ahead and close this one.

Please feel free to comment on the V2 issue with your use case, and reach out if you have further questions.

8 participants