TransferManager multipart upload from a FileInputStream instance fails with ResetException #427
Hi @lolski, the reason for the failure has to do with the default buffer limit of a `BufferedInputStream`. You can try constructing a `val in = new FileInputStream(file)` and passing it to the request instead of the buffered input stream. The S3 Java client is able to handle a `FileInputStream` directly. However, in this case we recommend a simpler approach: you can directly specify the original file in the `PutObjectRequest`. For completeness, suppose you had an input stream that is not associated with a file; it would still NOT be necessary to wrap it with a `BufferedInputStream`.
Hope this makes sense. Regards,
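For illustration, a minimal sketch of that recommended file-based approach (the bucket name, key, and path are made-up stand-ins, not from the thread):

```scala
import java.io.File
import com.amazonaws.services.s3.model.PutObjectRequest
import com.amazonaws.services.s3.transfer.TransferManager

// Hand TransferManager the File itself: the SDK knows the content length and
// can reopen the file to retry any part, so no in-memory buffering is needed.
val tm = new TransferManager()
val upload = tm.upload(new PutObjectRequest("my-bucket", "my-key", new File("/path/to/file")))
upload.waitForCompletion()
tm.shutdownNow()
```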
Hi @hansonchar, I have confirmed that on SDK version 1.9.33 the exact same error happens even when I use a `FileInputStream`. However, specifying a `File` works fine. Here's the code excerpt:
Hi @lolski, if you look at the release notes of 1.9.34, you will see there is a bug fix exactly on this, related to resetting a `FileInputStream`. (But, of course, specifying a file is the recommended approach.) Regards,
On a side note, suppose you have a (non-file) input stream with a max expected size of 100,000 bytes; the read limit to set would need to be 1 extra byte more, i.e. 100,001, so that mark and reset will always work for 100,000 bytes or less. Regards,
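A minimal sketch of that rule in code (the bucket, key, and in-memory stream are illustrative stand-ins):

```scala
import java.io.ByteArrayInputStream
import com.amazonaws.services.s3.model.{ObjectMetadata, PutObjectRequest}

// Assumed, illustrative values: a 100,000-byte in-memory payload
val maxExpectedBytes = 100000
val data = new Array[Byte](maxExpectedBytes)
val metadata = new ObjectMetadata()
metadata.setContentLength(data.length.toLong)

val request = new PutObjectRequest("my-bucket", "my-key", new ByteArrayInputStream(data), metadata)
// One byte more than the max expected size, so mark/reset always covers the stream
request.getRequestClientOptions.setReadLimit(maxExpectedBytes + 1)
```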
@hansonchar does that rule apply to a file input stream too? I think it would be good to add this info to the docs: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/RequestClientOptions.html#setReadLimit(int)
Nope. It should just work if you specified a `FileInputStream`. Agree on the javadoc.
@hansonchar then I want to confirm that the problem still persists with a `FileInputStream`. You should be able to reproduce it by allocating a large file. Also, sometimes the upload will succeed, especially when the file is not that large, e.g. 1GB or 2GB. I've had 1 success out of 4 tries uploading a 2GB file. This might mean that the retry part of the code is the cause.
Hi @lolski, this is because the `FileInputStream` got wrapped by TransferManager into a different type of stream for multipart uploads before being passed to the low-level S3 client, and therefore the stream got treated as if it needed memory buffering. I think the fixes should be rather straightforward; will look into this.
I just tested a fix and got a 10GB file uploaded using TransferManager with a `FileInputStream` as the input in the request. Will include the fix in the next release.
@hansonchar thanks
@hansonchar do you know which version of the SDK the fix was released in? I am still seeing this error, and I am using AWS Java SDK version 1.10.20.
@hansonchar I'm also seeing this with AWS Java SDK 1.10.15, particularly for large files (> 60GB) using just a normal `InputStream`: the transfer manager seems to wrap the stream into a mark-supported stream, which eventually fails with the same error.
@hansonchar I agree with the above posts that the error still recurs. I am fortunate that it's feasible for me to simply use the file-based method instead.
I'm having success setting the multipart size to the buffer size (this way the part can always be reset in case of connection failure):

```scala
val uploader = new TransferManager(...)
val request = new PutObjectRequest(...)
// set the buffer size (ReadLimit) equal to the multipart upload size,
// allowing us to resend data if the connection breaks
request.getRequestClientOptions.setReadLimit(TEN_MB)
uploader.getConfiguration.setMultipartUploadThreshold(TEN_MB)
val upload = uploader.upload(request)
```
This should fix sporadic uploading errors, and errors that occurred when uploading huge files (aws/aws-sdk-java#427). Funnily enough, we were always using tempfiles for uploads anyway. [finishes #109975976]
We managed to get around the problem by implementing our own mark-and-resettable stream that wraps a `FileChannel` (sketched below). We passed this stream to the TransferManager.
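The commenter's actual implementation wasn't captured here; below is a hypothetical sketch of such a wrapper, assuming single-threaded access to the stream. Since a `FileChannel` can seek back to any position, `mark`/`reset` reduce to saving and restoring the channel position, with no memory buffering at all:

```scala
import java.io.InputStream
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Hypothetical sketch, not the commenter's actual code: an InputStream that
// supports mark/reset by saving and restoring the FileChannel's position.
class FileChannelInputStream(channel: FileChannel) extends InputStream {
  private var markedPosition: Long = 0L

  override def read(): Int = {
    val buf = ByteBuffer.allocate(1)
    if (channel.read(buf) <= 0) -1 else buf.get(0) & 0xff
  }

  override def read(b: Array[Byte], off: Int, len: Int): Int =
    channel.read(ByteBuffer.wrap(b, off, len))

  override def markSupported(): Boolean = true

  // readlimit can be ignored: the channel can seek back arbitrarily far
  override def mark(readlimit: Int): Unit = {
    markedPosition = channel.position()
  }

  override def reset(): Unit = {
    channel.position(markedPosition)
  }

  override def close(): Unit = channel.close()
}
```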
I'm having this issue using a plain (non-file) `InputStream` with TransferManager, not using the file-based approach.
@rdifalco did you try @garretthall's suggestion? I have the same use case as you and it's working perfectly. One small adjustment: I believe the read limit needs to be set to the part size, not the multipart threshold. You can get this value via `TransferManagerUtils.calculateOptimalPartSize`, which should be the maximum number of bytes that'll be buffered for a given upload.
@spieden are you suggesting something like the sketch below? And then to set the read limit to that calculated part size?
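The original snippet from this comment wasn't captured; a hypothetical sketch of spieden's suggestion might look like the following. Note that `TransferManagerUtils` lives in an internal SDK package, and the bucket, key, and payload here are placeholders:

```scala
import java.io.ByteArrayInputStream
import com.amazonaws.services.s3.model.{ObjectMetadata, PutObjectRequest}
import com.amazonaws.services.s3.transfer.TransferManager
import com.amazonaws.services.s3.transfer.internal.TransferManagerUtils

val uploader = new TransferManager()

// Placeholder payload standing in for a real stream
val data = new Array[Byte](16 * 1024 * 1024)
val metadata = new ObjectMetadata()
metadata.setContentLength(data.length.toLong)
val request = new PutObjectRequest("my-bucket", "my-key", new ByteArrayInputStream(data), metadata)

// Buffer exactly one part (plus a byte) so any single part can be resent.
// readLimit is an Int, so this assumes the computed part size fits in an Int.
val partSize = TransferManagerUtils.calculateOptimalPartSize(request, uploader.getConfiguration)
request.getRequestClientOptions.setReadLimit(partSize.toInt + 1)

val upload = uploader.upload(request)
upload.waitForCompletion()
```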
Now I'm starting to question the value of this. Is it better to have optimal part sizes, or part sizes I feel comfortable having completely buffered? If there is a reset error, I can just retry the entire operation myself instead of relying solely on the AWS SDK to retry it for me. What do you think @hansonchar?
What's the official/unofficial fix or implementation approach to avoid the `ResetException`? @kiiadi: what do you or your colleagues recommend?
We've been running into this issue even after putting @garretthall's fix in place. Any ideas?
Just a quick summary of the issue and best practices:

- Whenever possible, upload from a `File` rather than an `InputStream`: the SDK can determine the content length from the file and reopen it to reproduce the content for retries, with no in-memory buffering.
- When uploading from a stream, always supply the content length via `ObjectMetadata`; otherwise the HTTP client may buffer the entire stream in memory.
- For streams, set the read limit (`RequestClientOptions.setReadLimit`) to the maximum number of bytes that may need to be re-read on a retry, plus one. Note that up to that much data can be buffered in memory, so set it conservatively.
I am not sure what the ideal value for the read limit would be. The data files I'd like to upload to S3 are in the range of 8GB to 15GB, and I have set the initial part size to 5GB. In this case, what would the ideal read limit be? For now, I have set the readLimit to 10MB. I'd like a recommendation on the ideal value for my use case.
If you are using a stream to upload an object, the SDK will do a single upload and can't upload in parts. In that case, the read limit would need to be the object size (8-15GB in your case) plus 1. If no content length is specified, the HTTP client might buffer the entire stream into memory, so it is recommended to provide the content length when uploading via a stream.
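A minimal sketch of supplying the content length up front, assuming a stream small enough that its size fits in an `Int` (the bucket, key, and payload are placeholders):

```scala
import java.io.ByteArrayInputStream
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{ObjectMetadata, PutObjectRequest}

val data = new Array[Byte](50 * 1024 * 1024) // placeholder 50MB payload
val metadata = new ObjectMetadata()
metadata.setContentLength(data.length.toLong) // avoids whole-stream buffering in the HTTP client

val request = new PutObjectRequest("my-bucket", "my-key", new ByteArrayInputStream(data), metadata)
// Single upload from a stream: the read limit must cover the whole object + 1.
// setReadLimit takes an Int, so this approach caps out below ~2GB.
request.getRequestClientOptions.setReadLimit(data.length + 1)

AmazonS3ClientBuilder.defaultClient().putObject(request)
```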
Please note that when uploading from a stream, the readLimit will result in buffering that much data in memory, so it's recommended to set it conservatively. Uploading from a file is a more reliable and performant option, as we can determine the content length from the length of the file and reproduce the content as many times as needed for retries.
Hi, I've faced almost the same problem when using `S3ObjectInputStream`: https://stackoverflow.com/questions/46360321/unable-to-reset-stream-after-calculating-aws4-signature
I don't see how this is an acceptable workaround. There are many reasons why I wouldn't want to write data to a temporary file (disk usage, file permissions, security concerns), and obviously not all data has a known-in-advance size or fits into memory, so having to permit TransferManager to buffer the whole thing in memory is also inadequate. Why doesn't TransferManager simply buffer the part-sized chunk of data that it sends? Then retrying a part upload would be trivial.
I've investigated this issue; it was a long story. The conclusion: pass a system property to Java by inserting the following option into the java command line: `-Dcom.amazonaws.sdk.s3.defaultStreamBufferSize=<max upload size in bytes>` (property name inferred from the code linked below).
This tells AmazonS3Client to set an appropriately large rewindable buffer size. Edit 2018-11-02: the link should be to `setReadLimit`: aws-sdk-java/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java, line 1668 at commit 856d27b
Thanks for posting the explanation and the link! However, your link is now incorrect. I believe the correct canonical link is: aws-sdk-java/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java, line 1668 at commit 856d27b
@thauk-copperleaf thank you, you are right. |
I just had this problem streaming content larger than 5GB from an FTP server to S3. I tried most of what people wrote here and in the linked issues and sites. None of it worked, but I finally got it working. It looks like AWS needs some time after the upload is done to do its own processing, and while it does, the connection gets closed. I found a setting that makes it work.
Maybe this answers the question @stevematyas asked in #427 (comment).
Multipart upload of a `FileInputStream` using the following code will fail with `ResetException: Failed to reset the request input stream`. I also tried, with no luck, to wrap the `FileInputStream` in a `BufferedInputStream`, which supports marking (confirmed by checking that `BufferedInputStream.markSupported()` indeed returns `true`). Here is the stack trace:
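Neither the original code nor the stack trace was captured in this copy. A hypothetical reconstruction of the failing pattern the report describes (the bucket, key, and path are placeholders) might look like:

```scala
import java.io.{BufferedInputStream, File, FileInputStream}
import com.amazonaws.services.s3.model.{ObjectMetadata, PutObjectRequest}
import com.amazonaws.services.s3.transfer.TransferManager

val file = new File("/path/to/large/file") // placeholder path
val metadata = new ObjectMetadata()
metadata.setContentLength(file.length())

// Wrapping in BufferedInputStream makes markSupported() return true,
// yet the upload still fails with ResetException on large files.
val in = new BufferedInputStream(new FileInputStream(file))
val request = new PutObjectRequest("my-bucket", "my-key", in, metadata)

val tm = new TransferManager()
tm.upload(request).waitForCompletion()
```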