
Download Large Files from S3 #1352

Closed
Cekurok opened this issue Oct 19, 2017 · 7 comments
Labels
guidance (Question that needs advice or information.) · response-requested (Waiting on additional info or feedback. Will move to "closing-soon" in 5 days.)

Comments


Cekurok commented Oct 19, 2017

Hi,
I'm trying to download a large file with this code:

GetObjectRequest req = new GetObjectRequest(bucketName, key);
req.setGeneralProgressListener(new ProgressListener() {
    @Override
    public void progressChanged(ProgressEvent progressEvent) {
        String transferredBytes = "Download bytes: " + progressEvent.getBytesTransferred();
        System.out.println(transferredBytes);
    }
});
Download down = tm.download(req, new File("pathName"));
down.waitForCompletion();

And I get an error:

Unable to store object contents to disk: Premature end of Content-Length delimited message body (expected: 2390753280; received: 1080029648)
com.amazonaws.SdkClientException: Unable to store object contents to disk: Premature end of Content-Length delimited message body (expected: 2390753280; received: 1080029648)
at com.amazonaws.services.s3.internal.ServiceUtils.downloadToFile(ServiceUtils.java:313)
at com.amazonaws.services.s3.transfer.DownloadCallable.retryableDownloadS3ObjectToFile(DownloadCallable.java:288)
at com.amazonaws.services.s3.transfer.DownloadCallable.call(DownloadCallable.java:135)
at com.amazonaws.services.s3.transfer.DownloadCallable.call(DownloadCallable.java:53)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 2390753280; received: 1080029648)
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
at java.io.FilterInputStream.read(Unknown Source)
at com.amazonaws.services.s3.internal.ServiceUtils.downloadToFile(ServiceUtils.java:307)
Part of the file is overwritten on top of another part.

SDK: 1.11.215
Java: 1.8

dagnir (Contributor) commented Oct 19, 2017

Hi, premature end of stream errors are usually caused by problems in the network that close the connection to S3 unexpectedly before the entire object has been downloaded. It looks like you're already using the TransferManager class, which has better retry handling than the plain AmazonS3Client, so unfortunately there isn't a better solution to this at the moment. We have a feature request to add better retry handling to the S3 client, but that will probably land only in the upcoming new major version (https://github.com/aws/aws-sdk-java-v2).

Please see #856 for more details.
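
A minimal retry sketch around TransferManager#download, assuming the failures are transient network drops as described above. The bucket, key, target path, and MAX_ATTEMPTS constant are placeholders, and the restart-from-scratch loop is a workaround, not an SDK feature; each failed attempt rewrites the target file from the beginning:

import com.amazonaws.SdkClientException;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

import java.io.File;

public class RetryingDownload {
    private static final int MAX_ATTEMPTS = 3; // illustrative retry budget

    public static void main(String[] args) throws InterruptedException {
        TransferManager tm = TransferManagerBuilder.standard().build();
        File target = new File("/tmp/large-object"); // placeholder path

        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                Download down = tm.download(
                        new GetObjectRequest("my-bucket", "my-key"), target);
                down.waitForCompletion(); // blocks until the transfer finishes or fails
                break;                    // success, stop retrying
            } catch (SdkClientException e) {
                if (attempt == MAX_ATTEMPTS) throw e; // out of attempts, surface the error
                // connection dropped mid-stream; restart the whole download
            }
        }
        tm.shutdownNow();
    }
}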

Cekurok (Author) commented Oct 20, 2017

Hi, the problem is that the SDK downloads without a byte offset, which matters for large files: it overwrites the data already in the file instead of appending to it.

dagnir (Contributor) commented Oct 20, 2017

Sorry, I'm not sure I follow. Why does the SDK need to shift the bytes over (I'm guessing you mean in the target file)? Are you trying to download a single large file over multiple calls to TransferManager#download?

Cekurok (Author) commented Oct 23, 2017

I'm trying to download a single large file of more than 1.1 GB. As I understand the logic, a large file is downloaded in parts; to write the second part there must be an offset into the destination file so the data that has already been downloaded is not overwritten. That offset never happens, either with Download or with MultipleFileDownload. Both methods overwrite the data in the file instead of adding to it.
getObjectRequest.getRange() does not work correctly and returns null.

shorea (Contributor) commented Oct 30, 2017

@Cekurok It sounds like you are doing a single download with a range and expecting the TransferManager to shift the write position in the file to the corresponding offset. That only happens when the TransferManager itself does a multipart download or a resume; any single download (even one with a range) will completely overwrite the provided file. Can you explain your use case a bit more?
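
For reference, a hedged sketch of doing that byte shift yourself, since (per the above) TransferManager won't do it for a single ranged download: fetch one range with a plain AmazonS3 client and write it at the matching offset using RandomAccessFile#seek. The bucket, key, offsets, and target path are placeholders:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

import java.io.InputStream;
import java.io.RandomAccessFile;

public class RangedPartDownload {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        long start = 1_073_741_824L; // first byte of the second part (placeholder)
        long end   = 2_147_483_647L; // last byte of the range, inclusive (placeholder)

        GetObjectRequest req = new GetObjectRequest("my-bucket", "my-key")
                .withRange(start, end); // sets the Range header; getRange() is now non-null

        try (S3Object obj = s3.getObject(req);
             InputStream in = obj.getObjectContent();
             RandomAccessFile raf = new RandomAccessFile("/tmp/large-object", "rw")) {
            raf.seek(start); // shift the write position to this part's own offset
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                raf.write(buf, 0, n);
            }
        }
    }
}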

shorea (Contributor) commented Nov 6, 2017

Closing issue for now. Feel free to reopen if you have further questions or problems.

shorea closed this as completed Nov 6, 2017
Cekurok (Author) commented Nov 9, 2017

Hi, I download files ranging from 100 MB to 2 GB from the same folder. The problem appears with files larger than 1.1 GB: when the second part of the file starts downloading, it overwrites what was downloaded before it. As a result the application reports a wrong file size and the file comes out incomplete. I tried different download methods. As I understood from the source code, MultipleFileDownload is built on the ordinary Download and uses getRange() to advance the write position in the file by the required number of bytes so that already-downloaded data is not overwritten. For me, getRange() returns null.
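
A small sketch of why getRange() comes back null, assuming a default request: it only reflects a range the caller set explicitly, which TransferManager does on its own internal part requests but not on the request object you hand it. Bucket and key are placeholders:

import com.amazonaws.services.s3.model.GetObjectRequest;
import java.util.Arrays;

public class RangeCheck {
    public static void main(String[] args) {
        GetObjectRequest req = new GetObjectRequest("my-bucket", "my-key");
        System.out.println(req.getRange());      // null - no Range was ever set
        req.setRange(0, 1023);                   // explicitly request the first KiB
        System.out.println(Arrays.toString(req.getRange())); // [0, 1023]
    }
}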

srchase added the guidance and needs-response labels and removed the Question label on Jan 4, 2019
debora-ito added the response-requested label and removed the needs-response label on Feb 25, 2020