TransferManager hitting "Connection Reset" #373
According to that stack trace you've made it to the point where you're reading the object content, which means S3 sent you a successful response; the connection is not getting reset because we've accidentally exceeded the service's 100-requests-per-connection limit. Something else is going wrong that's resetting the connection mid-download. Unfortunately it's hard to say what - connections can be reset by anyone on the network between you and S3 for many different reasons. :(

If you can grab request ids for some of these failed requests (maybe by turning on Apache HttpClient header logging?), the S3 team can double-check what's happening on their end for these requests. If you can grab a packet capture from your end that would also be interesting (although it'll presumably be quite large if you're pulling a bunch of data from S3).

If you can't uncover the root cause of the resets, retries are your only recourse. The TransferManager has retries built in, but it's explicitly not retrying on SocketException. Strange... I'll chase down whether it's safe for us to add retries on SocketException in a future release - from a quick glance it seems like we should be able to.
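In the meantime, a minimal sketch of an application-level retry around a TransferManager download. The class name, retry count, and the decision to retry only when a SocketException is somewhere in the cause chain are illustrative choices, not SDK behavior:

```java
import java.io.File;
import java.net.SocketException;

import com.amazonaws.AmazonClientException;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;

public class RetryingDownload {

    // Restarts the whole download when the failure is caused by a SocketException
    // ("Connection reset"). maxAttempts is a placeholder; tune it for your workload.
    public static void downloadWithRetry(TransferManager tm, String bucket,
                                         String key, File target) throws Exception {
        final int maxAttempts = 3;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Download download = tm.download(bucket, key, target);
                download.waitForCompletion();
                return;
            } catch (AmazonClientException e) {
                if (!causedBySocketException(e) || attempt == maxAttempts) {
                    throw e;
                }
                // Otherwise loop and start the download again from scratch.
            }
        }
    }

    private static boolean causedBySocketException(Throwable t) {
        for (Throwable cause = t; cause != null; cause = cause.getCause()) {
            if (cause instanceof SocketException) {
                return true;
            }
        }
        return false;
    }
}
```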
Thanks. Unfortunately given our data volumes and the rarity of this exception we can't feasibly do that amount of logging. Retrying should work fine though. Thanks!
I'm seeing this pretty consistently, as well. I'm using AWS SDK version 1.9.39. EDIT:
Adding an option to do resumable downloads.
@david-at-aws PR here. However, I looked for the test suite for the S3 and TransferManager stuff, but I'm not finding anything. Is there a suite that I can validate my PR against and update w/ a test for the resume functionality? What's the common contributor workflow for people validating their changes w/ their own projects? Publish an artifact to either local or my own nexus repo?
We've got a test suite internally that we run changes through before merging them. We realize it'd be much better for you to be able to run the tests yourself before sending us the PR, and we're working on separating it from some Amazon-internal infrastructure it currently depends on so we can publish it on GitHub - unfortunately not quite there yet.
Got it. Do you publish nightly snapshots that I could pull into a project? Also, an update on the original issue: enabling TCP keepalive in the
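A minimal sketch of enabling TCP keepalive on the client, assuming the comment above refers to the SDK's `ClientConfiguration` and that the version in use exposes `withTcpKeepAlive`; credentials and region setup are placeholders:

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.transfer.TransferManager;

public class KeepAliveExample {

    public static TransferManager buildTransferManager() {
        // Ask the OS to send TCP keepalive probes on idle pooled connections,
        // which can surface half-dead connections before a read fails mid-download.
        ClientConfiguration config = new ClientConfiguration()
                .withTcpKeepAlive(true);

        AmazonS3Client s3 =
                new AmazonS3Client(new DefaultAWSCredentialsProviderChain(), config);
        return new TransferManager(s3);
    }
}
```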
Shouldn't the
I'm using version
I noticed that this happens in a "lazy" environment, when the stream isn't read right after the connection returns the response. For instance, if you create an Iterator of
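One way to sidestep that pattern is to drain each object to local storage as soon as it is opened, and only then hand it to the lazy pipeline, so no S3 connection stays open while downstream processing runs. A sketch under that assumption; the helper and its names are illustrative, not SDK API:

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.S3Object;

public class EagerFetch {

    // Copies the object's content to a temp file immediately, so the S3 connection
    // is closed quickly instead of staying open while downstream code lazily
    // iterates over the data.
    public static File fetchToTempFile(AmazonS3 s3, String bucket, String key) throws Exception {
        S3Object object = s3.getObject(bucket, key);
        File tmp = File.createTempFile("s3-object-", ".tmp");
        try {
            Files.copy(object.getObjectContent(), tmp.toPath(),
                       StandardCopyOption.REPLACE_EXISTING);
        } finally {
            object.close();
        }
        return tmp;
    }
}
```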
I'm running into this issue as well. Any suggestions as to ideal configuration parameters would be appreciated. @acmcelwee
I never found any ideal config to get things consistently working. We started snappy-compressing tar archives for that data, so our downloads w/ TransferManager are all large files, rather than a "directory" of a large number of files. Things have worked out a lot better since we made the switch.
@jtrunick try to profile how much time it takes between establishing the connection and actually reading the content... I bet the problem is there... Lazy collections would cause that.
Hello @rcoh, @acmcelwee, and @l15k4, sorry for the long delay in response. Are you guys still experiencing this issue regularly?
Pinging @jtrunick as well.
I'm not actively using any of the code where I ran into the issue, so I don't have any new data points to add.
No new data points either.
I'm more than sure that this happens when people are processing data lazily, using iterators or streams, which increases the lifetime of a particular socket connection. AWS doesn't like long-lived S3 socket connections... It can always be fixed by increasing
Thank you @acmcelwee, @rcoh, and @l15k4. A fix has been made. Edit: the update should be available in our next release.
I ran into this for a 7G download.
We're using the TransferManager to download files, but periodically running into the following (full stack trace at bottom):
The documentation seems to indicate that this happens when a connection is reused too many times:
Obviously we can retry these requests, but it isn't ideal. Is it something we're doing wrong with the library that's causing this? Or is the library not managing connections properly?
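One client-side mitigation worth experimenting with (not confirmed as the fix in this thread) is to cap how long pooled connections are reused, so they get recycled before hitting any server-side reuse limit. A sketch assuming the SDK version in use exposes `withConnectionTTL` on `ClientConfiguration`; the one-minute value is a placeholder:

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.transfer.TransferManager;

public class BoundedReuseExample {

    public static TransferManager buildTransferManager() {
        // Expire pooled connections after one minute so they are re-established
        // periodically instead of being reused indefinitely. The value is a
        // placeholder; tune it for your workload.
        ClientConfiguration config = new ClientConfiguration()
                .withConnectionTTL(60 * 1000);

        return new TransferManager(
                new AmazonS3Client(new DefaultAWSCredentialsProviderChain(), config));
    }
}
```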