TransferManager.copy() blocks on getObjectMetadata request. #988
Comments
The getObjectMetadata operation should be pretty quick, since it is a HEAD request. Other operations, such as download and upload, also fetch the metadata synchronously. Is it causing performance issues for you? Can you provide data on these performance bottlenecks?
The behavior I'm seeing is that for small transfers (think lots of individual HTML files) the metadata request and the copy take basically the same amount of time. The queuing of transfers is then limited by how fast we can fetch metadata. Effectively, what I'm seeing is that we aren't taking full advantage of
We use the content length from the object metadata to update TransferProgress and in CopyCallable, so we need to wait for the getObjectMetadata operation to finish before starting the copy in CopyCallable. If we use a separate thread for it, the copy thread still blocks. I don't think changing the current behavior would be a big advantage for customers. If you like, send us a PR and we'll be happy to take a look.
If the size is known, a similar approach could be used for copy as well: #983
On a related note, I just noticed that this GetObjectMetadata request is made for the source bucket/key but does not include the source versionId (if present). So if you are copying a non-current version of an object, the latest object version's metadata is incorrectly used in the transfer progress calculations and also in determining whether or not to use multipart copy. I couldn't find another open issue for this, so I opened #1009
I am running into this now. My workflow is currently:
These are small objects, so the synchronous overhead is not ideal, especially when it means repeatedly fetching data that is already known. I understand wanting the transfer manager API to stay simple. I notice some of the methods have a callback interface to get metadata, and I wonder if something like that might make sense here.
Another issue with this call not being async is that if the object doesn't exist the
Hey @mark-vieira @NikolayAtSony @ejono @jakeab @billoneil the SDK team has reviewed the feature request list for V1, and since they're concentrating efforts on V2 new features they decided to not implement this one in V1. It's still being considered for the TransferManager refactor in V2, see the referenced issue above. I'll go ahead and close this one. Please feel free to comment on the V2 issue with your use case, and reach out if you have further questions.
Calls to TransferManager.copy() aren't truly async, since the method blocks while fetching object metadata for the source object.
aws-sdk-java/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/TransferManager.java
Line 1794 in b49d732
Ideally, this request would also be done asynchronously, as creating a large number of copy operations is bottlenecked by how quickly we can fetch object metadata.
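In the meantime, one possible caller-side workaround is to issue the blocking calls from a thread pool, so that several metadata fetches are in flight at once. The sketch below is not SDK code and does not use the SDK at all: `ParallelCopySubmitter`, `submitAll`, and `blockingCopy` are hypothetical names, and `blockingCopy` is a stand-in for a call such as `TransferManager.copy(...)` that blocks before returning its handle. The `Thread.sleep` in `main` only simulates that blocking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class ParallelCopySubmitter {

    /**
     * Runs one blocking call per key on the given pool and returns a future
     * per key, in the same order as the input list. The calling thread never
     * waits on any single call.
     */
    public static <K, T> List<CompletableFuture<T>> submitAll(
            List<K> keys, Function<K, T> blockingCopy, ExecutorService pool) {
        List<CompletableFuture<T>> futures = new ArrayList<>();
        for (K key : keys) {
            futures.add(CompletableFuture.supplyAsync(() -> blockingCopy.apply(key), pool));
        }
        return futures;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            // Hypothetical stand-in for a copy call that blocks on a metadata request.
            Function<String, String> fakeCopy = key -> {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return "copied:" + key;
            };

            List<String> keys = Arrays.asList("a.html", "b.html", "c.html", "d.html");
            for (CompletableFuture<String> f : submitAll(keys, fakeCopy, pool)) {
                System.out.println(f.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

This only hides the latency behind extra threads. Each copy still pays for its own metadata request, so it is not a substitute for the change requested here.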