Skip to content

Refactor S3TransferManager to support non-file-based transfers #2817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 8, 2021

Conversation

Bennett-Lynch
Copy link
Contributor

@Bennett-Lynch Bennett-Lynch commented Nov 3, 2021

Revision 2 (ffc3deb, 4aac24a)

  1. Update change log to indicate breaking changes
  2. Update README to reflect new API and also to include examples for TransferListener
  3. Misc. small cleanups from feedback
  4. Additional test coverage

Revision 1 (995ca7d)

Motivation and Context

We would like for S3TransferManager to support non-file-based uploads and downloads, such as reactive streams or in-memory byte buffers, but the current implementation is directly coupled to file/Path-based transfers. While the underlying S3AsyncClient already has support for AsyncRequestBodies and AsyncResponseTransformers, these constructs are not exposed through the current S3TransferManager interface.

Additionally, the existing S3TransferManager interfaces have made some assumptions that make it difficult to introduce this support, such as UploadRequest/DownloadRequest having a Path attribute, and not having interfaces that allow us to distinguish between transfer types, e.g., single-object and multi-object (directory) transfers. Likewise, the current interface hierarchy also makes it difficult to distinguish between file-based and non-file-based. Therefore, this refactoring requires backwards-incompatible changes, which are permissible while S3TransferManager is still in PREVIEW.

Description

The diff here is very large, but this change set primarily introduces the following interface hierarchy:


  • TransferRequest
    • TransferObjectRequest
      • UploadFileRequest
      • DownloadFileRequest
      • UploadRequest
      • DownloadRequest<T>
    • TransferDirectoryRequest
      • UploadDirectoryRequest
  • Transfer
    • ObjectTransfer
      • FileUpload
      • FileDownload
      • Upload
      • Download<T>
    • DirectoryTransfer
      • DirectoryUpload
  • CompletedTransfer
    • CompletedObjectTransfer
      • CompletedFileUpload
      • CompletedFileDownload
      • CompletedUpload
      • CompletedDownload<T>
    • CompletedDirectoryTransfer
      • CompletedDirectoryUpload
  • FailedObjectTransfer
    • FailedFileUpload

These interfaces allow us to more selectively choose which attributes will be shared between different data types. Additionally, we take this opportunity to make the naming convention and class-vs-interface distinction consistent across all the above data types.

It is important that we distinguish between file-based-requests and non-file-based-requests so that the Collection<> of failed file transfers from a directory upload can be re-driven, if needed.

Note that the above Download-oriented data types are all generic. This is because of how AsyncResponseTransformer is designed to be generic, being parameterized with the type of data that a given AsyncResponseTransformer produces. Therefore, a user must declare the type at the time of instantiating a DownloadRequest, and the type will
be consistent throughout the transfer lifecycle, including the Download and CompletedDownload interfaces.

For a quick overview of the new API, refer to the upload/download integration tests:

Arbitrary object transfers

Upload upload =
tm.upload(UploadRequest.builder()
.putObjectRequest(b -> b.bucket(TEST_BUCKET).key(TEST_KEY))
.requestBody(AsyncRequestBody.fromString(content))
.overrideConfiguration(b -> b.addListener(LoggingTransferListener.create()))
.build());
CompletedUpload completedUpload = upload.completionFuture().join();

Download<ResponseBytes<GetObjectResponse>> download =
tm.download(DownloadRequest.builder()
.getObjectRequest(b -> b.bucket(BUCKET).key(KEY))
.responseTransformer(AsyncResponseTransformer.toBytes())
.overrideConfiguration(b -> b.addListener(LoggingTransferListener.create()))
.build());
CompletedDownload<ResponseBytes<GetObjectResponse>> completedDownload = download.completionFuture().join();
ResponseBytes<GetObjectResponse> result = completedDownload.result();

File-based transfers

FileUpload fileUpload =
tm.uploadFile(UploadFileRequest.builder()
.putObjectRequest(b -> b.bucket(TEST_BUCKET).key(TEST_KEY))
.source(testFile.toPath())
.overrideConfiguration(b -> b.addListener(LoggingTransferListener.create()))
.build());
CompletedFileUpload completedFileUpload = fileUpload.completionFuture().join();

FileDownload download =
tm.downloadFile(DownloadFileRequest.builder()
.getObjectRequest(b -> b.bucket(BUCKET).key(KEY))
.destination(path)
.overrideConfiguration(b -> b.addListener(LoggingTransferListener.create()))
.build());
CompletedFileDownload completedFileDownload = download.completionFuture().join();

License

  • I confirm that this pull request can be released under the Apache 2 license

## Motivation

We would like for S3TransferManager to support non-file-based uploads
and downloads, such as reactive streams or in-memory byte buffers, but
the current implementation is directly coupled to file/Path-based
transfers. While the underlying S3AsyncClient already has support for
AsyncRequestBodies and AsyncResponseTransformers, these constructs are
not exposed through the current S3TransferManager interface.

Additionally, the existing S3TransferManager interfaces have made some
assumptions that make it difficult to introduce this support, such as
UploadRequest/DownloadRequest having a Path attribute, and not having
interfaces that allow us to distinguish between transfer types, e.g.,
single-object and multi-object (directory) transfers. Likewise, the
current interface hierarchy also makes it difficult to distinguish
between file-based and non-file-based. Therefore, this refactoring
requires backwards-incompatible changes, which are permissible while
S3TransferManager is still in PREVIEW.

## Modifications

The diff here is very large, but this change set primarily introduces
the following interface hierarchy:

* TransferRequest
    * TransferObjectRequest
        * UploadFileRequest
        * DownloadFileRequest
        * UploadRequest
        * DownloadRequest<T>
    * TransferDirectoryRequest
        * UploadDirectoryRequest
* Transfer
    * ObjectTransfer
        * FileUpload
        * FileDownload
        * Upload
        * Download<T>
    * DirectoryTransfer
        * DirectoryUpload
* CompletedTransfer
    * CompletedObjectTransfer
        * CompletedFileUpload
        * CompletedFileDownload
        * CompletedUpload
        * CompletedDownload<T>
    * CompletedDirectoryTransfer
        * CompletedDirectoryUpload
* FailedObjectTransfer
    * FailedFileUpload

These interfaces allow us to more selectively choose which attributes
will be shared between different data types. Additionally, we take this
opportunity to make the naming convention and class-vs-interface
distinction consistent across all the above data types.

It is important that we distinguish between file-based-requests and
non-file-based-requests so that the Collection<> of failed file
transfers from a directory upload can be re-driven, if needed.

Note that the above Download-oriented data types are all generic. This
is because of how AsyncResponseTransformer is designed to be generic,
being parameterized with the type of data that a given
AsyncResponseTransformer produces. Therefore, a user must declare the
type at the time of instantiating a DownloadRequest, and the type will
be consistent throughout the transfer lifecycle, including the Download
and CompletedDownload interfaces.
@Bennett-Lynch Bennett-Lynch added breaking-change Issue requires a breaking change to remediate. transfer-manager labels Nov 3, 2021
@Bennett-Lynch Bennett-Lynch requested a review from millems November 3, 2021 04:27
@Bennett-Lynch
Copy link
Contributor Author

/cc @pkgonan & #2714. Please let me know if this API satisfies your use case. You should be able to adapt a Flux<ByteBuffer> to this API by doing something like: AsyncRequestBody.fromPublisher(Flux<ByteBuffer>).

@pkgonan
Copy link

pkgonan commented Nov 3, 2021

@Bennett-Lynch
Thank you so much. This is exactly the feature I've been waiting for. thank you.

Copy link
Contributor

@zoewangg zoewangg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the sample code in the README? cc @Carey-AWS, can we update dev guide?

@Bennett-Lynch
Copy link
Contributor Author

Bennett-Lynch commented Nov 4, 2021

Can we update the sample code in the README?

@zoewangg I updated the README. I wasn't sure if we should include examples for the arbitrary upload/download, or just file transfers (to keep it simple). Please take a look at let me know what you prefer.

Copy link
Contributor

@zoewangg zoewangg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the README, LGTM. I think we should have sample code for common use-cases like what you added. The DDB enhanced client README is a good example, imo and we've got positive feedback on it. https://github.com/aws/aws-sdk-java-v2/tree/master/services-custom/dynamodb-enhanced

@sonarqubecloud
Copy link

sonarqubecloud bot commented Nov 8, 2021

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 30 Code Smells

72.3% 72.3% Coverage
6.7% 6.7% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Issue requires a breaking change to remediate. transfer-manager
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants