Executor factory on transfer manager vs max connection on s3 client? #1696
The max connections setting is how many TCP/HTTP connections the client is allowed to open. The executor thread pool size is the number of threads used to perform file uploads, downloads, etc. Because transfer manager currently uses one thread per upload/download, it usually makes sense for the number of threads and connections to be the same, assuming you're not using the S3 client for anything else.
If you're not using the S3 client instance directly or in another transfer manager and the number of threads is higher than the number of connections, a few threads may end up waiting for a connection to free up. If the number of threads is lower than the number of connections, you'll never utilize every connection available to the client.
The threads may sometimes be used for extra work that doesn't use a connection (e.g. combining the parts of a downloaded file), so if you look really closely you may notice that a thread can be running without using a connection.
TL;DR: If you're not using the client for anything else, you can set the max connections equal to the thread pool size to be sure you get optimum throughput/latency. If you're using the client for something else, you'll need to think more carefully about the different use cases and how contention over the shared pool of connections could affect your throughput/latency.
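As a configuration sketch of the TL;DR above, assuming the AWS SDK for Java 1.x is on the classpath and that the S3 client is used only by this one transfer manager: the class name `MatchedPoolSizes` and the value of `THREAD_POOL_SIZE` are placeholders for illustration, not recommendations. Note that a later comment in this thread raises deadlock concerns about bounded thread pools, so treat the fixed pool here as one option rather than settled advice.

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

import java.util.concurrent.Executors;

public class MatchedPoolSizes {

    // Placeholder value (an assumption for this sketch); tune for your workload.
    private static final int THREAD_POOL_SIZE = 50;

    public static TransferManager build() {
        // Cap the client's HTTP connection pool at the same size as the thread pool.
        ClientConfiguration conf = new ClientConfiguration()
                .withMaxConnections(THREAD_POOL_SIZE);

        // Region and credentials come from the default provider chains here.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withClientConfiguration(conf)
                .build();

        // One executor thread per possible connection, so no transfer thread
        // should have to wait for a connection and no connection sits unused.
        return TransferManagerBuilder.standard()
                .withS3Client(s3)
                .withExecutorFactory(() -> Executors.newFixedThreadPool(THREAD_POOL_SIZE))
                .build();
    }
}
```

If the same `AmazonS3` instance is also used directly or by a second transfer manager, the two numbers no longer map one-to-one and the connection limit would likely need to be raised to cover the extra callers.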
Thanks Matthew for the explanation! As you have mentioned,
I wouldn't worry about it. The risk of blocking while waiting for a connection is worse for throughput than the risk of not always utilizing every available connection.
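To make the trade-off above concrete, here is a small standard-library simulation. It does not use the AWS SDK: a `Semaphore` stands in for the client's connection pool, a fixed thread pool stands in for the transfer manager's executor, and each "transfer" is just a short sleep. The class `ConnectionPoolSketch`, the method `simulate`, and the 20 ms transfer time are all assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionPoolSketch {

    /** Outcome of one simulated run. */
    static final class Result {
        final int peakInUse; // most connections held at the same moment
        final int waits;     // transfers that found no free connection and had to block

        Result(int peakInUse, int waits) {
            this.peakInUse = peakInUse;
            this.waits = waits;
        }
    }

    static Result simulate(int threads, int connections, int transfers)
            throws InterruptedException {
        Semaphore pool = new Semaphore(connections); // stand-in for max connections
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger waits = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < transfers; i++) {
            executor.execute(() -> {
                try {
                    if (!pool.tryAcquire()) {   // no free connection right now
                        waits.incrementAndGet();
                        pool.acquire();         // block until one is released
                    }
                    try {
                        int now = inUse.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20);       // pretend to transfer data
                    } finally {
                        inUse.decrementAndGet();
                        pool.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(30, TimeUnit.SECONDS);
        return new Result(peak.get(), waits.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Result over = simulate(8, 4, 16);    // more threads than connections
        Result matched = simulate(4, 4, 16); // threads == connections
        System.out.println("8 threads / 4 connections: waits=" + over.waits);
        System.out.println("4 threads / 4 connections: waits=" + matched.waits);
    }
}
```

With matched sizes, no simulated transfer ever blocks on the pool. With more threads than connections, the surplus threads would normally be seen waiting, which is the blocking case described above. With fewer threads than connections, the peak number of connections in use can never exceed the thread count, so the remaining connections simply go unused.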
I'm a bit confused about the advice to use a bounded thread pool here because that seems to contradict certain warnings in the docs. Upthread, @millems wrote
However, in the docs for a deprecated
However, this warning is missing from the newer
#939 has additional discussion of the deadlock issue and implies that we "Have to choose between deadlocks, unbounded executors, or internal APIs". There's additional discussion of this issue on HADOOP-13826. Given this, I wanted to ask here to clarify @millems's remarks above and figure out whether there's a more precise set of circumstances where it is guaranteed to be safe to use a bounded thread pool.
Hey there,
I'm trying to understand two similar configs while tuning S3 download performance:
My current code looks like:
ClientConfiguration conf = new ClientConfiguration()
.withMaxConnections(THREAD_POOL_SIZE).withSocketTimeout(3600_000);
My question is: what is the relationship between the transfer manager's executor factory and the S3 client's max connections? If I set them to different values, will each thread open more TCP connections overall?
Thanks for your time and let me know if my question makes sense.