S3 with writing #174
Conversation
More of the tests are now easily converted.
Also cc @hussainsultan, who has historically been interested in S3 work.
buff = io.BytesIO()
buffer_size = 1024 * 16
for chunk in iter(lambda: resp['Body'].read(buffer_size),
                  b''):
I had no idea that this was valid Python. I wonder when I'll stop learning things about this language.
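For anyone else meeting it for the first time: this is the two-argument form of the built-in iter. iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns the sentinel. A self-contained sketch of the pattern used above:

import io

src = io.BytesIO(b"x" * 40)   # stand-in for resp['Body']
buff = io.BytesIO()
buffer_size = 16

# Each iteration calls src.read(16); the loop ends when read()
# returns the sentinel b'' at end of stream.
for chunk in iter(lambda: src.read(buffer_size), b''):
    buff.write(chunk)

assert buff.getvalue() == b"x" * 40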
Getting this error:
/home/ubuntu/anaconda3/lib/python3.5/site-packages/distributed/s3fs.py in <lambda>()
529 buff = io.BytesIO()
530 buffer_size = 1024 * 16
--> 531 for chunk in iter(lambda: resp['Body'].read(buffer_size),
532 b''):
533 buff.write(chunk)
NameError: free variable 'resp' referenced before assignment in enclosing scope
This would happen if we raised a ClientError but didn't fall into the listed condition.
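A minimal sketch of how that produces this exact NameError, assuming (as the traceback suggests) that resp is assigned inside a try block whose except clause can fall through without re-raising; the error strings and the 'NoSuchKey' condition are illustrative, not the actual code:

import io

def fetch(fail=True):
    try:
        if fail:
            raise RuntimeError("simulated ClientError")
        resp = {'Body': io.BytesIO(b'data')}
    except RuntimeError as e:
        # 'NoSuchKey' stands in for the listed condition; any other
        # error falls through silently, so resp is never assigned.
        if 'NoSuchKey' in str(e):
            raise

    buff = io.BytesIO()
    # The lambda closes over resp; calling it now raises
    # NameError: free variable 'resp' referenced before assignment
    # in enclosing scope (newer Pythons word this slightly differently).
    for chunk in iter(lambda: resp['Body'].read(16), b''):
        buff.write(chunk)
    return buff.getvalue()

fetch()  # raises the NameError above; fetch(fail=False) returns b'data'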
Aha, then this might be exactly the interesting case we wanted to find.
On the 8MB chunks: I know that multi-part upload only accepts parts > 5MB, but I haven't seen a reference to a preferable download chunk size (this isn't typical). The stream downloads in 16kB pieces.
The 8MB figure comes from the threshold that boto3 uses to determine when to start doing multipart downloads. I may have misinterpreted that as being a chunk size.
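For reference, both knobs live on boto3's TransferConfig: multipart_threshold (default 8MB) decides when a transfer switches to multipart, while multipart_chunksize sets the size of each part, and S3 itself requires every part except the last to be at least 5MB. A hedged sketch (the bucket and file names are placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # switch to multipart above 8MB
    multipart_chunksize=8 * 1024 * 1024,  # size of each uploaded part
)

s3 = boto3.client('s3')
# 'my-bucket', 'local.dat' and 'remote.dat' are made-up names.
s3.upload_file('local.dat', 'my-bucket', 'remote.dat', Config=config)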
Happy to change to the default chunksize, so long as it's large compared to typical records and doesn't add too much overhead per chunk. Note that …
Whoa, those errors are very odd. cc @jreback
@martindurant are these relevant?

In [12]: boto3.s3.transfer.S3_RETRYABLE_ERRORS
Out[12]:
(socket.timeout,
 ConnectionError,
 botocore.vendored.requests.packages.urllib3.exceptions.ReadTimeoutError,
 botocore.exceptions.IncompleteReadError)
I currently retry on all exceptions; it could just be restricted to this list.
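A minimal sketch of retrying only on that tuple rather than on bare Exception (the attempt count and backoff are made up):

import time
from boto3.s3.transfer import S3_RETRYABLE_ERRORS

def get_object_bytes(client, bucket, key, attempts=5):
    # Retry only the errors boto3 itself treats as transient; anything
    # else (e.g. access denied, missing key) propagates immediately.
    for attempt in range(attempts):
        try:
            resp = client.get_object(Bucket=bucket, Key=key)
            return resp['Body'].read()
        except S3_RETRYABLE_ERRORS:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff

# Usage (client and names are placeholders):
#   import boto3
#   data = get_object_bytes(boto3.client('s3'), 'my-bucket', 'my-key')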
Probably related to this: pandas-dev/pandas#11915; we are using …
Note that …
Closing. This has been partially merged and is otherwise continuing at https://github.com/dask/s3fs
with retries on downloads