Feature requests: iter_chunks([max_size]) #2900

Closed
LinusU opened this issue Nov 26, 2015 · 12 comments
Comments

@LinusU

LinusU commented Nov 26, 2015

I would love to have a function that would iterate over the chunks as they are received on the socket. It would work like socket.recv() works in the Python standard library.

This would provide an easy way to consume the stream as efficiently as possible. It would be awesome if we could update __iter__ to use this function as well, instead of using iter_content with a fixed length of 128.

This has been discussed to some extent in #844 but that issue got closed because of inactivity. I opened this to be a more focused issue. If we feel that this is a good approach I could hopefully help implement it as well.
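The proposed API could be sketched roughly like this. Note that iter_chunks is hypothetical (it does not exist in requests), and this sketch ignores TLS, buffering, and error handling; the demonstration uses a local socket pair instead of a real HTTP connection:

```python
import socket

def iter_chunks(sock, max_size=65536):
    # Hypothetical sketch of the proposed API: yield data as soon as any
    # arrives on the socket, up to max_size bytes, like socket.recv().
    while True:
        chunk = sock.recv(max_size)
        if not chunk:  # empty bytes means the peer closed the connection
            return
        yield chunk

# Demonstration with a local socket pair rather than a real HTTP response.
a, b = socket.socketpair()
a.sendall(b"hello world")
a.close()
received = b"".join(iter_chunks(b))
b.close()
```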

@Lukasa
Member

Lukasa commented Nov 26, 2015

This already works. Use iter_content(None), as discussed in the documentation.
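For reference, a minimal sketch of that suggestion, made self-contained with a throwaway local HTTP server (assumes requests is installed; in practice any streaming URL would do):

```python
import http.server
import threading

import requests  # third-party; assumed installed

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"x" * 300
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# With chunk_size=None, iter_content yields data as it is read instead of
# re-chunking it into fixed-size pieces.
r = requests.get("http://127.0.0.1:%d/" % server.server_port, stream=True)
received = b"".join(r.iter_content(None))
server.shutdown()
```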

@Lukasa Lukasa closed this as completed Nov 26, 2015
@LinusU
Author

LinusU commented Nov 26, 2015

That's awesome \o/

Is there any reason why __iter__ on Response doesn't use it?

Would it be possible to add it to the documentation somewhere other than under the discussion about chunked _up_loading? Maybe under Raw response content. I think that's why I didn't find it.

Thank you for the quick response!

@Lukasa
Member

Lukasa commented Nov 26, 2015

I'd happily accept a pull request that adds a similar stanza to that portion of the docs. =)

As to why we didn't change __iter__, I should point out another subtlety of the way iter_content works. iter_content returns up to the amount passed to the generator, but will return less if it receives a smaller chunk. 128 was chosen a long time ago as a reasonable maximum chunk size in that context.

One way or another, you should usually set a maximum size there. Arguably we should move it away from 128, but for now I don't think it's unreasonable to leave it as is. We may change it in 3.0.0, though.
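The "returns up to the amount" behaviour described here can be illustrated with a simplified version of the underlying read loop (a sketch only; the real urllib3 stream() also handles content decoding and connection release):

```python
import io

def stream(fp, amt):
    # Simplified sketch of the urllib3-style read loop behind iter_content:
    # each read returns *at most* amt bytes, and less on the final chunk.
    while True:
        data = fp.read(amt)
        if not data:
            break
        yield data

body = io.BytesIO(b"a" * 300)
sizes = [len(chunk) for chunk in stream(body, 128)]  # two full reads, one short
```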

@LinusU
Author

LinusU commented Nov 26, 2015

Are you sure that iter_content returns up to? I've never seen it return anything other than exactly the chunk_size (except for the last chunk). I'm even specifically trying to observe just that behaviour, but it always returned the entire buffer.

When I take a stack trace during r.iter_content(None), it is in _safe_read in Python's built-in httplib.py, which calls read until the requested number of bytes has been received. That means that it doesn't do what I had hoped...

Speaking of, this has actually been sitting here for quite some time now. Shouldn't it have given me some chunks? It works with smaller chunk_size, but I wanted to use None to get the chunk as soon as any data is available:

[Screenshot, 2015-11-26: a terminal session where r.iter_content(None) hangs without yielding any chunks]

@LinusU
Author

LinusU commented Nov 26, 2015

Here is the stack trace that I observed:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 657, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 326, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 278, in read
    data = self._fp.read()
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 596, in read
    s = self._safe_read(self.length)
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 703, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 384, in read
    data = self._sock.recv(left)
KeyboardInterrupt

@Lukasa
Member

Lukasa commented Nov 26, 2015

What version of requests are you using?

@LinusU
Author

LinusU commented Nov 26, 2015

2.8.1 with python 2.7.10

@Lukasa
Member

Lukasa commented Nov 26, 2015

Then the problem is that the website you contacted is not actually doing chunked encoding. In this context, it will attempt to read up to the maximum amount.

@LinusU
Author

LinusU commented Nov 26, 2015

Hmm, okay, maybe I was a bit unclear, but I didn't mean this in relation to chunked encoding. Chunked encoding is at the application level, but I wanted it at the packet level.

With that I mean that as soon as the first packet of data has arrived to my computer, I want to process that chunk of data. If several packets arrive while I'm doing something else, I don't mind getting a larger chunk.

This is the default behaviour of the recv function on the socket in Python. It would ensure processing of data in the most efficient manner possible.
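The socket.recv() behaviour being described can be shown with a local socket pair: recv() returns as soon as any data is available, up to the requested maximum, rather than waiting for the full amount (a minimal stdlib illustration, not requests code):

```python
import socket

a, b = socket.socketpair()
a.sendall(b"first packet")

# recv(65536) does not wait for 65536 bytes to accumulate; it returns
# whatever has already arrived, which is the behaviour requested here.
chunk = b.recv(65536)

a.close()
b.close()
```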

@Lukasa
Member

Lukasa commented Nov 26, 2015

@LinusU Unfortunately, httplib (upon which requests builds) does not expose this functionality. It converts the socket into a buffered file-like object which has a blocking read method, rather than a socket-like recv method. You could in principle reach down into the socket below httplib, but in practice I think that will only rarely work because httplib itself uses the blocking read logic to get the headers, which means there may be information inside the httplib buffer you'd need to grab.
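The buffering described here can be demonstrated with socket.makefile(), which is essentially what httplib does: the resulting file object's read(n) blocks until n bytes arrive or the stream ends, instead of returning whatever is available (a stdlib sketch; the peer is closed so the blocking read can return early at EOF):

```python
import socket

a, b = socket.socketpair()
f = b.makefile("rb")  # buffered file-like wrapper, as httplib uses

a.sendall(b"abc")
a.close()  # EOF: without this, the read below would block indefinitely

# read(10) wants 10 bytes; only 3 are buffered, so it blocks until EOF.
# By contrast, socket.recv(10) would have returned b"abc" immediately.
data = f.read(10)

f.close()
b.close()
```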

@LinusU
Author

LinusU commented Nov 26, 2015

Hmm, that is too bad :(

Thank you for all your help though, stellar support 👍

@Lukasa
Member

Lukasa commented Nov 26, 2015

My pleasure, I'm sorry we can't be more helpful here!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021