Feature requests: iter_chunks([max_size]) #2900

Closed
LinusU opened this issue Nov 26, 2015 · 12 comments
Comments

@LinusU

LinusU commented Nov 26, 2015

I would love to have a function that would iterate over the chunks as they are received on the socket. It would work like socket.recv() works in the Python standard library.

This would provide an easy way to consume the stream as efficiently as possible. It would be awesome if we could update __iter__ to use this function as well, instead of using iter_content with a fixed length of 128.

This has been discussed to some extent in #844 but that issue got closed because of inactivity. I opened this to be a more focused issue. If we feel that this is a good approach I could hopefully help implement it as well.
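The proposed API could be sketched roughly like this. Note that iter_chunks is hypothetical (it does not exist in requests), and this sketch ignores TLS, buffering, and error handling; the demonstration uses a local socket pair instead of a real HTTP connection:

```python
import socket

def iter_chunks(sock, max_size=65536):
    # Hypothetical sketch of the proposed API: yield data as soon as any
    # arrives on the socket, up to max_size bytes, like socket.recv().
    while True:
        chunk = sock.recv(max_size)
        if not chunk:  # empty bytes means the peer closed the connection
            return
        yield chunk

# Demonstration with a local socket pair rather than a real HTTP response.
a, b = socket.socketpair()
a.sendall(b"hello world")
a.close()
received = b"".join(iter_chunks(b))
b.close()
```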

@Lukasa
Member

Lukasa commented Nov 26, 2015

This already works. Use iter_content(None), as discussed in the documentation.
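For reference, a minimal sketch of that suggestion, made self-contained with a throwaway local HTTP server (assumes requests is installed; in practice any streaming URL would do):

```python
import http.server
import threading

import requests  # third-party; assumed installed

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"x" * 300
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# With chunk_size=None, iter_content yields data as it is read instead of
# re-chunking it into fixed-size pieces.
r = requests.get("http://127.0.0.1:%d/" % server.server_port, stream=True)
received = b"".join(r.iter_content(None))
server.shutdown()
```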

@Lukasa Lukasa closed this as completed Nov 26, 2015
@LinusU
Author

LinusU commented Nov 26, 2015

That's awesome \o/

Is there any reason why __iter__ on Response doesn't use it?

Would it be possible to add it to the documentation somewhere other than under the discussion about chunked _up_loading? Maybe under Raw response content. I think that's why I didn't find it.

Thank you for the quick response!

@Lukasa
Member

Lukasa commented Nov 26, 2015

I'd happily accept a pull request that adds a similar stanza to that portion of the docs. =)

As to why we didn't change __iter__, I should point out another subtlety of the way iter_content works. iter_content returns up to the amount passed to the generator, but will return less if it receives a smaller chunk. 128 was chosen a long time ago as a reasonable maximum chunk size in that context.

One way or another, you should usually set a maximum size there. Arguably we should move it away from 128, but for now I don't think it's unreasonable to leave it as is. We may change it in 3.0.0, though.
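The "returns up to the amount" behaviour described here can be illustrated with a simplified version of the underlying read loop (a sketch only; the real urllib3 stream() also handles content decoding and connection release):

```python
import io

def stream(fp, amt):
    # Simplified sketch of the urllib3-style read loop behind iter_content:
    # each read returns *at most* amt bytes, and less on the final chunk.
    while True:
        data = fp.read(amt)
        if not data:
            break
        yield data

body = io.BytesIO(b"a" * 300)
sizes = [len(chunk) for chunk in stream(body, 128)]  # two full reads, one short
```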

@LinusU
Author

LinusU commented Nov 26, 2015

Are you sure that iter_content returns up to? I've never seen it return anything other than exactly the chunk_size (except for the last chunk). I'm even specifically trying to observe just that behaviour, but it always returned the entire buffer.

When I take a stack trace during r.iter_content(None), it is in _safe_read in Python's built-in httplib.py, which calls read until the requested number of bytes has been received. That means that it doesn't do what I had hoped...

Speaking of, this has actually been sitting here for quite some time now. Shouldn't it have given me some chunks? It works with smaller chunk_size, but I wanted to use None to get the chunk as soon as any data is available:

[Screenshot, 2015-11-26: a terminal session where r.iter_content(None) hangs without yielding any chunks]

@LinusU
Author

LinusU commented Nov 26, 2015

Here is the stack trace that I observed:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 657, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 326, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 278, in read
    data = self._fp.read()
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 596, in read
    s = self._safe_read(self.length)
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 703, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 384, in read
    data = self._sock.recv(left)
KeyboardInterrupt

@Lukasa
Member

Lukasa commented Nov 26, 2015

What version of requests are you using?

@LinusU
Author

LinusU commented Nov 26, 2015

2.8.1 with python 2.7.10

@Lukasa
Member

Lukasa commented Nov 26, 2015

Then the problem is that the website you contacted is not actually doing chunked encoding. In this context, it will attempt to read up to the maximum amount.

@LinusU
Author

LinusU commented Nov 26, 2015

Hmm, okay, maybe I was a bit unclear, but I didn't mean this in relation to chunked encoding. Chunked encoding is at the application level, but I wanted it at the packet level.

With that I mean that as soon as the first packet of data has arrived to my computer, I want to process that chunk of data. If several packets arrive while I'm doing something else, I don't mind getting a larger chunk.

This is the default behaviour of the recv function on the socket in Python. It would ensure processing of data in the most efficient manner possible.
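The socket.recv() behaviour being described can be shown with a local socket pair: recv() returns as soon as any data is available, up to the requested maximum, rather than waiting for the full amount (a minimal stdlib illustration, not requests code):

```python
import socket

a, b = socket.socketpair()
a.sendall(b"first packet")

# recv(65536) does not wait for 65536 bytes to accumulate; it returns
# whatever has already arrived, which is the behaviour requested here.
chunk = b.recv(65536)

a.close()
b.close()
```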

@Lukasa
Member

Lukasa commented Nov 26, 2015

@LinusU Unfortunately, httplib (upon which requests builds) does not expose this functionality. It converts the socket into a buffered file-like object which has a blocking read method, rather than a socket-like recv method. You could in principle reach down into the socket below httplib, but in practice I think that will only rarely work because httplib itself uses the blocking read logic to get the headers, which means there may be information inside the httplib buffer you'd need to grab.
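The buffering described here can be demonstrated with socket.makefile(), which is essentially what httplib does: the resulting file object's read(n) blocks until n bytes arrive or the stream ends, instead of returning whatever is available (a stdlib sketch; the peer is closed so the blocking read can return early at EOF):

```python
import socket

a, b = socket.socketpair()
f = b.makefile("rb")  # buffered file-like wrapper, as httplib uses

a.sendall(b"abc")
a.close()  # EOF: without this, the read below would block indefinitely

# read(10) wants 10 bytes; only 3 are buffered, so it blocks until EOF.
# By contrast, socket.recv(10) would have returned b"abc" immediately.
data = f.read(10)

f.close()
b.close()
```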

@LinusU
Author

LinusU commented Nov 26, 2015

Hmm, that is too bad :(

Thank you for all your help though, stellar support 👍

@Lukasa
Member

Lukasa commented Nov 26, 2015

My pleasure, I'm sorry we can't be more helpful here!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021