Skip to content

chunk_size for iter_lines() versus iter_content() #597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

chunk_size for iter_lines() versus iter_content() #597

wants to merge 1 commit into from

Conversation

mponton
Copy link

@mponton mponton commented May 8, 2012

Seriously, I think you should merge this one. The latest release and the develop branch still have the default chunk_size "reversed" for iter_content() and iter_lines() contrarilly to what you mentioned here: https://github.com/kennethreitz/requests/pull/589#issuecomment-5567553

This pull request may be canned by Travis CI again, but as I mentioned in the above pull request, the issue is located either in the test logic or the way lines ending in \r\n in iter_lines() depending how you interpret the issue... (I think the later is the real problem as even without changing chunk_size, iter_lines() could return different results for the same input depending on where the line-ending ends inside the buffer read by iter_content()). Not knowing how you perceive the issue, I did not attempt to fix it.

----- Original idea:

Following discussion about issue #539 Kenneth agreed that a default
chunk_size of 1 for iter_lines() would make more sense, but
apparently changed chunk_size for iter_content() by mistake
instead, which did solve the iter_lines() issue but also changed the
default iter_content() chunk to 1 bytes which is not desirable.

Following discussion about issue #539 Kenneth agreed that a default
`chunk_size` of 1 for `iter_lines()` would make more sense, but
apparently changed `chunk_size` for `iter_content()` by mistake
instead, which did solve the `iter_lines()` issue but also changed the
default `iter_content()` chunk to 1 bytes which is not desirable.
@travisbot
Copy link

This pull request fails (merged 87f0a92 into 0308a28).

@mponton
Copy link
Author

mponton commented May 8, 2012

As expected. See comments above and for my previous pull request here: https://github.com/kennethreitz/requests/pull/589#issuecomment-5538019

@berg
Copy link
Contributor

berg commented Jul 2, 2012

I agree with @mponton — the current default chunk_size value of 1 for iter_content() seems broken. We were bitten by this performance regression when upgrading to 0.13.2.

@rboulton
Copy link
Contributor

Did this get merged, or ignored? Would have been useful to have had a comment made when the pull request was closed.

Looks like it was ignored, which ... means I had a massive performance regression just now.

@kennethreitz
Copy link
Contributor

Duly noted but 'ignored' for now. Definitely needs to change, but we need to find a good answer to the issue noted above first.

@slingamn
Copy link
Contributor

I don't entirely know what I'm talking about, but here are some guesses about how iter_content should maybe work:

  1. The core use case for this feature seems to be streaming APIs like Twitter's "firehose", which use HTTP 1.1 and chunked encoding
  2. As per Twitter's guidelines [https://dev.twitter.com/docs/streaming-apis/processing], the right way to consume an API like this is to use chunked encoding to tell how much data to read at a time. They suggest deferring the details of this to the underlying HTTP library.
  3. Unfortunately, it looks like we can't reuse the chunked encoding implementation from httplib, because it's hard-wired to consume the entire document at once and return it. We might need to make our own --- possibly by copying and pasting code from httplib?
  4. For HTTP 1.0 and non-chunked pages, we should probably just try to read the entire document and yield it as a single string?

As mentioned at #775, I am also interested in changing the iter_content behavior so that if the body has already been consumed, it yields a one-item iterator containing the whole body, e.g., iter([self.body]).

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants