
Read large values stress test - nested reference buffer bug #201 #165 #204


Closed

Conversation

@liutec (Contributor) commented Dec 4, 2017

This test should fail because a reference to the previous `_view` is kept in `ChunkedInputBuffer.frame_message`.
A fix will follow.
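
To illustrate the kind of leak the stress test exercises, here is a minimal standalone sketch (assumed behaviour only, not the driver's actual code): as long as a `memoryview` over the old backing `bytearray` is retained, that array can neither be resized nor garbage-collected.

```python
# Minimal sketch of the problem class (not the driver's code): a retained
# memoryview keeps the old backing bytearray alive and un-resizable.
data = bytearray(10_000_000)            # large receive buffer
view = memoryview(data)[:4]             # e.g. a view kept by frame_message

try:
    data.extend(bytearray(100))         # grow the buffer for the next read
except BufferError as exc:
    print(exc)                          # cannot resize while the view is exported

view.release()                          # dropping the reference releases the export
data.extend(bytearray(100))             # now resizing works again
```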

@zhenlineo mentioned this pull request Dec 5, 2017

@zhenlineo (Contributor)

Hi @liutec,

Sorry for the late reply. I included your improvement to remove the recursion in #206, together with the fix for releasing the reference to the view. Your fix of switching to `_data` also works, but our initial plan was for the view to only expose part of the `_data` array, so we still prefer to use the close method to release the reference instead.
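
Roughly, that idea looks like the following (a minimal sketch under assumptions; `Frame`, `close`, and the attribute names here are illustrative stand-ins, not necessarily the driver's actual API):

```python
class Frame:
    """Sketch: expose a window into a shared bytearray through a memoryview
    and release the reference explicitly instead of swapping the array."""

    def __init__(self, data, start, end):
        self._view = memoryview(data)[start:end]   # window into the shared _data

    def read(self):
        return bytes(self._view)

    def close(self):
        # Releasing the view drops the buffer export, so the underlying
        # bytearray can be resized or garbage-collected again.
        if self._view is not None:
            self._view.release()
            self._view = None

data = bytearray(b"hello world")
frame = Frame(data, 0, 5)
print(frame.read())     # b'hello'
frame.close()
data.extend(b"!")       # no exported buffer remains, so this works
```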

This part of the code also needs to compile to C, so I started a PR to address some changes to the Cython code.

Would you also try out the changes in #206 and see if they work for you? I failed to make the client throw an OOM error before the server did. I also changed your `ChunkOutput#write` code a bit. I am super bad at managing buffers, so it would be super if you could review and test it.

Finally, if it is okay, would you mind signing our CLA so that we have the right to compile and merge your code into our codebase?

Thanks again for helping us on this issue!
Zhen

@liutec (Contributor, Author) commented Dec 6, 2017

Hello @zhenlineo,

Thank you, I tested the changes from #206 and the issue is gone.

Sorry about the mess, I'm not a big contributor on GitHub and I still have a lot to learn about the process (and Python). I appreciate your patience.

I signed the CLA as per the instructions.

As for `ChunkedOutputBuffer#write`: because of `while to_write > 0:` and `wrote = min(to_write, remaining)`, I think the condition `remaining < to_write <= max_chunk_size` can only be true on the first iteration, and it will cut the existing chunk short. (I did not include this because I could not see any case where it would be useful; the old code cuts the existing/last chunk short because it does `if new_data_size > chunk_remaining: self.chunk()` without filling `chunk_remaining`.) Also, calling `self.chunk()` without `remaining = max_chunk_size` will waste one iteration (3 comparisons and 2 ops).
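
For reference, the loop shape being discussed is roughly the following standalone sketch (assumed names, not the driver's exact code). Resetting `remaining = max_chunk_size` right after opening a new chunk is what avoids the wasted iteration mentioned above:

```python
def chunked_write(payload, max_chunk_size, remaining):
    """Split payload into chunks of at most max_chunk_size bytes.

    remaining is the free space left in the currently open chunk.
    Returns the chunk payloads that would be written.
    """
    chunks = [bytearray()]                  # chunks[-1] is the open chunk
    to_write = len(payload)
    offset = 0
    while to_write > 0:
        if remaining == 0:
            chunks.append(bytearray())      # close the chunk, open a new one
            remaining = max_chunk_size      # reset here to skip the extra pass
        wrote = min(to_write, remaining)
        chunks[-1] += payload[offset:offset + wrote]
        offset += wrote
        to_write -= wrote
        remaining -= wrote
    return chunks

# 10 bytes written into a chunk with 3 bytes left, max chunk size 4:
print([bytes(c) for c in chunked_write(b"0123456789", 4, 3)])
# [b'012', b'3456', b'789']
```

In this sketch, after the first pass `remaining` is always reset to `max_chunk_size` whenever a new chunk is opened, which is why `remaining < to_write <= max_chunk_size` can only hold on the first iteration.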

I just noticed the junk code I've committed:

```python
data_size = self._end - self._start
if data_size > new_data_size:
    new_end = self._end + new_data_size
    self._data[self._end:new_end] = bytearray(data_size)
```

I was trying to pre-calculate the size of `_data` before the actual writes to avoid multiple extends (I did not finish, but hope to do so as time permits).

From what I saw here, bytearrays are stored in contiguous blocks of memory, and even though Python tries to overallocate for small changes, this still implies a huge overhead from the allocations and copies required for large data.

Resizing `_data` in one shot before the chunked write starts (recursive or not) should fix the memory issue by using one extend per write call instead of, in the worst case, one per chunk of data. The performance gain should also be significant because the existing data will remain in place.
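
A minimal sketch of that idea (illustrative names, Bolt-style two-byte chunk headers assumed; not a finished implementation): grow the backing bytearray once per write call, then copy the chunks into the pre-allocated space.

```python
def write_presized(data, payload, max_chunk_size):
    """Sketch: extend the backing bytearray once, then fill it chunk by chunk.

    Each chunk is written as a 2-byte big-endian size header followed by
    its payload; data is the shared backing bytearray.
    """
    n_chunks = -(-len(payload) // max_chunk_size)       # ceiling division
    end = len(data)
    # One allocation for all headers and payload, instead of one extend per chunk.
    data.extend(bytearray(len(payload) + 2 * n_chunks))

    for offset in range(0, len(payload), max_chunk_size):
        chunk = payload[offset:offset + max_chunk_size]
        data[end:end + 2] = len(chunk).to_bytes(2, "big")
        end += 2
        data[end:end + len(chunk)] = chunk
        end += len(chunk)
    return data

out = bytearray()
write_presized(out, b"abcdefghij", 4)
print(out.hex())    # 0004616263640004656667680002696a
```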

@zhenlineo (Contributor)

@liutec
I did not find your CLA; could you resend it and also cc me (zhen.li at neo4j.com) on the email?

You are most welcome to improve the fix. I am mostly trying to get this fix in and send a patch out first, so that we can clear up people's problems.

Cheers,
Zhen

@liutec (Contributor, Author) commented Dec 6, 2017

@zhenlineo
Sent it again, this time from my Gmail account. I hope there will be no more problems.

@zhenlineo (Contributor)

@liutec Thanks very much. I got your CLA and merged the other PR with your commits.

You are most welcome to improve the code.
Thanks again for helping us find and fix the issue.

Zhen

@zhenlineo (Contributor)

Hi @liutec,

This PR has some conflicts with the new 1.5 code, as both you and I modified the same part of the code. If you would like to continue working on improving buffer usage, it might be easier to create a new PR on top of the new 1.5 branch.

PR #206 includes your first few commits, but I modified them a bit based on your feedback. Thanks again for contributing to this driver! If you open new PRs, our integrated build server will recognize you as an old friend, so you are most welcome to contribute more if you like. I am closing this PR now, as I am just tidying things up before Christmas.

Cheers,
Zhen

@zhenlineo closed this Dec 14, 2017