
Read large values stress test - nested reference buffer bug #201 #165 #204


Closed

Conversation

@liutec (Contributor) commented Dec 4, 2017

This test should fail because a reference to the previous `_view` is kept in `ChunkedInputBuffer.frame_message`.
A fix will follow.
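
To illustrate the kind of leak the stress test exercises, here is a minimal standalone sketch (assumed behaviour only, not the driver's actual code): as long as a `memoryview` over the old backing `bytearray` is retained, that array can neither be resized nor garbage-collected.

```python
# Minimal sketch of the problem class (not the driver's code): a retained
# memoryview keeps the old backing bytearray alive and un-resizable.
data = bytearray(10_000_000)            # large receive buffer
view = memoryview(data)[:4]             # e.g. a view kept by frame_message

try:
    data.extend(bytearray(100))         # grow the buffer for the next read
except BufferError as exc:
    print(exc)                          # cannot resize while the view is exported

view.release()                          # dropping the reference releases the export
data.extend(bytearray(100))             # now resizing works again
```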

@zhenlineo mentioned this pull request Dec 5, 2017

@zhenlineo (Contributor)

Hi @liutec,

Sorry for the late reply. I included your improvement to remove the recursion in #206, together with the fix for releasing the reference to the view. Your fix of switching to `_data` also works, but our initial plan was for the view to only expose part of the `_data` array, so we still prefer to use the close method to release the reference instead.
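
Roughly, that idea looks like the following (a minimal sketch under assumptions; `Frame`, `close`, and the attribute names here are illustrative stand-ins, not necessarily the driver's actual API):

```python
class Frame:
    """Sketch: expose a window into a shared bytearray through a memoryview
    and release the reference explicitly instead of swapping the array."""

    def __init__(self, data, start, end):
        self._view = memoryview(data)[start:end]   # window into the shared _data

    def read(self):
        return bytes(self._view)

    def close(self):
        # Releasing the view drops the buffer export, so the underlying
        # bytearray can be resized or garbage-collected again.
        if self._view is not None:
            self._view.release()
            self._view = None

data = bytearray(b"hello world")
frame = Frame(data, 0, 5)
print(frame.read())     # b'hello'
frame.close()
data.extend(b"!")       # no exported buffer remains, so this works
```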

This part of the code also needs to compile to C, so I started a PR to address some changes to the Cython code.

Would you also try out the changes in #206 and see if they work for you? I failed to make the client throw an OOM error before the server did. I also changed your `ChunkOutput#write` code a bit. I am super bad at managing buffers, so it would be super if you could review and test it.

Finally, if it is okay, would you mind signing our CLA so that we have the right to compile and merge your code into our codebase?

Thanks again for helping us on this issue!
Zhen

@liutec (Contributor, Author) commented Dec 6, 2017

Hello @zhenlineo,

Thank you, I tested the changes from #206 and the issue is gone.

Sorry about the mess, I'm not a big contributor on GitHub and I still have a lot to learn about the process (and Python). I appreciate your patience.

I signed the CLA as per the instructions.

As for `ChunkedOutputBuffer#write`: because of `while to_write > 0:` and `wrote = min(to_write, remaining)`, I think the condition `remaining < to_write <= max_chunk_size` can only be true on the first iteration, and it will cut the existing chunk short. (I did not include this because I could not see any case where it would be useful; the old code cuts the existing/last chunk short because it does `if new_data_size > chunk_remaining: self.chunk()` without filling `chunk_remaining`.) Also, calling `self.chunk()` without `remaining = max_chunk_size` will waste one iteration (3 comparisons and 2 ops).
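
For reference, the loop shape being discussed is roughly the following standalone sketch (assumed names, not the driver's exact code). Resetting `remaining = max_chunk_size` right after opening a new chunk is what avoids the wasted iteration mentioned above:

```python
def chunked_write(payload, max_chunk_size, remaining):
    """Split payload into chunks of at most max_chunk_size bytes.

    remaining is the free space left in the currently open chunk.
    Returns the chunk payloads that would be written.
    """
    chunks = [bytearray()]                  # chunks[-1] is the open chunk
    to_write = len(payload)
    offset = 0
    while to_write > 0:
        if remaining == 0:
            chunks.append(bytearray())      # close the chunk, open a new one
            remaining = max_chunk_size      # reset here to skip the extra pass
        wrote = min(to_write, remaining)
        chunks[-1] += payload[offset:offset + wrote]
        offset += wrote
        to_write -= wrote
        remaining -= wrote
    return chunks

# 10 bytes written into a chunk with 3 bytes left, max chunk size 4:
print([bytes(c) for c in chunked_write(b"0123456789", 4, 3)])
# [b'012', b'3456', b'789']
```

In this sketch, after the first pass `remaining` is always reset to `max_chunk_size` whenever a new chunk is opened, which is why `remaining < to_write <= max_chunk_size` can only hold on the first iteration.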

I just noticed the junk code I've committed:

```python
data_size = self._end - self._start
if data_size > new_data_size:
    new_end = self._end + new_data_size
    self._data[self._end:new_end] = bytearray(data_size)
```

I was trying to pre-calculate the size of `_data` before the actual writes to avoid multiple extends (I did not finish, but hope to do so as time permits).

From what I saw here, bytearrays are stored in contiguous blocks of memory, and even though Python tries to overallocate for small changes, this still implies a huge overhead from the allocations and copies required for large data.

Resizing `_data` in one shot before the chunked write starts (recursive or not) should fix the memory issue by using one extend per write call instead of, in the worst case, one per chunk of data. The performance gain should also be significant because the existing data will remain in place.
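
A minimal sketch of that idea (illustrative names, Bolt-style two-byte chunk headers assumed; not a finished implementation): grow the backing bytearray once per write call, then copy the chunks into the pre-allocated space.

```python
def write_presized(data, payload, max_chunk_size):
    """Sketch: extend the backing bytearray once, then fill it chunk by chunk.

    Each chunk is written as a 2-byte big-endian size header followed by
    its payload; data is the shared backing bytearray.
    """
    n_chunks = -(-len(payload) // max_chunk_size)       # ceiling division
    end = len(data)
    # One allocation for all headers and payload, instead of one extend per chunk.
    data.extend(bytearray(len(payload) + 2 * n_chunks))

    for offset in range(0, len(payload), max_chunk_size):
        chunk = payload[offset:offset + max_chunk_size]
        data[end:end + 2] = len(chunk).to_bytes(2, "big")
        end += 2
        data[end:end + len(chunk)] = chunk
        end += len(chunk)
    return data

out = bytearray()
write_presized(out, b"abcdefghij", 4)
print(out.hex())    # 0004616263640004656667680002696a
```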

@zhenlineo (Contributor)

@liutec
I did not find your CLA; could you resend it and also cc me (zhen.li at neo4j.com) on the email?

You are most welcome to improve the fix. I am mostly trying to get this fix in and send a patch out first, so that we can clear up people's problems.

Cheers,
Zhen

@liutec (Contributor, Author) commented Dec 6, 2017

@zhenlineo
Sent it again, this time from my Gmail account. I hope there will be no more problems.

@zhenlineo (Contributor)

@liutec Thanks very much. I got your CLA and merged the other PR with your commits.

You are most welcome to improve the code.
Thanks again for helping us find and fix the issue.

Zhen

@zhenlineo (Contributor)

Hi @liutec,

This PR has some conflicts with the new 1.5 code, as both you and I modified the same part of the code. If you would like to continue working on improving buffer usage, it might be easier to create a new PR on top of the new 1.5 branch.

PR #206 includes your first few commits, but I modified them a bit based on your feedback. Thanks again for contributing to this driver! If you open new PRs, our integrated build server will recognize you as an old friend, so you are most welcome to contribute more if you like. I am closing this PR now, as I am just tidying things up before Christmas.

Cheers,
Zhen

@zhenlineo closed this Dec 14, 2017