Input buffer capacity exceeded when reading large values #201

Closed

liutec opened this issue Nov 29, 2017 · 9 comments

@liutec
Contributor

liutec commented Nov 29, 2017

When reading results with values larger than the ChunkedInputBuffer's default capacity, a non-descriptive error is triggered.

It would be nice to have the buffer capacities configurable, along with a specific error message.

I'm currently using this as a workaround:

import neo4j.bolt.connection
from neo4j.bolt.connection import ChunkedInputBuffer


class LargeChunkedInputBuffer(ChunkedInputBuffer):
    def __init__(self, capacity=4194304):  # 4 MiB instead of the default
        super().__init__(capacity)


# monkey-patch the class used by the connection module
neo4j.bolt.connection.ChunkedInputBuffer = LargeChunkedInputBuffer

I realize it's not the best idea to store large values in Neo4j, but as long as it's not prohibited, they should be readable.

liutec@165946f
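
For the error-message half of the request, purely as a sketch of what a specific error could look like (hypothetical names, not driver code):

class BufferCapacityExceeded(Exception):
    pass

def check_capacity(required, capacity):
    # raise a descriptive error instead of the opaque BufferError
    if required > capacity:
        raise BufferCapacityExceeded(
            "record needs %d bytes but the input buffer holds %d; "
            "increase the buffer capacity" % (required, capacity))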

@zhenlineo
Contributor

Hi @liutec,

Thanks for bringing this issue to us. It should not be necessary to enlarge the capacity manually, as the buffer should grow automatically for big records. Could you share the error you got?

Cheers,
Zhen

@liutec
Contributor Author

liutec commented Dec 2, 2017

Hello,

Sorry for the late reply, and for not including the steps to reproduce this issue.
I was under the impression that the buffer had a fixed maximum size to avoid re-allocations. If the buffer is meant to grow beyond its capacity, then it's clearly a problem only on Python 3: https://bugs.python.org/issue29178
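
For reference, the underlying CPython behaviour can be reproduced without the driver: a bytearray cannot be resized while a memoryview is exported over it. A minimal standalone illustration:

data = bytearray(10)
view = memoryview(data)
try:
    data.extend(b"x" * 10)  # resize attempt while the view is still exported
except BufferError as e:
    print(e)  # Existing exports of data: object cannot be re-sized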

neo4j server v3.0.5
Python 3.5.2
neo4j-driver >= 1.4.0 (did not test on older versions)

import sys
import neo4j
from neo4j.v1 import GraphDatabase


print(sys.version_info)
print("neo4j-driver", neo4j.__version__)

neo4j_driver = GraphDatabase.driver(
    uri="bolt://xxx.xxx.xxx.xxx:7687",
    auth=("neo4j", "neo4j")
)

cypher = "MERGE (n:TestNode) SET n.data = $data"
params = {
    "data": "x" * 524111  # does not reproduce below 524111
}

with neo4j_driver.session() as tx:
    tx.run(cypher, params)
del params

cypher = "MATCH (n:TestNode) RETURN n.data AS data LIMIT 1"
with neo4j_driver.session() as tx:
    records = tx.run(cypher)
    for record in records:
        print(len(record['data']))

Output with neo4j-driver 1.4.0:

sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
neo4j-driver 1.4.0
Exception ignored in: 'neo4j.bolt._io.ChunkedInputBuffer.receive'
BufferError: Existing exports of data: object cannot be re-sized
Traceback (most recent call last):
  File "neo4j-bug.py", line 26, in <module>
    for record in records:
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 718, in records
    self._session.fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 356, in fetch
    detail_count, _ = self._connection.fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 264, in fetch
    self._receive()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 302, in _receive
    raise self.Error("Failed to read from defunct connection {!r}".format(self.server.address))
neo4j.exceptions.ServiceUnavailable: Failed to read from defunct connection Address(host='xxx.xxx.xxx.xxx', port=7687)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "neo4j-bug.py", line 27, in <module>
    print(len(record['data']))
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 263, in __exit__
    self.close()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 296, in close
    self._disconnect(sync=True)
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 278, in _disconnect
    self._connection.sync()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 337, in sync
    detail_delta, summary_delta = self.fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 258, in fetch
    raise self.Error("Failed to read from closed connection {!r}".format(self.server.address))
neo4j.exceptions.ServiceUnavailable: Failed to read from closed connection Address(host='xxx.xxx.xxx.xxx', port=7687)

Output with neo4j-driver 1.5.2:

sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
neo4j-driver 1.5.2
Exception ignored in: 'neo4j.bolt._io.ChunkedInputBuffer.receive'
BufferError: Existing exports of data: object cannot be re-sized
Traceback (most recent call last):
  File "neo4j-bug.py", line 26, in <module>
    for record in records:
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 718, in records
    self._session.fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 356, in fetch
    detail_count, _ = self._connection.fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 283, in fetch
    return self._fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 300, in _fetch
    self._receive()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 338, in _receive
    raise self.Error("Failed to read from defunct connection {!r}".format(self.server.address))
neo4j.exceptions.ServiceUnavailable: Failed to read from defunct connection Address(host='xxx.xxx.xxx.xxx', port=7687)

@liutec
Contributor Author

liutec commented Dec 2, 2017

This should fix the issue, but for performance reasons I would still prefer to be able to set the buffer capacity:

    def receive(self, socket, n):
        """ Receive up to n bytes from the socket, growing the underlying
        buffer if the incoming data would not fit.

        Note: may modify the buffer size; should raise an error if a
        frame still holds a reference to the data.
        """
        new_extent = self._extent + n
        overflow = new_extent - len(self._data)
        if overflow > 0:
            # try to reclaim already-consumed space before reallocating
            if self._recycle():
                return self.receive(socket, n)
            # release the exported view, grow the buffer, re-export the view
            self._view = None
            new_data = bytearray(new_extent)
            new_data[:self._extent] = self._data
            self._data = new_data
            self._view = memoryview(self._data)
        data_size = socket.recv_into(self._view[self._extent:new_extent])
        new_extent = self._extent + data_size
        self._extent = new_extent
        return data_size
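
As a side note on the performance concern: growing to exactly new_extent on every overflow can degrade to quadratic copying across many receives. A common alternative is geometric growth; a generic sketch under that assumption (illustrative names, not driver code):

def grown(data, needed):
    # double the size until the requirement fits, amortizing reallocations
    new_size = max(len(data), 1)
    while new_size < needed:
        new_size *= 2
    new_data = bytearray(new_size)
    new_data[:len(data)] = data
    return new_data

buf = grown(bytearray(b"abc"), 10)  # len(buf) == 12, contents preserved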

@zhenlineo
Contributor

Hi @liutec,

Thanks for the detailed info.

I have an idea for a possible fix. Would you be able to try it out in your code?

The changes are as follows:

    def close(self):
        self._view = None

    def discard_message(self):
        if self._frame is not None:
            self._origin = self._limit
            self._limit = -1
            self._frame.close()  # close the frame to release its buffer reference
            self._frame = None
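
For context, dropping the frame's reference releases the exported memoryview, which is what allows the bytearray to be resized again. A minimal standalone illustration (plain CPython, not driver code):

data = bytearray(10)
view = memoryview(data)
view.release()          # drop the export before resizing
data.extend(b"x" * 10)  # now succeeds without a BufferError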

Please let us know if the suggested change would fix your issue!

Thanks again,
Zhen

@liutec
Contributor Author

liutec commented Dec 3, 2017

Hello,

Both changes fix the issue up to 6 MB (without hitting the TransientError related to dbms.memory.heap.max_size).
I never needed more than 4 MB to begin with, but while reviewing your proposed fix and testing both changes, I took a closer look at io.py to understand the ~6 MB limit, because both the 2.7 and 3.5 32-bit interpreters seem to exhaust all addressable memory.

After adding your changes, the problems became apparent:

  1. recursive ChunkedOutputBuffer.write causes MemoryError
    fixed with:
    def write(self, b):
        # iterative rewrite: copy the payload chunk by chunk in a loop
        # instead of recursing once per chunk
        new_data_start = 0
        new_data_size = len(b)
        while new_data_start < new_data_size:
            chunk_occupied = self._end - self._start
            chunk_remaining = self._max_chunk_size - chunk_occupied
            if chunk_remaining == 0:
                # current chunk is full; start a new one
                self.chunk()
                chunk_remaining = self._max_chunk_size
            chunk_write_size = min(chunk_remaining, new_data_size - new_data_start)
            new_end = self._end + chunk_write_size
            new_chunk_size = new_end - self._start
            self._data[self._end:new_end] = b[new_data_start:(new_data_start + chunk_write_size)]
            new_data_start += chunk_write_size
            self._end = new_end
            # update the 2-byte big-endian chunk size header
            self._data[self._header:(self._header + 2)] = struct_pack(">H", new_chunk_size)

This seems to fix the problem for Python 2.7 as well, and it drastically lowers memory consumption, but the GC quite often seems to have a hard time with the 8192-byte increments in ChunkedInputBuffer.receive, which leads to excessive delays and/or hangs.

        # over-allocate: grow the buffer by a larger step up front so
        # subsequent receives avoid repeated small reallocations
        data_size = self._end - self._start
        if data_size > new_data_size:
            new_end = self._end + new_data_size
            self._data[self._end:new_end] = bytearray(data_size)

This dirty fix seems to eliminate the hangs, but with larger values it may be impossible to find contiguous memory blocks.

Unfortunately, for Python 3.5 the next issue is raised:

  2. recursive MessageFrame.read causes RecursionError

Thank you for your prompt response, and again, sorry for replying so late.

Here are the tested changes: liutec@7b56264#diff-67ae6a6f455d60cc2bbf6c2ba4aa3ec1

@liutec
Contributor Author

liutec commented Dec 3, 2017

I forgot to add the output for the remaining Python 3 issue:

Traceback (most recent call last):
  File "neo4j-bug.py", line 26, in <module>
    for record in records:
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 718, in records
    self._session.fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/v1/api.py", line 356, in fetch
    detail_count, _ = self._connection.fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 283, in fetch
    return self._fetch()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 302, in _fetch
    details, summary_signature, summary_metadata = self._unpack()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/connection.py", line 354, in _unpack
    data = unpacker.unpack_list()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/packstream/unpacker.py", line 137, in unpack_list
    return self._unpack_list(marker)
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/packstream/unpacker.py", line 146, in _unpack_list
    return [self._unpack()]
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/packstream/unpacker.py", line 111, in _unpack
    return decode(self.read(size), "utf-8")
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/packstream/unpacker.py", line 42, in read
    return self.source.read(n)
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/io.py", line 86, in read
    value.extend(self.read(n - (end - start)))
[...]
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/io.py", line 86, in read
    value.extend(self.read(n - (end - start)))
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/io.py", line 84, in read
    self._next_pane()
  File "/opt/anaconda/envs/p352/lib/python3.5/site-packages/neo4j/bolt/io.py", line 47, in _next_pane
    if self._current_pane < len(self._panes):
RecursionError: maximum recursion depth exceeded in comparison
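
The failure mode can be reproduced in isolation: any per-chunk recursion deeper than CPython's default limit raises the same error. A generic sketch of the pattern (hypothetical helper, analogous to but not the actual MessageFrame.read):

import sys

def read_chunks(chunks, i=0):
    # one stack frame per buffered chunk
    if i >= len(chunks):
        return b""
    return chunks[i] + read_chunks(chunks, i + 1)

print(sys.getrecursionlimit())         # 1000 by default
try:
    read_chunks([b"x" * 8192] * 2000)  # 2000 frames deep
except RecursionError as e:
    print(e)                           # maximum recursion depth exceeded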

@liutec
Contributor Author

liutec commented Dec 3, 2017

Hello,

I've also changed MessageFrame.read and removed the recursion. With this I no longer have any issues:
1.6...liutec:1.6-buffer-capacity
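
For illustration, the shape of that change is to gather the bytes in a loop, one stack frame total, instead of one recursive call per pane. A generic sketch (hypothetical names, not the actual MessageFrame code, which is in the branch above):

def read_across_chunks(chunks, n):
    # collect n bytes that may span many buffered chunks, without recursion
    value = bytearray()
    for chunk in chunks:
        if len(value) >= n:
            break
        value.extend(chunk[:n - len(value)])
    return bytes(value)

print(len(read_across_chunks([b"x" * 8192] * 2000, 6 * 2 ** 20)))  # 6291456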

The performance seems to have improved significantly based on this test:

import sys
import neo4j
import time
from neo4j.v1 import GraphDatabase, TransientError


print(sys.version_info)
print("neo4j-driver", neo4j.__version__)

neo4j_driver = GraphDatabase.driver(
    uri="bolt://localhost:7687",
    auth=("neo4j", "neo4j")
)
for i in range(1, 10):
    try:
        start = time.time()
        data = "x" * (i * 2 ** 20)
        cypher = "RETURN '{}' AS data".format(data)
        with neo4j_driver.session() as tx:
            records = tx.run(cypher)
            for record in records:
                if record['data'] != data:
                    print('ERROR')
        print('%d MB time = %.2f sec' % (i, time.time() - start))
    except TransientError as e:
        print(str(e))
        break

With your proposed changes and no recursion:

sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
neo4j-driver 1.6.0a1
1 MB time = 0.10 sec
2 MB time = 0.18 sec
3 MB time = 0.32 sec
4 MB time = 0.41 sec
5 MB time = 0.49 sec
6 MB time = 0.66 sec
7 MB time = 0.83 sec
There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.

With just your proposed changes:

sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
neo4j-driver 1.6.0a1
1 MB time = 0.12 sec
2 MB time = 0.36 sec
3 MB time = 0.67 sec
4 MB time = 1.15 sec
5 MB time = 1.70 sec
6 MB time = 2.42 sec
Traceback (most recent call last):
[...]
self.write(b[chunk_remaining:])
MemoryError

During handling of the above exception, another exception occurred:

SystemError: deallocated bytearray object has exported buffers

Process finished with exit code 0

liutec added a commit to liutec/neo4j-python-driver that referenced this issue Dec 4, 2017
@zhenlineo
Contributor

@liutec
Thanks again for your help fixing this bug.
The newly released 1.5.3 already includes the fix, so I am closing this issue and considering it solved. You are most welcome to improve any buffer usage around that code.

Zhen
