Skip to content

[sqlite3] cleanup and harden Connection.__init__ #89289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
erlend-aasland opened this issue Sep 7, 2021 · 21 comments · Fixed by #92214, #28227, #28231 or #92258
Closed

[sqlite3] cleanup and harden Connection.__init__ #89289

erlend-aasland opened this issue Sep 7, 2021 · 21 comments · Fixed by #92214, #28227, #28231 or #92258
Labels
3.11 only security fixes extension-modules C modules in the Modules dir topic-sqlite3

Comments

@erlend-aasland
Copy link
Contributor

BPO 45126
Nosy @encukou, @corona10, @miss-islington, @erlend-aasland, @DiddiLeija
PRs
  • bpo-45126: Harden sqlite3 connection initialisation #28227
  • bpo-45126: Fix ref. leak in sqlite3.Connection.__init__ #28231
  • bpo-45126: Deprecate sqlite3 Connection and Cursor reinitialization #28234
  • [3.10] bpo-45126: Fix ref. leak in sqlite3.Connection.__init__ (GH-28231) #28298
  • [3.9] bpo-45126: Fix ref. leak in sqlite3.Connection.__init__ (GH-28231). (GH-28298) #28302
  • bpo-45126: Ensure sqlite3.Connection is unusable if __init__ fails #29160
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-09-07.11:46:02.065>
    labels = ['extension-modules', '3.11']
    title = '[sqlite3] cleanup and harden Connection and Cursor __init__'
    updated_at = <Date 2021-11-23.13:11:11.371>
    user = 'https://github.com/erlend-aasland'

    bugs.python.org fields:

    activity = <Date 2021-11-23.13:11:11.371>
    actor = 'petr.viktorin'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2021-09-07.11:46:02.065>
    creator = 'erlendaasland'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45126
    keywords = ['patch']
    message_count = 18.0
    messages = ['401248', '401352', '401356', '401357', '401358', '401359', '401361', '401362', '401363', '401366', '401371', '401561', '401666', '401682', '401684', '406406', '406419', '406838']
    nosy_count = 5.0
    nosy_names = ['petr.viktorin', 'corona10', 'miss-islington', 'erlendaasland', 'DiddiLeija']
    pr_nums = ['28227', '28231', '28234', '28298', '28302', '29160']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue45126'
    versions = ['Python 3.11']

    @erlend-aasland
    Copy link
    Contributor Author

    Quoting Petr Viktorin in PR 27940, #27940 (comment):

    • db is not set to NULL if init fails.
    • This segfaults for me:
      import sqlite3
    
      conn = sqlite3.connect(':memory:')
      conn.execute('CREATE TABLE foo (bar)')

    try:
    conn.__init__('/bad-file/')
    except sqlite3.OperationalError:
    pass

    conn.execute('INSERT INTO foo (bar) VALUES (1), (2), (3), (4)')

    Other issues:

    • reinitialisation is not handled gracefully
    • __init__ is messy; members are initialised here and there

    Suggested to reorder connection __init__ in logical groups to more easily handle errors:

    1. handle reinit
    2. open and configure database
    3. create statement LRU cache and weak ref cursor list
    4. initialise members in the order they're defined in connection.h
    5. set isolation level, since it's a weird case anyway

    @erlend-aasland erlend-aasland added 3.11 only security fixes extension-modules C modules in the Modules dir labels Sep 7, 2021
    @encukou
    Copy link
    Member

    encukou commented Sep 8, 2021

    This is a minefield. If anyone has a use case for it, I'd *love* to hear it, but to me it seems that proper reinit support will be a lot of work (now and in future maintenance) for no gain. You can always create a new connection object.

    Consider instead deprecating reinitialization, documenting it as unpredictable & removing it in Python 3.13.

    Here's another thing to test:

    import sqlite3
    
    conn = sqlite3.connect(':memory:')
    cursor = conn.execute('CREATE TABLE foo (bar)')
    
    try:
        conn.__init__('/bad-file/')
    except sqlite3.OperationalError:
        pass
    
    cursor.execute('INSERT INTO foo (bar) VALUES (1), (2), (3), (4)')

    And here's the kind of behavior that should be specified (and tested), and might even change to make more sense:

    import sqlite3
    
    conn = sqlite3.connect(':memory:')
    conn.text_factory=bytes
    conn.row_factory = sqlite3.Row
    cursor = conn.execute('CREATE TABLE foo (bar)')
    cursor.execute('INSERT INTO foo (bar) VALUES ("1"), ("2"), ("3"), ("4")')
    cursor.execute('SELECT bar FROM foo')
    
    for row in cursor:
        print(row, list(row))
        break
    
    conn.__init__(':memory:')
    conn.execute('CREATE TABLE foo (bar)')
    conn.execute('INSERT INTO foo (bar) VALUES ("a"), ("b"), ("c"), ("d")')
    
    # Currently this uses the old database, old row_factory, but new text_factory"
    for row in cursor:
        print(row, list(row))

    We should also do a review of all places connection is used from Cursor, Statement, etc. to ensure everything continue to work with a different db. This can be done (and maybe you already did), but I'm also worried how well future maintainers can keep reinitialization in mind.

    And all this will also need to be done for Cursor.

    @erlend-aasland
    Copy link
    Contributor Author

    I modified your second example slightly:

    import sqlite3
    
    conn = sqlite3.connect(":memory:")
    conn.text_factory=bytes
    conn.row_factory = sqlite3.Row
    cursor = conn.execute("CREATE TABLE foo (bar)")
    numbers = range(4)
    cursor.executemany("INSERT INTO foo (bar) VALUES (?)", ((str(v),) for v in numbers))
    cursor.execute("SELECT bar FROM foo")
    
    print("first fetch")
    for row in cursor.fetchmany(2):
        print(type(row[0]))
    
    conn.__init__(":memory:")
    conn.execute("CREATE TABLE foo (bar)")
    letters = "a", "b", "c", "d"
    conn.executemany("INSERT INTO foo (bar) VALUES (?)", ((v,) for v in letters))
    
    # Currently this uses the old database, old row_factory, but new text_factory"
    print("second fetch")
    for row in cursor.fetchall():
        print(type(row[0]))
    

    Here's the output:
    first fetch
    <class 'bytes'>
    <class 'bytes'>
    second fetch
    <class 'str'>
    <class 'str'>

    @erlend-aasland
    Copy link
    Contributor Author

    Note: I ran that with PR 28227.

    OTOH: I do agree that there's a lot of pitfalls here, especially in the future. In the long run, it is probably best to deprecate reinit, and disable it in Python 3.13.

    @erlend-aasland
    Copy link
    Contributor Author

    Modifying the loops to also print the values:

    first fetch
    b'0' <class 'bytes'>
    b'1' <class 'bytes'>
    second fetch
    2 <class 'str'>
    3 <class 'str'>

    @erlend-aasland
    Copy link
    Contributor Author

    FYI, I've expanded the reinit tests and added a deprecation warning to the PR.

    @encukou
    Copy link
    Member

    encukou commented Sep 8, 2021

    I think a deprecation should be discussed a bit more widely than on bpo, so I opened a thread here: https://discuss.python.org/t/deprecating-sqlite-object-reinitialization/10503

    @erlend-aasland
    Copy link
    Contributor Author

    Great, thanks.

    @encukou
    Copy link
    Member

    encukou commented Sep 8, 2021

    As for the effort to fix this:

    If we deprecate this, there should be no new users of it in 3.11+.

    If we deprecate and also fix this, and we happen to introduce bugs or behavior changes, then people that use it now will need to:

    1. adapt to the new behavior
    2. find a different way to do things (before 3.13)

    If we deprecate but keep the buggy behavior it as it is, (1) is not needed. Less work for both us and the users.

    @erlend-aasland
    Copy link
    Contributor Author

    If we deprecate but keep the buggy behavior it as it is, (1) is not needed. Less work for both us and the users.

    Indeed.

    There's still a ref leak I'd like to take care of, though: if the first audit fails, database_obj leaks.

    @erlend-aasland
    Copy link
    Contributor Author

    I'll save the cleanup till Python 3.13 dev is started. I've opened a PR for fixing the ref leak (should be backported), and a draft PR for deprecating Connection and Cursor reinitialisation.

    @erlend-aasland
    Copy link
    Contributor Author

    Updated title to include sqlite3.Cursor as well, since cursors and connections are very much entwined.

    @erlend-aasland erlend-aasland changed the title [sqlite3] cleanup and harden connection init [sqlite3] cleanup and harden Connection and Cursor __init__ Sep 10, 2021
    @erlend-aasland erlend-aasland changed the title [sqlite3] cleanup and harden connection init [sqlite3] cleanup and harden Connection and Cursor __init__ Sep 10, 2021
    @corona10
    Copy link
    Member

    New changeset c78d5ca by Erlend Egeberg Aasland in branch 'main':
    bpo-45126: Fix ref. leak in sqlite3.Connection.__init__ (GH-28231)
    c78d5ca

    @corona10
    Copy link
    Member

    New changeset aa6dd54 by Erlend Egeberg Aasland in branch '3.10':
    [3.10] bpo-45126: Fix ref. leak in sqlite3.Connection.__init__ (GH-28231). (GH-28298)
    aa6dd54

    @miss-islington
    Copy link
    Contributor

    New changeset 5d28bb6 by Miss Islington (bot) in branch '3.9':
    [3.10] bpo-45126: Fix ref. leak in sqlite3.Connection.__init__ (GH-28231). (GH-28298)
    5d28bb6

    @encukou
    Copy link
    Member

    encukou commented Nov 16, 2021

    New changeset 9d6215a by Erlend Egeberg Aasland in branch 'main':
    bpo-45126: Harden sqlite3 connection initialisation (GH-28227)
    9d6215a

    @erlend-aasland
    Copy link
    Contributor Author

    The remaining PR (GH 28234) deprecates reinitialisation. I'm not sure it is a good idea to pursue this, so I'm leaning towards closing this as fixed, and also closing GH 28234.

    @encukou
    Copy link
    Member

    encukou commented Nov 23, 2021

    I think it's a good idea, but without the "will be disallowed in Python 3.13" part -- we should tell people that it's discouraged, but there's not much point in removing it.

    But there's no consensus whether that's a good way to handle things, in general. So I'll leave it up to you.

    @erlend-aasland
    Copy link
    Contributor Author

    erlend-aasland commented May 3, 2022

    There are two more issues in connection init:

    • sqlite3_open_v2 return a sqlite3 object, even if it fails; we must make sure these resources are freed if __init__ fails
    • upon failure, the LRU cache and weak ref list pointers are Py_XDECREFd; these can safely be turned into Py_DECREFs

    Note, 3.10 does not have the sqlite3_open_v2 problem, as it assigns the database pointer directly to self->db, so resources are implicitly freed in the upcoming dealloc. In main, I prefer to keep the initialisation as it is, and free resources explicitly; it makes the code easier to read, and error handling easier to understand.

    erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue May 3, 2022
    - Make sure SQLite resources are freed if database open fails
    - Remove unneeded branches if init is aborted
    @erlend-aasland
    Copy link
    Contributor Author

    Quoting the SQLite docs for sqlite3_open_v2:

    Whether or not an error occurs when it is opened, resources associated with the database connection handle should be released by passing it to sqlite3_close() when it is no longer required.

    @erlend-aasland
    Copy link
    Contributor Author

    This issue was originally for sqlite3.Connection.__init__ only. I later edited the title to include sqlite3.Cursor.__init__. The problems relating to sqlite3.Connection.__init__ have been resolved. I suggest opening a new issue if we want to improve the cursor init method.

    Repository owner moved this from In Progress to Done in sqlite3 issues May 21, 2022
    @erlend-aasland erlend-aasland changed the title [sqlite3] cleanup and harden Connection and Cursor __init__ [sqlite3] cleanup and harden Connection.__init__ May 21, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment