Skip to content

Conversation

cdcadman
Copy link
Contributor

@cdcadman cdcadman commented Nov 4, 2022

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

I am splitting this out of #48576 , because it is a major refactor of the code, with the goal of making SQLDatabase only accept a Connection and not an Engine. sqlalchemy 2.0 restricts the methods that are available to them, which makes it harder to write code that works with both. For example, Connection.connect() creates a branched connection in sqlalchemy 1.x, but is removed in 2.0, but this is called in SQLDatabase.check_case_sensitive().

I also added some clarification on how transactions work in DataFrame.to_sql, based on this example, run against pandas 1.5.1:

import sqlite3
from pandas import DataFrame
from sqlalchemy import create_engine

with sqlite3.connect(":memory:") as con:
    con.execute("create table test (A integer, B integer)")
    row_count = con.execute("insert into test values (2, 4), (5, 10)").rowcount
    if row_count > 1:
        con.rollback()
    print(con.execute("select count(*) from test").fetchall()[0][0]) # prints 0

with sqlite3.connect(":memory:") as con:
    con.execute("create table test (A integer, B integer)")
    row_count = DataFrame({'A': [2, 5], 'B': [4, 10]}).to_sql('test', con, if_exists='append', index=False)
    if row_count > 1:
        con.rollback() # does nothing, because pandas already committed the transaction.
    print(con.execute("select count(*) from test").fetchall()[0][0]) # prints 2
    
with create_engine("sqlite:///:memory:").connect() as con:
    with con.begin():
        con.exec_driver_sql("create table test (A integer, B integer)")
    try:
        with con.begin():
            row_count = DataFrame({'A': [2, 5], 'B': [4, 10]}).to_sql('test', con, if_exists='append', index=False)
            assert row_count < 2
    except AssertionError:
        pass
    print(con.execute("select count(*) from test").fetchall()[0][0]) # prints 0

@mroeschke
Copy link
Member

Thanks for the followup, but could you pair down the changes even more (1 PR for 1 targeted change). It will greatly help the review process. It appears this PR is tackling 3 things which would be better if they were 3 individual PRs

  • SQLDatabase accepting connections only
  • Disposing engines
  • test refactoring

@cdcadman
Copy link
Contributor Author

cdcadman commented Nov 4, 2022

Yes, I think I can do that. For the first part, where I make SQLDatabase accept Connections only, I might have to disable the tests which pass an Engine, but that will be a much smaller change than the test refactor.

@cdcadman
Copy link
Contributor Author

cdcadman commented Nov 5, 2022

I pulled the engine disposal and test refactoring out of this PR. I think the engine disposal can wait until that last PR. But part of my motivation for going with this approach was so that I could put the engine disposal into the _sqlalchemy_con function.

@mroeschke mroeschke added the IO SQL to_sql, read_sql, read_sql_query label Nov 7, 2022
@cdcadman
Copy link
Contributor Author

@mroeschke , I've addressed all the comments. After this PR, the next step would be to refactor the test classes, since the ones involving the sqlalchemy engines would no longer be necessary. In a later PR, I would dispose of the sqlalchemy engine, if it needed to be created, within the new function _sqlalchemy_con.



@contextmanager
def _sqlalchemy_con(connectable, need_transaction: bool):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be reused in your followup PR? I have a slight preference of just folding this into pandasSQL_builder for now until there's a need to share this connection logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only plan for _sqlalchemy_con to be called by pandasSQL_builder and not by anything else. I split it out to reduce the amount of indentation and keep the functions small. There is an additional try/finally block and some additional if/else blocks in lines 772-800 here: https://github.com/cdcadman/pandas/blob/sql_fixes/pandas/io/sql.py . I think I can fold _sqlalchemy_con into pandasSQL_builder if you still prefer that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay gotcha. I think it'll be fine to leave as a separate function with more code incoming.

However, could you replace import sqlalchemy here with sqlalchemy = import_optional_dependnecy(..., errors="raise")? Just so that if someone else tries using this function in the future it's known that sqlalchemy is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I made the change.

@mroeschke mroeschke added this to the 2.0 milestone Nov 17, 2022
@mroeschke mroeschke merged commit 08f070d into pandas-dev:main Nov 17, 2022
@mroeschke
Copy link
Member

Awesome, thanks for the progress here @cdcadman

@cdcadman cdcadman deleted the refactor_sql branch November 17, 2022 18:54
@cdcadman cdcadman mentioned this pull request Nov 17, 2022
5 tasks
MarcoGorelli pushed a commit to MarcoGorelli/pandas that referenced this pull request Nov 18, 2022
…emy 2.0. (pandas-dev#49531)

* DOC: Clarify behavior of DataFrame.to_sql

* CLN: Make SQLDatabase only accept a sqlalchemy Connection.

Co-authored-by: Chuck Cadman <[email protected]>
mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
…emy 2.0. (pandas-dev#49531)

* DOC: Clarify behavior of DataFrame.to_sql

* CLN: Make SQLDatabase only accept a sqlalchemy Connection.

Co-authored-by: Chuck Cadman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants