SQL file structure for legacy / sql alchemy #4333
OK, updated the sql branch to the 0.12 release.
@danielballan I've squashed and rebased these commits here: https://github.com/hayd/pandas/commits/sql_tests. I'm getting a strange bug with NaN in MySQL at the moment in test_write_row_by_row; any ideas, or would you mind taking a peek? https://travis-ci.org/hayd/pandas/jobs/11314669 It's a pretty horrible hack right now, from trying to get it to work, but I think it should refactor pretty well (once we get the tests working). Sorry for taking so long!
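The thread never pins down the cause of the MySQL NaN failure. One common culprit with DB-API drivers is that a Python float NaN is not translated into SQL NULL, so a row-by-row writer may need to clean values first. A minimal sketch, assuming rows arrive as tuples; `nan_to_none` and `prepare_rows` are hypothetical helpers, not pandas API, and the demo uses in-memory SQLite (placeholder `?`) because MySQL may not be available locally (MySQLdb would use `%s`).

```python
import math
import sqlite3
from typing import Any, Iterable, List, Sequence, Tuple


def nan_to_none(row: Sequence[Any]) -> Tuple[Any, ...]:
    """Replace float NaN values with None so the driver binds SQL NULL."""
    return tuple(
        None if isinstance(value, float) and math.isnan(value) else value
        for value in row
    )


def prepare_rows(rows: Iterable[Sequence[Any]]) -> List[Tuple[Any, ...]]:
    """Clean every row before it is passed to cursor.executemany()."""
    return [nan_to_none(row) for row in rows]


# Demonstrated against in-memory SQLite, since MySQL may not be available locally.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a REAL, b TEXT)")
con.executemany(
    "INSERT INTO t VALUES (?, ?)",
    prepare_rows([(1.5, "x"), (float("nan"), "y")]),
)
rows = con.execute("SELECT a, b FROM t ORDER BY b").fetchall()
print(rows)  # [(1.5, 'x'), (None, 'y')]
```

`numpy.float64` subclasses `float`, so it is caught by the `isinstance` check; other NaN-carrying types (e.g. `numpy.float32`) would need an extra check.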
Hmm. Sure, I'll tinker with this over the weekend.
@hayd, it's tricky to test this on my system, because the tests attempt a root login to MySQL without a password. This is why I had the old tests reading the local MySQL config file. I wonder if we can come up with a way for the line
to use that same file, and only fall back on root login if the file does not exist. Edit: Alternatively, we could assume that most people are just running the tests on Travis, but it's obviously easier to debug this locally. I guess I could make a temporary change to the file, hardcoding my own username/password, but that's not my first choice.
@danielballan You wrote a get_engine_url function for this; perhaps that helps (if it's possible to get the user/pass out of it?). I guess we could try to use the current engine URL and, if we can't connect to it, then use something with
In fact, I wonder if there is a clever SQLAlchemy way to build the URL from the config file (ideally platform-independent). I've been extracting the code into a PandasSQL class, and it seems so much nicer and easier to reason about (I still have the big functions, the read and write frame ones, to do)...
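Once a user and password are known (for example, read from the config file), building the SQLAlchemy URL is mostly string assembly plus escaping. A minimal sketch; `engine_url` is a hypothetical helper, and the format follows SQLAlchemy's `dialect://user:password@host:port/database` convention, where special characters in the password must be URL-escaped. On SQLAlchemy 1.4+ `sqlalchemy.engine.URL.create(...)` builds the URL object and handles escaping itself.

```python
from typing import Optional
from urllib.parse import quote_plus


def engine_url(user: str, password: str, host: str, database: str,
               dialect: str = "mysql", port: Optional[int] = None) -> str:
    """Build a 'dialect://user:password@host[:port]/database' URL string."""
    auth = quote_plus(user)
    if password:
        # Characters such as '@' or '/' must be escaped inside the URL
        auth += ":" + quote_plus(password)
    netloc = host if port is None else f"{host}:{port}"
    return f"{dialect}://{auth}@{netloc}/{database}"


print(engine_url("root", "", "localhost", "test"))
# mysql://root@localhost/test
```

The resulting string can be passed straight to `sqlalchemy.create_engine`.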
@hayd - two suggestions (also, urp on "@jtratner tearing it apart")
My current implementation is PandasSQL(PandasObject), then PandasSQLWithEngine(PandasSQL), etc. This naming convention might be a little suspect, but I'll make it work first, then we can mixin it up or whatever (flavor classes for specific flavors could be an option; previously much of this was/is "dict based"). Some shared functions will be in the base class, but tbh WithCon mostly just uses WithCur, so the meat is there (and in WithEngine, obviously).
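The layering described above (shared code in a base class, WithCon reusing WithCur) might be sketched like this. It is illustrative only: the method names `execute` and `read_sql` are assumptions, not the branch's actual API, the base class does not inherit from `PandasObject` so the example stays dependency-free, and the SQLAlchemy-backed `PandasSQLWithEngine` is omitted (it would override `execute`).

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


class PandasSQL:
    """Shared logic; subclasses only say how to run a statement."""

    def execute(self, sql: str, params: Sequence[Any] = ()) -> Any:
        raise NotImplementedError

    def read_sql(self, sql: str, params: Sequence[Any] = ()) -> Tuple[List[str], List[tuple]]:
        # Shared: run the query and return (column names, rows)
        cur = self.execute(sql, params)
        columns = [desc[0] for desc in cur.description]
        return columns, cur.fetchall()


class PandasSQLWithCur(PandasSQL):
    """Runs everything through one DB-API cursor."""

    def __init__(self, cur: Any) -> None:
        self.cur = cur

    def execute(self, sql: str, params: Sequence[Any] = ()) -> Any:
        self.cur.execute(sql, params)
        return self.cur


class PandasSQLWithCon(PandasSQLWithCur):
    """A DB-API connection only supplies a cursor; the rest is inherited."""

    def __init__(self, con: Any) -> None:
        self.con = con
        super().__init__(con.cursor())


con = sqlite3.connect(":memory:")
backend = PandasSQLWithCon(con)
backend.execute("CREATE TABLE t (a INTEGER, b TEXT)")
backend.execute("INSERT INTO t VALUES (?, ?)", (1, "x"))
columns, rows = backend.read_sql("SELECT a, b FROM t")
print(columns, rows)  # ['a', 'b'] [(1, 'x')]
```

Because `PandasSQLWithCon` only overrides the constructor, every fix to the cursor path automatically applies to the connection path too.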
Please check out the implementation; I pushed to pydata/sql: https://github.com/pydata/pandas/blob/sql/pandas/io/sql.py One of the things I've realised I haven't refactored out is the connection tests, which I'm not sure I understand... it seems you give it "something" and magically pull out an engine/con/cur. This was previously implicit and inconsistent in each function (I think) :s It's not passing all the tests yet, so if you have any ideas on them that would be awesome. The commits are now fixed, sorry for taking so long!! Thoughts?
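The "give it something and pull out engine/con/cur" step could be made explicit with a single dispatch function. This is a duck-typing sketch under assumptions: the attribute checks are guesses at what distinguishes each kind, and `connection_kind` is hypothetical. A real implementation would more likely use `isinstance(obj, sqlalchemy.engine.Engine)` when SQLAlchemy is importable.

```python
import sqlite3
from typing import Any


def connection_kind(obj: Any) -> str:
    """Classify what the caller passed in, checking the most specific type first."""
    if hasattr(obj, "dialect") and hasattr(obj, "connect"):
        return "engine"      # looks like a SQLAlchemy Engine
    if callable(getattr(obj, "cursor", None)):
        return "connection"  # DB-API 2.0 connection
    if hasattr(obj, "execute") and hasattr(obj, "fetchall"):
        return "cursor"      # DB-API 2.0 cursor
    raise TypeError(f"cannot use {type(obj).__name__!r} as a SQL connection")


con = sqlite3.connect(":memory:")
print(connection_kind(con))           # connection
print(connection_kind(con.cursor()))  # cursor
```

Doing this once, up front, and then constructing the matching class keeps the per-function `if` chains out of the read/write code.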
@hayd why are there
@jtratner basically PandasSQL is the parent class; shared methods are there, and the cursor/engine-specific functions are defined in the subclasses.
Perhaps the naming is a bit suspect, but I think it's descriptive; not sure what would be better...
Maybe I'm mistaken, but couldn't you let
Yes, and that was how it was done before... this way is less spaghetti.
Oh, do you mean always use a cursor? Not sure there is a benefit to that. (The benefit of an engine is being able to extract tables and select statements etc.; you'd have to be careful not to lose that if just using a cursor.) I think splitting the logic into classes means much easier-to-follow code, and less black-magic connection finding.
@danielballan let me know what you think, as it feels like your baby and I've hacked it to pieces! :s @jtratner maybe it's not such a great naming of classes (for the factory), but I really think the code needs to be separated into classes (before, there were so many ifs deciding which type of connection was being used that I think it was quite unmaintainable).
Sorry to push on this, can you just answer my question about life cycle?
@jtratner No worries, push away. No, I don't actually think users would use these directly. I'm happy to remove them; I think the preferred way should be for users to create pandas_sql = PandasSQL(engine=...) and then do SQL stuff through it.
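One possible answer to the life-cycle question, sketched under assumptions: the user builds the object once and reuses it, while the caller keeps ownership of the connection. This stand-in `PandasSQL` takes a DB-API `con=` rather than `engine=` so it runs without SQLAlchemy, and the commit-on-success / rollback-on-error context manager is a hypothetical design choice, not something the branch implements.

```python
import sqlite3
from typing import Any, List, Sequence


class PandasSQL:
    """Hypothetical user-facing object: build once, reuse for many calls."""

    def __init__(self, con: Any) -> None:
        self.con = con  # the caller owns the connection and decides when to close it

    def execute(self, sql: str, params: Sequence[Any] = ()) -> Any:
        cur = self.con.cursor()
        cur.execute(sql, params)
        return cur

    def read_rows(self, sql: str, params: Sequence[Any] = ()) -> List[tuple]:
        return self.execute(sql, params).fetchall()

    def __enter__(self) -> "PandasSQL":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Commit on success, roll back on error; the connection itself stays open.
        if exc_type is None:
            self.con.commit()
        else:
            self.con.rollback()


con = sqlite3.connect(":memory:")
with PandasSQL(con=con) as pandas_sql:
    pandas_sql.execute("CREATE TABLE t (a INTEGER)")
    pandas_sql.execute("INSERT INTO t VALUES (?)", (7,))
print(PandasSQL(con=con).read_rows("SELECT a FROM t"))  # [(7,)]
```

Leaving `close()` to the caller means the same connection can back several `PandasSQL` objects without one of them pulling it out from under the others.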
First step for #4163, replacing my PR #4323 aimed at the master branch.