SQL file structure for legacy / sql alchemy #4333

danielballan · 2013-07-23T21:43:53Z

First step for #4163, replacing my PR #4323 aimed at the master branch.

…s pass.

hayd · 2013-07-24T22:12:45Z

ok, updated sql branch to 0.12 release.

SQL file structure for legacy / sql alchemy

hayd · 2013-09-13T12:11:35Z

@danielballan I've squished and rebased these commits to this: https://github.com/hayd/pandas/commits/sql_tests

I get a strange bug with NaN in MySQL atm in test_write_row_by_row, any ideas/would you mind taking a peak? https://travis-ci.org/hayd/pandas/jobs/11314669

It's a pretty horrible hack atm, when trying to get it to work, but I think should refactor pretty well (once we get tests working). Sorry for taking so long!

danielballan · 2013-09-13T13:38:34Z

Hmm. Sure, I'll tinker with this over the weekend.

danielballan · 2013-09-17T14:33:53Z

@hayd, It's tricky to test this on my system, because it is attempting root login to mysql without a password. This is why I had the old tests reading the local mysql config file, .my.cnf. It wasn't pretty, but it allowed the tests to be run on machines where the sql database that's running needs to be secure.

I wonder if we can come up with a way for the line

mysql+pymysql://root:@localhost/pandas_nosetest

to use that same file, and only fallback on root login if the file does not exist.

Edit

Alternatively, we could assume that most people are just running the tests of Travis, but it's obviously easier to debug this locally. I guess I could make a temporary change to the file, hardcoding my own username/password, but that's not my first choice.

hayd · 2013-09-17T16:05:47Z

@danielballan You have wrote a get_engine_url function for this, perhaps that helps (if it's possible to get user/pass out? I guess we could try and use the current engine url, if can't connect to it then use something with .my.cnf.

In fact, I wonder if there is a clever SQLAlchemy way to build the url from the config file... (ideally platform indep)

I've been extracting to PandasSQL class, and it seems so much nicer and easier to reason about (still have the big functions, read and write frames to do)...

jtratner · 2013-09-17T21:04:22Z

@hayd - two suggestions (also, urp on "@jtratner tearing it apart")

I'm glad it's getting clearer with abstracting a class. You might
consider creating an abstract base class so you can be sure you've
implemented all the methods you need on each of the subclasses (Python
won't allow the class to be instantiated if all the methods or properties
are not defined.). When I was looking at an earlier version, I thought it
might make sense to turn parts of it into mixins so that you can, for
example, share elements (like specific insertions or table creations) that
you need for sqlite and also MySQL and put them together, without needing
to have one class inherit from the other. Does create some extra
indirection, so you have to decide on what makes most sense.
Maybe we should consider adopting a config file for pandas in general.
Could read it in after loading (i.e., by reading an environment variable or
something) - would be great to follow the example of Flask and allow
passing a dictionary or follow the PEP (that I think exists) on config
files in Python otherwise. That doesn't need to be in scope for this PR,
and might make sense to leave for something separate and just use the
existing core/config setup.

hayd · 2013-09-17T21:50:50Z

My current implementation is PandasSQL(PandasObject) and then PandasSQLWithEngine(PandasSQL) etc., this naming convention might be a little suspect... but will make it work first, then we can mixin it up or whatever (flavor classes for specific flavors could be an option, previously much was/is "dict based").

Some shared functions will be in the base class but tbh most WithCon just uses WithCur, so the meat is there (and in WithEngine obviously).

hayd · 2013-09-23T02:52:20Z

Please check out impl, I pushed to pydata/sql: https://github.com/pydata/pandas/blob/sql/pandas/io/sql.py

One of the things I've realised is I haven't refactored out is the connection tests, which I'm not sure I understand... seems you give it "something" and magically pull out engine/con/cur. This was previously implicit/inconsistent in each function (I think) :s

Not passing all tests yet, so if you have any ideas on them would be awesome. The commits are now fixed, sorry for taking so long!!

Thoughts?

jtratner · 2013-09-23T03:21:22Z

@hayd why are there PandasSQL, PandasSQLWithCur and PandasSQLWithEngine? What's the difference? Is PandasSQLWithEngine a wrapper for SQLAlchemy?

hayd · 2013-09-23T03:30:11Z

@jtratner basically PandasSQL is the parent class, shared methods are there, and the cursor/engine specific functions are defined in the subclasses.

PandasSQL(engine=engine) creates a PandasSQLWithEngine object (which is a subclass of PandasSQL).

Perhaps naming is a bit suspect, but is descriptive I think, not sure what would be better...

jtratner · 2013-09-23T04:38:34Z

Maybe I'm mistaken, but couldn't you let read_sql and write_frame handle converting the passed connection into a cursor and then only need one of those classes?

hayd · 2013-09-23T04:59:20Z

Yes, and that was how it was done before... this way is less spaghetti.

hayd · 2013-09-23T05:02:40Z

Oh do you mean always use cursor... not sure if there us a benefit to that. (Benefit of engine is to extract tables and select statements etc., you'd have to be careful to not lose that if just using cursor.) Think splitting into logic means much easier to follow code, and less black connection finding magic.

hayd · 2013-09-23T15:42:50Z

@danielballan let me know what you think, as it feels like your baby I've hacked to pieces! :s

@jtratner maybe it's not such a great naming of classes (for the factory), but I really think the code needs to be separated into classes (there was so many ifs before to decide on which type of connection was being used, I think it was quite unmaintainable).

jtratner · 2013-09-23T16:48:34Z

Sorry to push on this, can you just answer my question about life cycle?
I'm wondering here whether we are or are not closing things that we should
be closing. Also, are you thinking that users would be using the
tquery/uquery kinds of methods directly?

hayd · 2013-09-23T17:11:02Z

@jtratner No worries, push away.

No, I don't actually think they would use these directly. I'm happy to remove or _ them tbh.

I think preferred way should be for users to use pandas_sql=PandasSQL(engine=..) then do sql stuff.

pandas_sql.read_sql
pandas_sql.write_frame
pandas_sql.read_table # read entire table from engine

danielballan added 4 commits July 22, 2013 18:17

TST: Import sqlalchemy on Travis.

e712832

read_sql docstring edits

03355c4

read_sql connects via Connection, Engine, file path, or :memory: string

7179732

Separate legacy code into new file, and fallback so that all old test…

4d37747

…s pass.

hayd added a commit that referenced this pull request Jul 24, 2013

Merge pull request #4333 from danielballan/sql

4bbab1a

SQL file structure for legacy / sql alchemy

hayd merged commit 4bbab1a into pandas-dev:sql Jul 24, 2013

danielballan deleted the sql branch July 25, 2013 13:53

Uh oh!

SQL file structure for legacy / sql alchemy #4333

SQL file structure for legacy / sql alchemy #4333

Uh oh!

Conversation

danielballan commented Jul 23, 2013

Uh oh!

hayd commented Jul 24, 2013

Uh oh!

hayd commented Sep 13, 2013

Uh oh!

danielballan commented Sep 13, 2013

Uh oh!

danielballan commented Sep 17, 2013

Uh oh!

hayd commented Sep 17, 2013

Uh oh!

jtratner commented Sep 17, 2013

Uh oh!

hayd commented Sep 17, 2013

Uh oh!

hayd commented Sep 23, 2013

Uh oh!

jtratner commented Sep 23, 2013

Uh oh!

hayd commented Sep 23, 2013

Uh oh!

jtratner commented Sep 23, 2013

Uh oh!

hayd commented Sep 23, 2013

Uh oh!

hayd commented Sep 23, 2013

Uh oh!

hayd commented Sep 23, 2013

Uh oh!

jtratner commented Sep 23, 2013

Uh oh!

hayd commented Sep 23, 2013

Uh oh!

Uh oh!