Skip to content

SQL file structure for legacy / sql alchemy #4333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 24, 2013
Merged

Conversation

danielballan
Copy link
Contributor

First step for #4163, replacing my PR #4323 aimed at the master branch.

@hayd
Copy link
Contributor

hayd commented Jul 24, 2013

ok, updated sql branch to 0.12 release.

hayd added a commit that referenced this pull request Jul 24, 2013
SQL file structure for legacy / sql alchemy
@hayd hayd merged commit 4bbab1a into pandas-dev:sql Jul 24, 2013
@danielballan danielballan deleted the sql branch July 25, 2013 13:53
@hayd
Copy link
Contributor

hayd commented Sep 13, 2013

@danielballan I've squished and rebased these commits to this: https://github.com/hayd/pandas/commits/sql_tests

I get a strange bug with NaN in MySQL atm in test_write_row_by_row, any ideas/would you mind taking a peak? https://travis-ci.org/hayd/pandas/jobs/11314669

It's a pretty horrible hack atm, when trying to get it to work, but I think should refactor pretty well (once we get tests working). Sorry for taking so long!

@danielballan
Copy link
Contributor Author

Hmm. Sure, I'll tinker with this over the weekend.

@danielballan
Copy link
Contributor Author

@hayd, It's tricky to test this on my system, because it is attempting root login to mysql without a password. This is why I had the old tests reading the local mysql config file, .my.cnf. It wasn't pretty, but it allowed the tests to be run on machines where the sql database that's running needs to be secure.

I wonder if we can come up with a way for the line

mysql+pymysql://root:@localhost/pandas_nosetest

to use that same file, and only fallback on root login if the file does not exist.

Edit

Alternatively, we could assume that most people are just running the tests of Travis, but it's obviously easier to debug this locally. I guess I could make a temporary change to the file, hardcoding my own username/password, but that's not my first choice.

@hayd
Copy link
Contributor

hayd commented Sep 17, 2013

@danielballan You have wrote a get_engine_url function for this, perhaps that helps (if it's possible to get user/pass out? I guess we could try and use the current engine url, if can't connect to it then use something with .my.cnf.

In fact, I wonder if there is a clever SQLAlchemy way to build the url from the config file... (ideally platform indep)

I've been extracting to PandasSQL class, and it seems so much nicer and easier to reason about (still have the big functions, read and write frames to do)...

@jtratner
Copy link
Contributor

@hayd - two suggestions (also, urp on "@jtratner tearing it apart")

  1. I'm glad it's getting clearer with abstracting a class. You might
    consider creating an abstract base class so you can be sure you've
    implemented all the methods you need on each of the subclasses (Python
    won't allow the class to be instantiated if all the methods or properties
    are not defined.). When I was looking at an earlier version, I thought it
    might make sense to turn parts of it into mixins so that you can, for
    example, share elements (like specific insertions or table creations) that
    you need for sqlite and also MySQL and put them together, without needing
    to have one class inherit from the other. Does create some extra
    indirection, so you have to decide on what makes most sense.
  2. Maybe we should consider adopting a config file for pandas in general.
    Could read it in after loading (i.e., by reading an environment variable or
    something) - would be great to follow the example of Flask and allow
    passing a dictionary or follow the PEP (that I think exists) on config
    files in Python otherwise. That doesn't need to be in scope for this PR,
    and might make sense to leave for something separate and just use the
    existing core/config setup.

@hayd
Copy link
Contributor

hayd commented Sep 17, 2013

My current implementation is PandasSQL(PandasObject) and then PandasSQLWithEngine(PandasSQL) etc., this naming convention might be a little suspect... but will make it work first, then we can mixin it up or whatever (flavor classes for specific flavors could be an option, previously much was/is "dict based").

Some shared functions will be in the base class but tbh most WithCon just uses WithCur, so the meat is there (and in WithEngine obviously).

@hayd
Copy link
Contributor

hayd commented Sep 23, 2013

Please check out impl, I pushed to pydata/sql: https://github.com/pydata/pandas/blob/sql/pandas/io/sql.py

One of the things I've realised is I haven't refactored out is the connection tests, which I'm not sure I understand... seems you give it "something" and magically pull out engine/con/cur. This was previously implicit/inconsistent in each function (I think) :s

Not passing all tests yet, so if you have any ideas on them would be awesome. The commits are now fixed, sorry for taking so long!!

Thoughts?

@jtratner
Copy link
Contributor

@hayd why are there PandasSQL, PandasSQLWithCur and PandasSQLWithEngine? What's the difference? Is PandasSQLWithEngine a wrapper for SQLAlchemy?

@hayd
Copy link
Contributor

hayd commented Sep 23, 2013

@jtratner basically PandasSQL is the parent class, shared methods are there, and the cursor/engine specific functions are defined in the subclasses.

PandasSQL(engine=engine) creates a PandasSQLWithEngine object (which is a subclass of PandasSQL).

Perhaps naming is a bit suspect, but is descriptive I think, not sure what would be better...

@jtratner
Copy link
Contributor

Maybe I'm mistaken, but couldn't you let read_sql and write_frame handle converting the passed connection into a cursor and then only need one of those classes?

@hayd
Copy link
Contributor

hayd commented Sep 23, 2013

Yes, and that was how it was done before... this way is less spaghetti.

@hayd
Copy link
Contributor

hayd commented Sep 23, 2013

Oh do you mean always use cursor... not sure if there us a benefit to that. (Benefit of engine is to extract tables and select statements etc., you'd have to be careful to not lose that if just using cursor.) Think splitting into logic means much easier to follow code, and less black connection finding magic.

@hayd
Copy link
Contributor

hayd commented Sep 23, 2013

@danielballan let me know what you think, as it feels like your baby I've hacked to pieces! :s

@jtratner maybe it's not such a great naming of classes (for the factory), but I really think the code needs to be separated into classes (there was so many ifs before to decide on which type of connection was being used, I think it was quite unmaintainable).

@jtratner
Copy link
Contributor

Sorry to push on this, can you just answer my question about life cycle?
I'm wondering here whether we are or are not closing things that we should
be closing. Also, are you thinking that users would be using the
tquery/uquery kinds of methods directly?

@hayd
Copy link
Contributor

hayd commented Sep 23, 2013

@jtratner No worries, push away.

No, I don't actually think they would use these directly. I'm happy to remove or _ them tbh.

I think preferred way should be for users to use pandas_sql=PandasSQL(engine=..) then do sql stuff.

pandas_sql.read_sql
pandas_sql.write_frame
pandas_sql.read_table # read entire table from engine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants