Skip to content

DataFrame.to_sql generated text field that could not be used by "group by". (MS sql) #7957

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
NathanielCapital opened this issue Aug 7, 2014 · 11 comments
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@NathanielCapital
Copy link

In _sqlalchemy_type function, pandas import Text type from sqlalchemy as the default type when generating text field in SQL table, which is transformed into "text" type in microsoft sql. This type could not be used by "group by" in SQL query.
SQL query with a group by clause with the field generate an error msg like:

Msg 306, Level 16, State 2, Line 3
The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

I tried default text field to be the String, which works for MS sql.

@jorisvandenbossche
Copy link
Member

@NathanielCapital Did it work without specifying a max length? (so with String instead of String(some_number)). I suppose a varchar type will be created for MS SQL, but eg http://docs.sqlalchemy.org/en/rel_0_9/dialects/mssql.html#sqlalchemy.dialects.mssql.VARCHAR seems to imply a length should be given.

For strings it seems indeed better to use String (which will be converted to character varying in general I think). But would there be other consequences of changing Text to String?
@mangecoeur @danielballan @hayd any idea?

@NathanielCapital
Copy link
Author

@jorisvandenbossche Yes, it works without specifying a max length. I just made it return String type other than Text. The table text field generated is nvarchar(max) on my Microsoft SQL 2008 R2.

@aergener
Copy link

@jorisvandenbossche Text is deprecated in the SQL Server, and has limited functionality. (see links below) Nvarchar and varchar types are preferable. Since this may vary between SQL flavors, perhaps there could be a setter function to set the string representation in pandas.io.sql.PandasSQLTable._sqlalchemy_type.

http://msdn.microsoft.com/en-us/library/ms187993.aspx
http://stackoverflow.com/questions/564755/sql-server-text-type-vs-varchar-data-type

@mangecoeur
Copy link
Contributor

@aergener it would be better to avoid requiring setters like this - I'm confident we can get this to work without encouraging hacks, though anyone who really wants to change behaviour is free to subclass PandasSQLTable and override whatever they like.

Seems there are 2 options:
http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.Unicode
and
http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.String

As well as explicitly setting nvarchar etc. (NVARCHAR is automatically subclass of Unicode it seems). Personally I like the idea of using Unicode since it fits well with Python3 going forwards where people expect all text to be unicode. Not sure what that means for python2 however, might then require explicit conversion so it might not be worth the trouble :/

@jorisvandenbossche
Copy link
Member

I read in the SQLAlchemy docs (http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.Unicode):

When using the Unicode type, it is only appropriate to pass Python unicode objects, and not plain str. If a plain str is passed under Python 2, a warning is emitted.

In that case, it seems to me that we should use String? Or at least check for the type and maybe return something different (String/Unicode) dependent on type or python 2/3?
I suppose always using the String type will work as long as your (unicode) data are also valid ascii? But when having non-ascii characters this will lead to an error?

@jorisvandenbossche
Copy link
Member

With lib.infer_dtype we could check for string/unicode in principle. And then for python 2 return the appropriate String or Unicode, and for python 3 always Unicode ?

@jreback
Copy link
Contributor

jreback commented Aug 28, 2014

careful as this must scan all data

@jorisvandenbossche
Copy link
Member

@jreback yes, but we already do this .. (if it is an object type column, to check for date/time: https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L788)

@jreback
Copy link
Contributor

jreback commented Aug 28, 2014

@jorisvandenbossche right ok

@jorisvandenbossche
Copy link
Member

Problem with changing Text to String is that for some databases, we have to specify a maximum length. For MSSQL or PostgreSQL this does not seem to be a problem (they accept varchar without specifying a maximum lenght, but for MySQL this is required. When you try to use just String, you get the following error:

CompileError: (in table 'test_string', column 'string'): VARCHAR requires a length on dialect mysql

which is what is actually also explained in the sqlalchemy docs of String: http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.String

@jorisvandenbossche
Copy link
Member

I know this does not solve the issue, but after PR #8926, you can specify a sqlalchemy type per column, so to override the default chosen one. So at least that gives a way to deal with this issue for now.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

6 participants