Skip to content

ENH: Support for Postgres NaN data type #10742

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tlmaloney opened this issue Aug 3, 2015 · 3 comments
Open

ENH: Support for Postgres NaN data type #10742

tlmaloney opened this issue Aug 3, 2015 · 3 comments
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query

Comments

@tlmaloney
Copy link

The below example code is a variant of this comment. The problem is that pandas can't infer the type when the columns are all null. I would like to be able to use Postgres' NaN value if possible, but seems that pandas passes None to SQLAlchemy instead of float('NaN'). I would rather not cast the column's dtype to numeric after-the-fact. From previous link I know it's possible to make Postgres NaN values from SQLAlchemy. Is it possible pandas could support this?

tmaloney@vm-data-1:/var/tmp$ sudo -u postgres psql -c "drop database test;"
DROP DATABASE
tmaloney@vm-data-1:/var/tmp$ cat test.sql
create database test owner tmaloney;
tmaloney@vm-data-1:/var/tmp$ sudo -u postgres psql < /var/tmp/test.sql 
CREATE DATABASE
tmaloney@vm-data-1:/var/tmp$ cat test2.py
#!/usr/bin/env python
'''
Creates a test python db to play with
'''
import sqlalchemy as sa
import pandas as pd
from pandas.util.testing import assert_frame_equal

CON_URL = 'postgres://tmaloney:secret@localhost/test'

db = sa.create_engine(CON_URL)
meta = sa.MetaData()


test_table = sa.Table(
    'test_float', meta,
    # A table contains design id and blocks
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('float_val', sa.Float())
)

meta.drop_all(db)
meta.create_all(db)

data = [
    {'float_val': float('nan')},
    {'float_val': float('NaN')},
]
df = pd.DataFrame(data)
engine = sa.create_engine(CON_URL)
df.to_sql('test_float', engine, if_exists='append', index=False)

returned_df = pd.read_sql_table('test_float', engine)
assert_frame_equal(df, returned_df[['float_val']])

returned_df = pd.read_sql('select * from test_float', engine)
assert_frame_equal(df, returned_df[['float_val']])
tmaloney@vm-data-1:/var/tmp$ python test2.py
Traceback (most recent call last):
  File "test2.py", line 37, in <module>
    assert_frame_equal(df, returned_df[['float_val']])
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 746, in assert_frame_equal
    check_names=check_names)
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 675, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 552, in assert_attr_equal
    assert_equal(left_attr,right_attr,"attr is not equal [{0}]" .format(attr))
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 533, in assert_equal
    assert a == b, "%s: %r != %r" % (msg.format(a,b), a, b)
AssertionError: attr is not equal [dtype]: dtype('float64') != dtype('O')

The first assertion works, the second one does not.

@tlmaloney
Copy link
Author

Versions:

pandas==0.16.2
SQLAlchemy==1.0.8
psycopg2==2.6.1

@jorisvandenbossche
Copy link
Member

The problem is that such float values are not supported for all columns types, and not for all database flavors. In this regard, the approach with None is generic and works in all cases.
For now, we have followed the flavor-agnostic path in pandas, and delegating flavor-specific things to sqlalchemy. And I am not sure to what extent we want to deviate from this path.

Another issue is that the use of NaN in pandas is not only used as Not-a-Number (the actual meaning), but in practice also to represent a 'missing value', for which NULL (and thus None on the python side) is more suited I think.

Anyway, if we would want to add this functionality, we will have to think about the API, as I wouldn't change the default.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Aug 3, 2015
@jorisvandenbossche jorisvandenbossche added this to the Someday milestone Aug 3, 2015
@redhog
Copy link

redhog commented Sep 28, 2020

I think there should be an option to to_sql() to disable the casting (done here

if ser._can_hold_na:

). There should possibly also be an option to convert None into NaN when loading data using read_sql, even when all rows contain NaN:s so that columns are not converted to dtype object in that case, see #6798

@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

4 participants