ENH: Support for Postgres NaN data type #10742

tlmaloney · 2015-08-03T21:09:14Z

The below example code is a variant of this comment. The problem is that pandas can't infer the type when the columns are all null. I would like to be able to use Postgres' NaN value if possible, but seems that pandas passes None to SQLAlchemy instead of float('NaN'). I would rather not cast the column's dtype to numeric after-the-fact. From previous link I know it's possible to make Postgres NaN values from SQLAlchemy. Is it possible pandas could support this?

tmaloney@vm-data-1:/var/tmp$ sudo -u postgres psql -c "drop database test;"
DROP DATABASE
tmaloney@vm-data-1:/var/tmp$ cat test.sql
create database test owner tmaloney;
tmaloney@vm-data-1:/var/tmp$ sudo -u postgres psql < /var/tmp/test.sql 
CREATE DATABASE
tmaloney@vm-data-1:/var/tmp$ cat test2.py
#!/usr/bin/env python
'''
Creates a test python db to play with
'''
import sqlalchemy as sa
import pandas as pd
from pandas.util.testing import assert_frame_equal

CON_URL = 'postgres://tmaloney:secret@localhost/test'

db = sa.create_engine(CON_URL)
meta = sa.MetaData()


test_table = sa.Table(
    'test_float', meta,
    # A table contains design id and blocks
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('float_val', sa.Float())
)

meta.drop_all(db)
meta.create_all(db)

data = [
    {'float_val': float('nan')},
    {'float_val': float('NaN')},
]
df = pd.DataFrame(data)
engine = sa.create_engine(CON_URL)
df.to_sql('test_float', engine, if_exists='append', index=False)

returned_df = pd.read_sql_table('test_float', engine)
assert_frame_equal(df, returned_df[['float_val']])

returned_df = pd.read_sql('select * from test_float', engine)
assert_frame_equal(df, returned_df[['float_val']])
tmaloney@vm-data-1:/var/tmp$ python test2.py
Traceback (most recent call last):
  File "test2.py", line 37, in <module>
    assert_frame_equal(df, returned_df[['float_val']])
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 746, in assert_frame_equal
    check_names=check_names)
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 675, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 552, in assert_attr_equal
    assert_equal(left_attr,right_attr,"attr is not equal [{0}]" .format(attr))
  File "/usr/local/lib/python2.7/dist-packages/pandas/util/testing.py", line 533, in assert_equal
    assert a == b, "%s: %r != %r" % (msg.format(a,b), a, b)
AssertionError: attr is not equal [dtype]: dtype('float64') != dtype('O')

The first assertion works, the second one does not.

The text was updated successfully, but these errors were encountered:

tlmaloney · 2015-08-03T21:13:10Z

Versions:

pandas==0.16.2
SQLAlchemy==1.0.8
psycopg2==2.6.1

jorisvandenbossche · 2015-08-03T22:35:42Z

The problem is that such float values are not supported for all columns types, and not for all database flavors. In this regard, the approach with None is generic and works in all cases.
For now, we have followed the flavor-agnostic path in pandas, and delegating flavor-specific things to sqlalchemy. And I am not sure to what extent we want to deviate from this path.

Another issue is that the use of NaN in pandas is not only used as Not-a-Number (the actual meaning), but in practice also to represent a 'missing value', for which NULL (and thus None on the python side) is more suited I think.

Anyway, if we would want to add this functionality, we will have to think about the API, as I wouldn't change the default.

redhog · 2020-09-28T13:08:04Z

I think there should be an option to to_sql() to disable the casting (done here

pandas/pandas/io/sql.py

Line 788 in 1accb6e

if ser._can_hold_na:

). There should possibly also be an option to convert None into NaN when loading data using read_sql, even when all rows contain NaN:s so that columns are not converted to dtype object in that case, see #6798

jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Aug 3, 2015

jorisvandenbossche added this to the Someday milestone Aug 3, 2015

mroeschke added the Enhancement label May 16, 2020

mroeschke removed this from the Someday milestone Oct 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Support for Postgres NaN data type #10742

ENH: Support for Postgres NaN data type #10742

tlmaloney commented Aug 3, 2015

tlmaloney commented Aug 3, 2015

jorisvandenbossche commented Aug 3, 2015

redhog commented Sep 28, 2020

ENH: Support for Postgres NaN data type #10742

ENH: Support for Postgres NaN data type #10742

Comments

tlmaloney commented Aug 3, 2015

tlmaloney commented Aug 3, 2015

jorisvandenbossche commented Aug 3, 2015

redhog commented Sep 28, 2020