ENH: Infer dtype from non-nulls when pushing to SQL #8973

artemyk · 2014-12-03T01:21:26Z

This infers the dtype from non-null values when inserting into a SQL database. See #8778. I had to alter @tiagoantao's test a bit.
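To make the idea concrete, here is a minimal sketch, assuming a recent pandas where the inference routine is public as `pandas.api.types.infer_dtype` (the PR itself used the internal `pandas.lib.infer_dtype`). The helper `infer_notnull_dtype` is hypothetical. It is loosely modeled on the `_get_notnull_col_dtype` method added in the diff, but it is not that method's actual code. An object column such as `['a', NaN, 'b']` is reported as `'mixed'` when all values are inspected. Dropping the nulls first recovers `'string'`, which maps cleanly to a SQL type.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def infer_notnull_dtype(col: pd.Series) -> str:
    """Infer the dtype of a column from its non-null values only.

    Hypothetical helper (sketch). For object columns, nulls are dropped
    before inference, so NULLs do not turn e.g. 'string' into 'mixed'.
    """
    col_for_inference = col
    if col.dtype == object:
        not_null = col.dropna()
        # Fall back to the full column if every value is null.
        if len(not_null) > 0:
            col_for_inference = not_null
    # skipna=False: inspect every remaining value explicitly.
    return infer_dtype(col_for_inference, skipna=False)


if __name__ == "__main__":
    s = pd.Series(["a", np.nan, "b"], dtype=object)
    print(infer_dtype(s, skipna=False))  # all values: 'mixed'
    print(infer_notnull_dtype(s))        # non-nulls only: 'string'
```

Newer pandas versions also accept `skipna=True`, which skips nulls during inference. The explicit `dropna()` here mirrors the approach discussed in this PR.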

Like @tiagoantao, I skipped writing tests for legacy MySQL. Support for this will be removed soon, right?

As a side note, lib.infer_dtype throws an exception for categorical data -- probably shouldn't happen, right?

@jorisvandenbossche
@jahfet
@tiagoantao

jreback · 2014-12-03T01:25:53Z

can u show what u r passing that raises on Categorical
infer_dtype is internal and cannot be called with certain types of things
but sounds like a bug

artemyk · 2014-12-03T01:29:37Z

Yes, here's the snippet. Interestingly, it works for a dataframe but not for a series.

In [11]: import pandas as pd

In [12]: import pandas.lib as lib

In [13]: df = pd.DataFrame({'a':pd.Series(['A','B'], dtype='category')})

In [14]: lib.infer_dtype(df)
Out[14]: 'string'

In [15]: lib.infer_dtype(df['a'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-20fdf746850b> in <module>()
----> 1 lib.infer_dtype(df['a'])

/Users/artemy/dev/pandas/pandas/lib.so in pandas.lib.infer_dtype (pandas/lib.c:41462)()

TypeError: Cannot convert Categorical to numpy.ndarray

jreback · 2014-12-03T01:49:05Z

That's not really valid input; it has to be an ndarray (though we have relaxed this a bit). But I will add it, since it should be able to infer this (it's easy to do): it just checks the dtype and doesn't do anything more.
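The approach described here, checking the dtype before looking at any values, can be sketched as a small wrapper. This is not the patch from #8975, which fixed `infer_dtype` itself. `infer_dtype_with_categorical` is a hypothetical name, and the sketch assumes a recent pandas with `pd.CategoricalDtype` and `pandas.api.types.infer_dtype` available.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def infer_dtype_with_categorical(values) -> str:
    """Hypothetical wrapper (sketch): short-circuit on the dtype first."""
    dtype = getattr(values, "dtype", None)
    if isinstance(dtype, pd.CategoricalDtype):
        # Only the dtype is checked; the categories are not inspected.
        return "categorical"
    # Everything else goes through the normal value-based inference.
    return infer_dtype(values, skipna=True)
```

Because the check uses only the dtype, it never has to convert the Categorical to a NumPy array. That conversion is what raised the `TypeError` in the snippet above.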

jreback · 2014-12-03T02:53:48Z

@artemyk ok, fixed up by #8975

artemyk · 2014-12-03T04:19:05Z

@jreback Great, thanks! Simplified this PR a bit (no longer doing special check for categorical type)

jorisvandenbossche · 2014-12-03T12:35:41Z

pandas/io/sql.py

@@ -884,37 +884,49 @@ def _harmonize_columns(self, parse_dates=None):
            except KeyError:
                pass  # this column not in results

+    def _get_notnull_col_dtype(self, col):
+        col_for_inference = col


Can you add a docstring to explain why this is needed?

jorisvandenbossche · 2014-12-03T13:13:43Z

@artemyk I left some comments, but in general it looks very good!
ok for skipping the tests on mysql legacy

artemyk · 2014-12-03T18:26:34Z

@jorisvandenbossche Thanks, made the fixes.

Regarding lib.infer_dtype(com._ensure_object(col)): this is from #6932. I'm not sure why _ensure_object was used; I thought that code path would be reached when col is already of dtype object (@jreback may know). In any case, the new dtype inference code takes care of inferring date and time.

And yes, I removed the previous dtype checks (lines like elif issubclass(pytype, np.integer): etc.) and now rely on lib.infer_dtype to do this. I'm not sure what you mean by "should we also do that for sqlalchemy + test it".
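Relying on the inferred label instead of `issubclass(pytype, ...)` checks reduces SQL type selection to a lookup. The sketch below assumes `pandas.api.types.infer_dtype`. The mapping `SQL_TYPE_BY_INFERRED` and the helper `sql_type_for` are hypothetical and illustrative only. They are not the type table used by `pandas.io.sql`, which maps to SQLAlchemy types or per-dialect fallback types.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical lookup from infer_dtype() labels to generic SQL type names.
# Illustrative only; not the table pandas.io.sql actually uses.
SQL_TYPE_BY_INFERRED = {
    "string": "TEXT",
    "integer": "INTEGER",
    "floating": "REAL",
    "boolean": "INTEGER",
    "datetime64": "TIMESTAMP",
    "datetime": "TIMESTAMP",
    "date": "DATE",
    "time": "TIME",
}


def sql_type_for(col: pd.Series) -> str:
    """Pick a SQL type from the inferred label (hypothetical helper)."""
    # skipna=True ignores missing values while inferring.
    inferred = infer_dtype(col, skipna=True)
    # Labels without an entry (e.g. 'mixed', 'empty') fall back to TEXT.
    return SQL_TYPE_BY_INFERRED.get(inferred, "TEXT")
```

One lookup table covers NumPy dtypes (`'integer'`, `'floating'`, `'datetime64'`) and object columns holding Python values (`'date'`, `'time'`). The per-type branches become unnecessary.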

artemyk · 2014-12-03T20:00:10Z

@jorisvandenbossche Oh, that makes sense! Added a check and a test for complex numbers.
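SQL has no standard complex type, so a complex column has to be rejected before a table is created. Here is a minimal sketch of such a guard, assuming `pandas.api.types.infer_dtype`. The function name `check_sql_compatible` is hypothetical, and the error message is illustrative rather than the exact text pandas raises.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def check_sql_compatible(col: pd.Series) -> str:
    """Hypothetical guard (sketch): reject types with no SQL equivalent."""
    inferred = infer_dtype(col, skipna=True)
    if inferred == "complex":
        # Raise early instead of silently storing complex values as text.
        raise ValueError(f"Complex datatypes not supported (column {col.name!r})")
    return inferred
```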

jreback · 2014-12-06T17:17:07Z

@jorisvandenbossche ready to go?

jorisvandenbossche · 2014-12-07T12:33:06Z

yep, looking good! @artemyk can you add a release note and squash?

artemyk · 2014-12-07T19:27:52Z

@jorisvandenbossche @jreback Done

jreback · 2014-12-07T19:30:19Z

Looks OK to me. Do any docs need to be updated for this? Maybe a note somewhere explaining what this is doing, e.g. a 'dtypes' sub-section for SQL (as this area is getting more traction).

Could be done in separate issue.

jorisvandenbossche · 2014-12-07T20:08:40Z

@artemyk hmm, can you rebase again?

Docs can go in a separate PR, I think (it's more general than this).

jorisvandenbossche · 2014-12-07T21:23:12Z

@jreback did a PR for your doc suggestion


artemyk · 2014-12-08T01:37:58Z

@jorisvandenbossche Rebased on master; let me know if I still need to fix something. And thanks for doing the docs!


jorisvandenbossche · 2014-12-08T07:42:42Z

@artemyk Thanks a lot!

artemyk force-pushed the notnulldtype_sql branch from 4262ab6 to b8537c0, December 3, 2014 01:22

artemyk force-pushed the notnulldtype_sql branch from b8537c0 to 7072f3d, December 3, 2014 04:18

jorisvandenbossche reviewed Dec 3, 2014

jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 3, 2014

jorisvandenbossche added this to the 0.15.2 milestone Dec 3, 2014

jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Dec 3, 2014

artemyk force-pushed the notnulldtype_sql branch 2 times, most recently from ee4f172 to 944b28b, December 7, 2014 19:27

jorisvandenbossche mentioned this pull request Dec 7, 2014

DOC: expand docs on sql type conversion #9038

Merged

ENH: Infer dtype from non-nulls when pushing to SQL

ffc5097

Minor cleanup Minor rename Simplifying Code review fixes Complex numbers Release note Release note

artemyk force-pushed the notnulldtype_sql branch from 944b28b to ffc5097, December 8, 2014 01:26

jorisvandenbossche added a commit that referenced this pull request Dec 8, 2014

Merge pull request #8973 from artemyk/notnulldtype_sql

67ec0a8

ENH: Infer dtype from non-nulls when pushing to SQL

jorisvandenbossche merged commit 67ec0a8 into pandas-dev:master Dec 8, 2014
