Skip to content

Conversation

artemyk
Copy link
Contributor

@artemyk artemyk commented Dec 3, 2014

Closes #8778

This infers dtype from non-null values for insertion into SQL database. See #8778 . I had to alter @tiagoantao test a bit.

Like @tiagoantao, I skipped writing tests for legacy MySQL. Support for this will be removed soon, right?

As a side note, lib.infer_dtype throws an exception for categorical data -- probably shouldn't happen, right?

@jorisvandenbossche
@jahfet
@tiagoantao

@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

can u show what u r passing that raises on Categorical
infer_dtypes is internal and cannot be called with certain types of things
but sounds like a bug

@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

Yes, here's the snippet. Interestingly, it works for a dataframe but not for a series.

In [11]: import pandas as pd

In [12]: import pandas.lib as lib

In [13]: df = pd.DataFrame({'a':pd.Series(['A','B'], dtype='category')})

In [14]: lib.infer_dtype(df)
Out[14]: 'string'

In [15]: lib.infer_dtype(df['a'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-20fdf746850b> in <module>()
----> 1 lib.infer_dtype(df['a'])

/Users/artemy/dev/pandas/pandas/lib.so in pandas.lib.infer_dtype (pandas/lib.c:41462)()

TypeError: Cannot convert Categorical to numpy.ndarray

@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

that's not really valid input, it has to be an ndarray (though we have relaxed it a bit). But I will add this as it should be able to infer (its easy to do), it just checks the dtype (and doesn't do anything more)

@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

@artemyk ok, fixed up by #8975

@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

@jreback Great, thanks! Simplified this PR a bit (no longer doing special check for categorical type)

@@ -884,37 +884,49 @@ def _harmonize_columns(self, parse_dates=None):
except KeyError:
pass # this column not in results

def _get_notnull_col_dtype(self, col):
col_for_inference = col
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring to explain why this is needed?

@jorisvandenbossche
Copy link
Member

@artemyk put some comments, but in general it looks very good!
ok for skipping the tests on mysql legacy

@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

@jorisvandenbossche Thanks, made the fixes.

Regarding lib.infer_dtype(com._ensure_object(col)) --- this is from #6932 , not sure why _ensure_object was used , I thought that code path would occur when col is already dtype object (@jreback may know) . In any case, the new dtype inference code takes care of inferring date and time.

And yes, I removed the previous testing for dtype, lines like elif issubclass(pytype, np.integer): &c., and now rely on lib.infer_dtype to do this. Not sure what you mean by should we also do that for sqlalchemy + test it.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 3, 2014
@jorisvandenbossche jorisvandenbossche added this to the 0.15.2 milestone Dec 3, 2014
@artemyk
Copy link
Contributor Author

artemyk commented Dec 3, 2014

@jorisvandenbossche Oh, that makes sense! Added check / test for complex numbers .

@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Dec 3, 2014
@jreback
Copy link
Contributor

jreback commented Dec 6, 2014

@jorisvandenbosscher ready to go?

@jorisvandenbossche
Copy link
Member

yep, looking good! @artemyk can you add a release note and squash?

@artemyk artemyk force-pushed the notnulldtype_sql branch 2 times, most recently from ee4f172 to 944b28b Compare December 7, 2014 19:27
@artemyk
Copy link
Contributor Author

artemyk commented Dec 7, 2014

@jorisvandenbossche @jreback Done

@jreback
Copy link
Contributor

jreback commented Dec 7, 2014

looks ok to me, any docs need to be updated for this? (maybe a note somewhere explaining what this is doing), e.g. maybe add a 'dtypes' sub-section for SQL (as getting more traction in this area).

Could be done in separate issue.

@jorisvandenbossche
Copy link
Member

@artemyk hmm, can you rebase again?

Docs can go in a seperate PR I think (more general than this)

@jorisvandenbossche
Copy link
Member

@jreback did a PR for your doc suggestion

Minor cleanup

Minor rename

Simplifying

Code review fixes

Complex numbers

Release note

Release note
@artemyk
Copy link
Contributor Author

artemyk commented Dec 8, 2014

@jorisvandenbossche Rebased on master, let me if I still need to fix something. And thanks for doing the docs!

jorisvandenbossche added a commit that referenced this pull request Dec 8, 2014
ENH: Infer dtype from non-nulls when pushing to SQL
@jorisvandenbossche jorisvandenbossche merged commit 67ec0a8 into pandas-dev:master Dec 8, 2014
@jorisvandenbossche
Copy link
Member

@artemyk Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

problem with to_sql with NA
3 participants