Closed
This is kind of an edge case but the error message makes it somewhat difficult to identify the underlying issue.
If you have a Pandas DataFrame with duplicated column names, and the duplicated columns are not of integer type, you get an exception when trying to plot anything from it. MWE:
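import io

import altair as alt
import pandas as pd

df = pd.read_csv(io.StringIO("""
a, b, c, d
0, 1, 2, 2022-01-01
2, 3, 4, 2022-01-01
"""))
# Rename so that two columns share the name 'c'; the second one holds date strings, not integers
df.columns = ['a', 'b', 'c', 'c']
# The encoding only references 'a' and 'b', not the duplicated columns
alt.Chart(df).mark_point().encode(x='a', y='b')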
results in
TypeError: to_list_if_array() got an unexpected keyword argument 'convert_dtype'
Note that
- the duplicated columns are not used in plotting.
- if both duplicated columns are of type integer, then you would just get a warning. But with most other types (including floats) it would generate an exception.
- besides explicitly renaming the columns like this, another scenario where you might accidentally generate duplicated column names is calling toPandas() after joining two PySpark DataFrames.
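Since the exception does not mention the duplicated names, it can help to check for them and make them unique before plotting. Below is a minimal sketch of such a workaround in plain Python. The helper names `find_duplicate_names` and `dedupe_column_names` are hypothetical and are not part of Altair or Pandas, and the `_1`, `_2` suffix scheme is just one possible choice.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]


def dedupe_column_names(names):
    """Return a list of unique names; repeats get a numeric suffix."""
    used = set()
    result = []
    for name in names:
        candidate = name
        suffix = 0
        # Keep bumping the suffix until the candidate is not taken yet
        while candidate in used:
            suffix += 1
            candidate = f"{name}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


columns = ['a', 'b', 'c', 'c']
print(find_duplicate_names(columns))   # names that appear more than once
print(dedupe_column_names(columns))    # same length, all names unique
```

With the MWE above, assigning `df.columns = dedupe_column_names(df.columns)` before calling `alt.Chart(df)` should remove the duplicated names that trigger the problem. I have only reasoned about this from the behaviour described here, so treat it as a workaround to try rather than a confirmed fix.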
Using altair 4.2.0