TypeError if DataFrame contains duplicated column name (in some cases) #2718

@saiwing-yeung

This is something of an edge case, but the error message makes it difficult to identify the underlying issue.
If a pandas DataFrame has duplicated column names and the duplicated columns are not of integer type, trying to plot anything from it raises an exception. MWE:

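import io
import pandas as pd
import altair as alt

df = pd.read_csv(io.StringIO("""
a, b, c, d
0, 1, 2, 2022-01-01
2, 3, 4, 2022-01-01
"""))
# Rename the columns so that the name 'c' appears twice.
df.columns = ['a', 'b', 'c', 'c']
# Neither duplicated column is used in the encoding.
alt.Chart(df).mark_point().encode(x='a', y='b')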

results in

TypeError: to_list_if_array() got an unexpected keyword argument 'convert_dtype'

Note that

  • The duplicated columns are not used in the plot.
  • If both duplicated columns are of integer type, you only get a warning. With most other types (including floats), an exception is raised.
  • Besides explicitly renaming the columns like this, another way to accidentally end up with duplicated column names is calling toPandas() after joining two PySpark DataFrames.
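Until the error message is improved, a possible workaround is to check for repeated column names before building the chart. The sketch below is not part of Altair or pandas: `duplicated_names` and `make_unique` are hypothetical helper names, and the `_1`, `_2` suffix scheme is just one arbitrary choice. It works on any sequence of names, so `df.columns` can be passed directly.

```python
from collections import Counter


def duplicated_names(columns):
    """Return the names that occur more than once, in first-seen order."""
    columns = list(columns)
    counts = Counter(columns)
    # dict.fromkeys keeps one entry per name while preserving order.
    return [name for name in dict.fromkeys(columns) if counts[name] > 1]


def make_unique(columns):
    """Suffix repeated names with _1, _2, ... (hypothetical naming scheme).

    Note: this does not guard against a collision with an existing
    column that is already called e.g. 'c_1'.
    """
    seen = Counter()
    unique = []
    for name in columns:
        unique.append(name if seen[name] == 0 else f"{name}_{seen[name]}")
        seen[name] += 1
    return unique


columns = ['a', 'b', 'c', 'c']
print(duplicated_names(columns))  # ['c']
print(make_unique(columns))       # ['a', 'b', 'c', 'c_1']
```

With the DataFrame from the MWE, `df.columns = make_unique(df.columns)` before calling `alt.Chart(df)` would presumably avoid the exception, since it removes the duplicated names this report identifies as the trigger. If the repeated columns are not needed at all, `df.loc[:, ~df.columns.duplicated()]` keeps only the first occurrence of each name.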

Using Altair 4.2.0.
