[Python] pyarrow 13.0.0 converted datetime64[ns] to datetime64[us] when using pd.read_parquet #38171

@seanslma

Describe the bug, including details regarding any error messages, version, and platform.

I converted a pandas df (in two versions: one with datetime64[ns] and another with datetime64[us]) to parquet bytes using both pyarrow 12.0.0 and 13.0.0.

I then converted the parquet bytes back to a pandas df and found that the datetime unit of the original df had changed in some cases. Here is a summary of the results after running the test with both pyarrow 12.0.0 and 13.0.0:

input                     output_v12      output_v13      comment
df_parquet_bytes_v12_ns   datetime64[ns]  datetime64[us]  v13 ns -> us, lost resolution
df_parquet_bytes_v12_us   datetime64[ns]  datetime64[us]  v12 us -> ns, acceptable
df_parquet_bytes_v13_ns   datetime64[ns]  datetime64[ns]  all match, no issues
df_parquet_bytes_v13_us   datetime64[ns]  datetime64[us]  v12 us -> ns, acceptable

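The change in pyarrow 13.0.0 leads to an unacceptable result in case 1 (the first row of the table): a file written from a datetime64[ns] df with pyarrow 12.0.0 is read back as datetime64[us].

Here is the code to reproduce the issue: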

df_parquet_bytes_v12_us = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00`}\xd7@\x04\x06\x00\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02ds%\x14L\x8c\x12\x1c,\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[us]", "metadata": null}], "creator": {"library": "pyarrow", "version": "12.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbdXNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTIuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAACAA==\x00\x18 parquet-cpp-arrow version 12.0.0\x19\x1c\x1c\x00\x00\x00\xc8\x05\x00\x00PAR1'

df_parquet_bytes_v12_ns = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00`}\xd7@\x04\x06\x00\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02ds%\x14L\x8c\x12\x1c,\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}], "creator": {"library": "pyarrow", "version": "12.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbbnNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTIuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAADAA==\x00\x18 parquet-cpp-arrow version 12.0.0\x19\x1c\x1c\x00\x00\x00\xc8\x05\x00\x00PAR1'

df_parquet_bytes_v13_us = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00`}\xd7@\x04\x06\x00\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02ds%\x14L\x8c\x12\x1c,\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[us]", "metadata": null}], "creator": {"library": "pyarrow", "version": "13.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbdXNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTMuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAACAA==\x00\x18 parquet-cpp-arrow version 13.0.0\x19\x1c\x1c\x00\x00\x00\xc8\x05\x00\x00PAR1'

df_parquet_bytes_v13_ns = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00\x00\xbf\xc1I\x9d\x80\x17\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x16\x00(\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x16\x00(\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02dsl\x8c\x12\x1c<\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x16\x00(\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}], "creator": {"library": "pyarrow", "version": "13.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbbnNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTMuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAADAA==\x00\x18 parquet-cpp-arrow version 13.0.0\x19\x1c\x1c\x00\x00\x00\xc6\x05\x00\x00PAR1'

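import io
import pandas as pd
# Read each parquet blob back and print the dtype of its only column ('ds').
for v in [12, 13]:
    for s in ['ns', 'us']:
        print(f'df_parquet_bytes_v{v}_{s}: ', pd.read_parquet(io.BytesIO(globals()[f'df_parquet_bytes_v{v}_{s}'])).dtypes.iloc[0])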
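To see what each file physically stores, independent of any pyarrow version, here is a minimal stdlib-only sketch. The 8-byte sequences and the base64 tails are copied from the byte strings above. Reading the 8 bytes as the little-endian int64 timestamp of the single row, and reading the last two decoded schema bytes as Arrow's TimeUnit value, are assumptions based on my understanding of the Parquet and Arrow layouts. The helper names (`decode_int64`, `schema_unit`) are made up for this illustration.

```python
import base64
import struct
from datetime import datetime, timezone

# 8-byte sequences copied from the byte strings above (each one recurs in the
# page data and in the min/max statistics of its file).
RAW_V12_AND_V13_US = b'\x00`}\xd7@\x04\x06\x00'   # v12_us, v12_ns and v13_us
RAW_V13_NS = b'\x00\x00\xbf\xc1I\x9d\x80\x17'     # v13_ns only

# Last 8 base64 characters of the ARROW:schema value in each file.
SCHEMA_TAIL_US = 'AAACAA=='   # the two *_us files
SCHEMA_TAIL_NS = 'AAADAA=='   # the two *_ns files

# Assumption: order of Arrow's TimeUnit enum (SECOND=0 ... NANOSECOND=3).
TIME_UNITS = ('s', 'ms', 'us', 'ns')


def decode_int64(raw):
    """Interpret 8 bytes as a little-endian signed 64-bit integer."""
    return struct.unpack('<q', raw)[0]


def schema_unit(tail):
    """Assumption: the last two decoded bytes hold the TimeUnit as an int16."""
    unit_code = struct.unpack('<h', base64.b64decode(tail)[-2:])[0]
    return TIME_UNITS[unit_code]


stored_v12 = decode_int64(RAW_V12_AND_V13_US)
stored_v13_ns = decode_int64(RAW_V13_NS)

print('v12_us / v12_ns / v13_us store:', stored_v12)
print('v13_ns stores:                 ', stored_v13_ns)
print('as UTC time (value taken as us):',
      datetime.fromtimestamp(stored_v12 // 1_000_000, tz=timezone.utc))
print('schema units:', schema_unit(SCHEMA_TAIL_US), schema_unit(SCHEMA_TAIL_NS))
```

Under these assumptions, the v13_ns value is exactly 1000 times the value in the other three files, and both correspond to 2023-09-01 00:00:00 UTC. That would mean the v12_ns file holds microsecond values on disk while its embedded Arrow schema still records nanoseconds.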

Component(s)

Python
