Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
I converted a pandas DataFrame (two versions: one with a datetime64[ns] column and another with a datetime64[us] column) to Parquet bytes using both pyarrow 12.0.0 and 13.0.0.
When I read the Parquet bytes back into a pandas DataFrame, the datetime unit of the original DataFrame is not always preserved. Here is a summary of the dtypes returned when reading each set of bytes with pyarrow 12.0.0 and 13.0.0:
input                      output_v12        output_v13        comment
df_parquet_bytes_v12_ns:   datetime64[ns]    datetime64[us]    v13 ns -> us, lost resolution
df_parquet_bytes_v12_us:   datetime64[ns]    datetime64[us]    v12 us -> ns, acceptable
df_parquet_bytes_v13_ns:   datetime64[ns]    datetime64[ns]    all match, no issues
df_parquet_bytes_v13_us:   datetime64[ns]    datetime64[us]    v12 us -> ns, acceptable
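The change in pyarrow 13.0.0 leads to an unacceptable result in case 1 (the first row): bytes written from a datetime64[ns] column with pyarrow 12.0.0 are read back as datetime64[us] by pyarrow 13.0.0.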
Here is the code to reproduce the issue:
df_parquet_bytes_v12_us = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00`}\xd7@\x04\x06\x00\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02ds%\x14L\x8c\x12\x1c,\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[us]", "metadata": null}], "creator": {"library": "pyarrow", "version": "12.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbdXNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTIuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAACAA==\x00\x18 parquet-cpp-arrow version 12.0.0\x19\x1c\x1c\x00\x00\x00\xc8\x05\x00\x00PAR1'
df_parquet_bytes_v12_ns = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00`}\xd7@\x04\x06\x00\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02ds%\x14L\x8c\x12\x1c,\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x10\x00\x06\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}], "creator": {"library": "pyarrow", "version": "12.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbbnNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTIuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAADAA==\x00\x18 parquet-cpp-arrow version 12.0.0\x19\x1c\x1c\x00\x00\x00\xc8\x05\x00\x00PAR1'
df_parquet_bytes_v13_us = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00`}\xd7@\x04\x06\x00\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02ds%\x14L\x8c\x12\x1c,\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x16\x00(\x08\x00`}\xd7@\x04\x06\x00\x18\x08\x00`}\xd7@\x04\x06\x00\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[us]", "metadata": null}], "creator": {"library": "pyarrow", "version": "13.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbdXNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTMuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAACAA==\x00\x18 parquet-cpp-arrow version 13.0.0\x19\x1c\x1c\x00\x00\x00\xc8\x05\x00\x00PAR1'
df_parquet_bytes_v13_ns = b'PAR1\x15\x04\x15\x10\x15\x14L\x15\x02\x15\x00\x12\x00\x00\x08\x1c\x00\x00\xbf\xc1I\x9d\x80\x17\x15\x00\x15\x12\x15\x16,\x15\x02\x15\x10\x15\x06\x15\x06\x1c\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x16\x00(\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x00\x00\x00\t \x02\x00\x00\x00\x02\x01\x01\x02\x00&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x16\x00(\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x15\x04\x19,5\x00\x18\x06schema\x15\x02\x00\x15\x04%\x02\x18\x02dsl\x8c\x12\x1c<\x00\x00\x00\x00\x00\x16\x02\x19\x1c\x19\x1c&\xc8\x01\x1c\x15\x04\x195\x00\x06\x10\x19\x18\x02ds\x15\x02\x16\x02\x16\xb8\x01\x16\xc0\x01&8&\x08\x1c\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x16\x00(\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x18\x08\x00\x00\xbf\xc1I\x9d\x80\x17\x00\x19,\x15\x04\x15\x00\x15\x02\x00\x15\x00\x15\x10\x15\x02\x00\x00\x00\x16\xb8\x01\x16\x02&\x08\x16\xc0\x01\x14\x00\x00\x19,\x18\x06pandas\x18\xb4\x03{"index_columns": [{"kind": "range", "name": null, "start": 0, "stop": 1, "step": 1}], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "ds", "field_name": "ds", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}], "creator": {"library": "pyarrow", "version": "13.0.0"}, "pandas_version": 
"2.1.0"}\x00\x18\x0cARROW:schema\x18\xb8\x06/////2ACAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAAAKAAwAAAAEAAgACgAAAOwBAAAEAAAAAQAAAAwAAAAIAAwABAAIAAgAAAAIAAAAEAAAAAYAAABwYW5kYXMAALQBAAB7ImluZGV4X2NvbHVtbnMiOiBbeyJraW5kIjogInJhbmdlIiwgIm5hbWUiOiBudWxsLCAic3RhcnQiOiAwLCAic3RvcCI6IDEsICJzdGVwIjogMX1dLCAiY29sdW1uX2luZGV4ZXMiOiBbeyJuYW1lIjogbnVsbCwgImZpZWxkX25hbWUiOiBudWxsLCAicGFuZGFzX3R5cGUiOiAidW5pY29kZSIsICJudW1weV90eXBlIjogIm9iamVjdCIsICJtZXRhZGF0YSI6IHsiZW5jb2RpbmciOiAiVVRGLTgifX1dLCAiY29sdW1ucyI6IFt7Im5hbWUiOiAiZHMiLCAiZmllbGRfbmFtZSI6ICJkcyIsICJwYW5kYXNfdHlwZSI6ICJkYXRldGltZSIsICJudW1weV90eXBlIjogImRhdGV0aW1lNjRbbnNdIiwgIm1ldGFkYXRhIjogbnVsbH1dLCAiY3JlYXRvciI6IHsibGlicmFyeSI6ICJweWFycm93IiwgInZlcnNpb24iOiAiMTMuMC4wIn0sICJwYW5kYXNfdmVyc2lvbiI6ICIyLjEuMCJ9AAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEKEAAAABwAAAAEAAAAAAAAAAIAAABkcwAAAAAGAAgABgAGAAAAAAADAA==\x00\x18 parquet-cpp-arrow version 13.0.0\x19\x1c\x1c\x00\x00\x00\xc6\x05\x00\x00PAR1'
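import io
import pandas as pd

# Read each byte string back and print the dtype of the 'ds' column.
for v in [12, 13]:
    for s in ['ns', 'us']:
        print(f'df_parquet_bytes_v{v}_{s}: ', pd.read_parquet(io.BytesIO(globals()[f'df_parquet_bytes_v{v}_{s}'])).dtypes.iloc[0])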
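The byte strings above also hint at what was physically stored. The same 8-byte sequence recurs in the v12_us, v12_ns and v13_us literals, and a different one recurs in v13_ns. As a rough check using only the standard library, and assuming these sequences are little-endian signed 64-bit timestamp values (an assumption about the encoding, not something the report states), they can be decoded like this:

```python
from datetime import datetime, timezone

# 8-byte sequence repeated in df_parquet_bytes_v12_us, _v12_ns and _v13_us,
# copied verbatim from the literals above.
raw_shared = b'\x00`}\xd7@\x04\x06\x00'
# 8-byte sequence repeated in df_parquet_bytes_v13_ns.
raw_v13_ns = b'\x00\x00\xbf\xc1I\x9d\x80\x17'

# Assumption: both are little-endian signed 64-bit integers.
shared_value = int.from_bytes(raw_shared, "little", signed=True)
v13_ns_value = int.from_bytes(raw_v13_ns, "little", signed=True)

print(shared_value)   # 1693526400000000
print(v13_ns_value)   # 1693526400000000000

# The second value is exactly 1000x the first, consistent with the same
# instant stored in microseconds and in nanoseconds respectively.
print(v13_ns_value == shared_value * 1000)  # True

# Interpreting the smaller value as microseconds since the Unix epoch:
print(datetime.fromtimestamp(shared_value // 1_000_000, tz=timezone.utc))
# 2023-09-01 00:00:00+00:00
```

If that reading is right, the file written from the ns column by pyarrow 12.0.0 already holds microsecond values, and only the file written by pyarrow 13.0.0 holds nanosecond values.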
Component(s)
Python