Description
What should be the API for working with pandas, pyarrow, and dataclasses and/or pydantic?
- Pandas 2.0 now supports pyarrow-backed dtypes in many places, and pydantic provides data validation with a drop-in `dataclasses.dataclass` replacement at `pydantic.dataclasses.dataclass`.
- https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#argument-dtype-backend-to-return-pyarrow-backed-or-numpy-backed-nullable-dtypes `pd.read_*(..., dtype_backend="pyarrow")`
- https://pandas.pydata.org/docs/dev/user_guide/pyarrow.html
- https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.convert_dtypes.html#pandas.DataFrame.convert_dtypes
- https://www.google.com/search?q=pyarrow+dataclasses
- https://github.com/freegor/be-pydantic/blob/main/main.py
- https://arrow.apache.org/docs/python/data.html
- https://arrow.apache.org/docs/python/pandas.html
- https://github.com/apache/arrow/blob/main/python/pyarrow/cffi.py
- https://github.com/apache/arrow/blob/main/python/pyarrow/dataset.py
- https://github.com/apache/arrow/blob/97821aa5af650e6478116cae7c0128fe37dad067/python/pyarrow/tests/test_pandas.py#L146 TestConvertMetadata
- https://github.com/apache/arrow/blob/main/python/pyarrow/tests/test_schema.py#L221 test_schema*()
- https://www.google.com/search?q=pydantic+dataclasses
- https://docs.pydantic.dev/usage/dataclasses/
- https://github.com/pydantic/pydantic/blob/main/docs/usage/dataclasses.md
- "If you don't want to use pydantic's BaseModel you can instead get the same data validation on standard dataclasses"
- Difference with stdlib dataclasses: "Note that the `dataclasses.dataclass` from Python stdlib implements only the `__post_init__` method since it doesn't run a validation step. When substituting usage of `dataclasses.dataclass` with `pydantic.dataclasses.dataclass`, it is recommended to move the code executed in the `__post_init__` method to the `__post_init_post_parse__` method, and only leave behind part of code which needs to be executed before validation."
  https://docs.pydantic.dev/usage/dataclasses/#difference-with-stdlib-dataclasses
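
To make the validation step concrete, here is a small sketch of `pydantic.dataclasses.dataclass` used as a drop-in decorator. The `Item` class is hypothetical. The example deliberately avoids the post-init hooks, since `__post_init_post_parse__` from the quote above is pydantic v1 API and, as far as I can tell, is not available in pydantic v2.

```python
import pydantic
from pydantic.dataclasses import dataclass


@dataclass
class Item:  # hypothetical record type, for illustration only
    name: str
    qty: int


# Validation runs on construction: the string "3" is coerced to an int.
item = Item(name="widget", qty="3")

# Values that cannot be coerced raise pydantic.ValidationError.
error = None
try:
    Item(name="gadget", qty="not a number")
except pydantic.ValidationError as exc:
    error = exc

print(item)
print(type(error).__name__)
```

A stdlib `dataclasses.dataclass` with the same fields would accept both calls and store the strings unchanged.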
- https://github.com/pydantic/pydantic/blob/main/pydantic/dataclasses.py
- https://github.com/pydantic/pydantic/blob/main/pydantic/_internal/_dataclasses.py
- https://github.com/pydantic/pydantic/tree/main/docs/examples/ dataclasses*.py
- https://github.com/pydantic/pydantic/blob/main/tests/test_dataclasses.py `@pydantic.dataclasses.dataclass`