[SPARK-44486][PYTHON][CONNECT] Implement PyArrow self_destruct feature for toPandas
### What changes were proposed in this pull request?
Implement PyArrow's `self_destruct` option for `toPandas` in Spark Connect to reduce memory usage.
The Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can now be used to enable PyArrow's `self_destruct` feature in Spark Connect. When creating a pandas DataFrame via `toPandas`, it saves memory by freeing Arrow-allocated memory while the pandas DataFrame is being built.
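As an illustration of how the configuration might be used, here is a minimal sketch. The helper name `to_pandas_with_self_destruct` is hypothetical and not part of this PR. It assumes `spark` is an existing Spark (or Spark Connect) session and `df` is one of its DataFrames. It uses only `spark.conf.get`, `spark.conf.set`, and `DataFrame.toPandas`.

```python
from typing import Any

# Configuration key named in this PR description.
SELF_DESTRUCT_KEY = "spark.sql.execution.arrow.pyspark.selfDestruct.enabled"


def to_pandas_with_self_destruct(spark: Any, df: Any) -> Any:
    """Hypothetical helper: run df.toPandas() with self_destruct enabled.

    `spark` is assumed to be an active session and `df` a DataFrame
    created from it. The previous setting is restored afterwards.
    """
    # Remember the current value so the session is left unchanged.
    previous = spark.conf.get(SELF_DESTRUCT_KEY, "false")
    spark.conf.set(SELF_DESTRUCT_KEY, "true")
    try:
        # Arrow memory can be released while the pandas DataFrame is built.
        return df.toPandas()
    finally:
        spark.conf.set(SELF_DESTRUCT_KEY, previous)
```

A call would look like `pdf = to_pandas_with_self_destruct(spark, df)`. Scoping the setting to a single conversion is a design choice of this sketch; setting the key once for the whole session works as well. The PySpark documentation describes the option as experimental, so it is worth checking that later operations on the resulting pandas DataFrame behave as expected.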
### Why are the changes needed?
To reach parity with vanilla PySpark. This PR mirrors #29818 for Spark Connect.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #42079 from xinrong-meng/self_destruct.
Authored-by: Xinrong Meng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>