-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-47933][PYTHON] Parent Column class for Spark Connect and Spark Classic #46155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ae50ce3
to
0bd565b
Compare
5fa46be
to
a6677c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the meaning of pyspark.sql.classic
?
Let me run this naming stuff separately (maybe sending an email to the dev mailing list) because Spark Classic has been used in other places lately in any event. |
beddbbf
to
7dde1e8
Compare
Sure, please proceed it. We can discuss the naming separately, @HyukjinKwon . |
@@ -15,7 +15,7 @@ | |||
# limitations under the License. | |||
# | |||
from pyspark.ml import functions as PyMLFunctions | |||
from pyspark.sql.connect.column import Column | |||
from pyspark.sql.column import Column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should keep it as-is to be consistent with the classic functions.py
if it changes to refer classic
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually related to the type hints. We're using pyspark.sql.Column
for type hints, and when we actually need to create the instance, we should use pyspark.sql.classic.column import Column
or pyspark.sql.connect.column import Column
.. it is also a bit messy. I should probably clean those up separately.
@@ -22,7 +22,8 @@ | |||
from pyspark.ml.common import _java2py, _py2java | |||
from pyspark.ml.linalg import Matrix, Vector | |||
from pyspark.ml.wrapper import JavaWrapper, _jvm | |||
from pyspark.sql.column import Column, _to_seq | |||
from pyspark.sql.column import Column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this consistent with functions.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or import _to_seq
later as the other files do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is only used in Spark Classic so it should probably be fine. It is a bit messy and complicated .. I will try to refactor them separately.
Merged to master. For naming, I will send an email soon. |
…t `_with_origin` ### What changes were proposed in this pull request? This PR is a followup of #46155 that adds check of `__name__` at `_with_origin`. ### Why are the changes needed? It is possible for a callable instance without __name__ attribute or/and __module__ attribute to be wrapped. For example, functools.partial. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? `./bin/pyspark` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46198 from HyukjinKwon/SPARK-47933-followup2. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…n `pyspark-connect` ### What changes were proposed in this pull request? This PR is a followup of #46155 that removes the reference of `_to_seq` that `pyspark-connect` package does not have. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8821919392/job/24218893631 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released out yet. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46229 from HyukjinKwon/SPARK-47933-followuptmp. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…` reference in `pyspark.ml.stat` ### What changes were proposed in this pull request? This PR is a followup of #46155 that removes the reference of `_to_seq` that `pyspark-connect` package does not have. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8861971303 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released out yet. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46262 from HyukjinKwon/SPARK-47933-followup4. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
… classes ### What changes were proposed in this pull request? This PR proposes to fix `pyspark-connect` build that is broken by #46155. It is similar with the followups such as #46229 and #46262 but make sure the tests pass within this PR. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8884003967 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released yet. ### How was this patch tested? CI in my own fork: https://github.com/HyukjinKwon/spark/actions/runs/8889698252/job/24408487227 ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46294 from HyukjinKwon/SPARK-48052. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
… Classic ### What changes were proposed in this pull request? Same as apache#46129 but for `Column` class. ### Why are the changes needed? Same as apache#46129 ### Does this PR introduce _any_ user-facing change? Same as apache#46129 ### How was this patch tested? Manually tested, and CI should verify them. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46155 from HyukjinKwon/SPARK-47933. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…t `_with_origin` ### What changes were proposed in this pull request? This PR is a followup of apache#46155 that adds check of `__name__` at `_with_origin`. ### Why are the changes needed? It is possible for a callable instance without __name__ attribute or/and __module__ attribute to be wrapped. For example, functools.partial. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? `./bin/pyspark` ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46198 from HyukjinKwon/SPARK-47933-followup2. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…n `pyspark-connect` ### What changes were proposed in this pull request? This PR is a followup of apache#46155 that removes the reference of `_to_seq` that `pyspark-connect` package does not have. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8821919392/job/24218893631 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released out yet. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46229 from HyukjinKwon/SPARK-47933-followuptmp. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…` reference in `pyspark.ml.stat` ### What changes were proposed in this pull request? This PR is a followup of apache#46155 that removes the reference of `_to_seq` that `pyspark-connect` package does not have. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8861971303 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released out yet. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46262 from HyukjinKwon/SPARK-47933-followup4. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
… classes ### What changes were proposed in this pull request? This PR proposes to fix `pyspark-connect` build that is broken by apache#46155. It is similar with the followups such as apache#46229 and apache#46262 but make sure the tests pass within this PR. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8884003967 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released yet. ### How was this patch tested? CI in my own fork: https://github.com/HyukjinKwon/spark/actions/runs/8889698252/job/24408487227 ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46294 from HyukjinKwon/SPARK-48052. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
Same as #46129 but for
Column
class.Why are the changes needed?
Same as #46129
Does this PR introduce any user-facing change?
Same as #46129
How was this patch tested?
Manually tested, and CI should verify them.
Was this patch authored or co-authored using generative AI tooling?
No.