[Python] (PySpark) Support for subclasses in type_verifier #50726
+307
−4
What changes were proposed in this pull request?
The current implementation of _type_verifier does not support classes that extend the acceptable types. Here is a small test case that fails with the current implementation:
Sample test case that fails currently
Failure logs
This happens because the current implementation uses type(data_type), which does not return StructType for classes extending StructType. (ref)
Why are the changes needed?
Proposal: change the implementation to use isinstance() instead of type().
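The difference between the two checks can be sketched in plain Python. This is a minimal illustration, not the PySpark code: the classes below are hypothetical stand-ins for pyspark.sql.types, and AuditedStruct, is_struct_exact and is_struct_subclass_aware are names invented for this example.

```python
# Hypothetical stand-ins for pyspark.sql.types classes (assumption: not the real API).
class DataType:
    pass


class StructType(DataType):
    pass


class AuditedStruct(StructType):
    """A user-defined subclass that adds meaning without changing behavior."""


def is_struct_exact(data_type):
    # Exact-type comparison: any subclass of StructType is rejected.
    return type(data_type) is StructType


def is_struct_subclass_aware(data_type):
    # isinstance() accepts StructType and every class derived from it.
    return isinstance(data_type, StructType)


schema = AuditedStruct()
exact_result = is_struct_exact(schema)
subclass_result = is_struct_subclass_aware(schema)
print(exact_result, subclass_result)
```

One design point to keep in mind with isinstance(): if one accepted type ever subclasses another, the order of the checks starts to matter, because the parent check would also match the child.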
I believe inheritance should be allowed for DataTypes as it enables users to add behavior, validations or schematic meanings to them.
Example: my use case that is failing currently
I was trying to achieve this behavior:
The current implementation only checks the behavior of a data type, but by using type() it also rules out inheritance. The same check can be achieved with isinstance(). If inheritance is not desirable, then the types should perhaps be annotated with @final. In either case, I would consider this a bug.
Does this PR introduce any user-facing change?
No. This PR does not change any existing user-facing behavior, but it allows users to extend DataTypes if they need to.
How was this patch tested?
There are already a few unit tests for _make_type_verifier (here) that test against the directly supported data types. I created a copy of those tests and, instead of using the direct types, checked against extended data types.
Was this patch authored or co-authored using generative AI tooling?
No.