-
Notifications
You must be signed in to change notification settings - Fork 1k
implement cast_to_variant
kernel to cast native types to VariantArray
#8044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice!
Ok(builder.build()) | ||
} | ||
|
||
// TODO add cast_with_options that allow specifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this will go in a separate issue/PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is probably a good idea -- I can file tickets for it tomorrow hopefully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought more about this, and since there is (currently) no way for a conversion from Variant --> Array to fail, that means the cast options are less obviously relevant (and thus I can't file a ticket as I don't know what it should do)
I updated the comment to reflect this
primtive_conversion!(Float64Type, input, builder); | ||
} | ||
dt => { | ||
return Err(ArrowError::CastError(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of follow on tickets we can perhaps file as follow on work:
- Support for timestamps
- Support for string / binary arrays
- Support for various timestamps types
Ok(builder.build()) | ||
} | ||
|
||
// TODO add cast_with_options that allow specifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is probably a good idea -- I can file tickets for it tomorrow hopefully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running out of time for today, so will flush what little I have so far.
Tomorrow is another day.
|
||
/// Convert the input array of a specific primitive type to a `VariantArray` | ||
/// row by row | ||
macro_rules! primtive_conversion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macro_rules! primtive_conversion { | |
macro_rules! primitive_conversion { |
let input_type = input.data_type(); | ||
// todo: use `downcast_primitive` to avoid the boilerplate and match more types | ||
match input_type { | ||
DataType::Int8 => { | ||
primtive_conversion!(Int8Type, input, builder); | ||
} | ||
DataType::Int16 => { | ||
primtive_conversion!(Int16Type, input, builder); | ||
} | ||
DataType::Int32 => { | ||
primtive_conversion!(Int32Type, input, builder); | ||
} | ||
DataType::Int64 => { | ||
primtive_conversion!(Int64Type, input, builder); | ||
} | ||
DataType::UInt8 => { | ||
primtive_conversion!(UInt8Type, input, builder); | ||
} | ||
DataType::UInt16 => { | ||
primtive_conversion!(UInt16Type, input, builder); | ||
} | ||
DataType::UInt32 => { | ||
primtive_conversion!(UInt32Type, input, builder); | ||
} | ||
DataType::UInt64 => { | ||
primtive_conversion!(UInt64Type, input, builder); | ||
} | ||
DataType::Float32 => { | ||
primtive_conversion!(Float32Type, input, builder); | ||
} | ||
DataType::Float64 => { | ||
primtive_conversion!(Float64Type, input, builder); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what downcast_primitive
is, but if we used downcast_primitive_array!
macro, I think the code simplifies to just:
let input_type = input.data_type(); | |
// todo: use `downcast_primitive` to avoid the boilerplate and match more types | |
match input_type { | |
DataType::Int8 => { | |
primtive_conversion!(Int8Type, input, builder); | |
} | |
DataType::Int16 => { | |
primtive_conversion!(Int16Type, input, builder); | |
} | |
DataType::Int32 => { | |
primtive_conversion!(Int32Type, input, builder); | |
} | |
DataType::Int64 => { | |
primtive_conversion!(Int64Type, input, builder); | |
} | |
DataType::UInt8 => { | |
primtive_conversion!(UInt8Type, input, builder); | |
} | |
DataType::UInt16 => { | |
primtive_conversion!(UInt16Type, input, builder); | |
} | |
DataType::UInt32 => { | |
primtive_conversion!(UInt32Type, input, builder); | |
} | |
DataType::UInt64 => { | |
primtive_conversion!(UInt64Type, input, builder); | |
} | |
DataType::Float32 => { | |
primtive_conversion!(Float32Type, input, builder); | |
} | |
DataType::Float64 => { | |
primtive_conversion!(Float64Type, input, builder); | |
} | |
downcast_primitive_array! { | |
input => input.iter().for_each(|v| match v { | |
Some(value) => builder.append_variant(Variant::from(value)), | |
None => builder.append_null(), | |
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it does. However, when I make this change, I get the following errorrs (because we need to implement conversion to/from the other types (like Decimal128 --> i128)

I think we may have to handle those cases specially. I will file a follow on ticket to do so.
error[E0277]: the trait bound `parquet_variant::Variant<'_, '_>: std::convert::From<half::binary16::f16>` is not satisfied
--> parquet-variant-compute/src/cast_to_variant.rs:72:52
|
72 | Some(value) => builder.append_variant(Variant::from(value)),
| ^^^^^^^ the trait `std::convert::From<half::binary16::f16>` is not implemented for `parquet_variant::Variant<'_, '_>`
|
= help: the following other types implement trait `std::convert::From<T>`:
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<&[u8]>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<&str>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<()>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<bool>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::datetime::DateTime<chrono::offset::utc::Utc>>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::naive::date::NaiveDate>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::naive::datetime::NaiveDateTime>`
`parquet_variant::Variant<'_, '_>` implements `std::convert::From<f32>`
and 12 others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense.
I am going to merge this PR in as @Samyak2 has approved it and I want to file follow on tickets that refer to it. While going through the list it turns out there were quite a few! |
@scovich and anyone else, I am happy to make follow on PRs to address anything else that might come up with review of this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Couple nits/thoughts to consider, but nothing significant.
if let Ok(value) = i8::try_from(value) { | ||
Variant::Int8(value) | ||
} else { | ||
Variant::Int16(value as i16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value.into()
makes clear that the conversion is infallible?
(more below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a good idea -- done in
if let Ok(value) = i16::try_from(value) { | ||
Variant::Int16(value) | ||
} else { | ||
Variant::Int32(value as i32) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also consider doing:
i16::try_from(value).map_or_else(|| Variant::Int32(value.into()), Variant::Int16)
tho it's a bit up in the air whether the compactness/readability trade-off is a net win
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is a tossup -- I personally prefer the if let Ok(...)
syntax
Thank you for the comment
# Which issue does this PR close? - Follow on to #8044 # Rationale for this change @scovich had a good suggestion here https://github.com/apache/arrow-rs/pull/8044/files#r2257256848: > value.into() makes clear that the conversion is infallible? # What changes are included in this PR? Use `From` impl to make it clear the conversion is infallible and can not lose precision # Are these changes tested? Covered by existing tests # Are there any user-facing changes? No
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
cast_to_variant
kernel #8043Rationale for this change
As @Samyak2 suggested on https://github.com/apache/arrow-rs/pull/8021/files#r2249926579, having the ability to convert FROM a typed value to a VariantArray will be important
For example, in SQL it could be used to cast columns to variant like in
some_column::variant
What changes are included in this PR?
cast_to_variant
kernel to cast native types toVariantArray
Are these changes tested?
yes
Are there any user-facing changes?
New kernel