Skip to content

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 5, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

As @Samyak2 suggested on https://github.com/apache/arrow-rs/pull/8021/files#r2249926579, having the ability to convert FROM a typed value to a VariantArray will be important

For example, in SQL it could be used to cast columns to variant like in some_column::variant

What changes are included in this PR?

  1. Add cast_to_variant kernel to cast native types to VariantArray
  2. Tests

Are these changes tested?

yes

Are there any user-facing changes?

New kernel

@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 5, 2025
Copy link
Contributor

@Samyak2 Samyak2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice!

Ok(builder.build())
}

// TODO add cast_with_options that allow specifying
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this will go in a separate issue/PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is probably a good idea -- I can file tickets for it tomorrow hopefully

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about this, and since there is (currently) no way for a conversion from Variant --> Array to fail, that means the cast options are less obviously relevant (and thus I can't file a ticket as I don't know what it should do)

I updated the comment to reflect this

primtive_conversion!(Float64Type, input, builder);
}
dt => {
return Err(ArrowError::CastError(format!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of follow on tickets we can perhaps file as follow on work:

  1. Support for timestamps
  2. Support for string / binary arrays
  3. Support for various timestamps types

Ok(builder.build())
}

// TODO add cast_with_options that allow specifying
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is probably a good idea -- I can file tickets for it tomorrow hopefully

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running out of time for today, so will flush what little I have so far.

Tomorrow is another day.


/// Convert the input array of a specific primitive type to a `VariantArray`
/// row by row
macro_rules! primtive_conversion {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
macro_rules! primtive_conversion {
macro_rules! primitive_conversion {

Comment on lines 68 to 100
let input_type = input.data_type();
// todo: use `downcast_primitive` to avoid the boilerplate and match more types
match input_type {
DataType::Int8 => {
primtive_conversion!(Int8Type, input, builder);
}
DataType::Int16 => {
primtive_conversion!(Int16Type, input, builder);
}
DataType::Int32 => {
primtive_conversion!(Int32Type, input, builder);
}
DataType::Int64 => {
primtive_conversion!(Int64Type, input, builder);
}
DataType::UInt8 => {
primtive_conversion!(UInt8Type, input, builder);
}
DataType::UInt16 => {
primtive_conversion!(UInt16Type, input, builder);
}
DataType::UInt32 => {
primtive_conversion!(UInt32Type, input, builder);
}
DataType::UInt64 => {
primtive_conversion!(UInt64Type, input, builder);
}
DataType::Float32 => {
primtive_conversion!(Float32Type, input, builder);
}
DataType::Float64 => {
primtive_conversion!(Float64Type, input, builder);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what downcast_primitive is, but if we used downcast_primitive_array! macro, I think the code simplifies to just:

Suggested change
let input_type = input.data_type();
// todo: use `downcast_primitive` to avoid the boilerplate and match more types
match input_type {
DataType::Int8 => {
primtive_conversion!(Int8Type, input, builder);
}
DataType::Int16 => {
primtive_conversion!(Int16Type, input, builder);
}
DataType::Int32 => {
primtive_conversion!(Int32Type, input, builder);
}
DataType::Int64 => {
primtive_conversion!(Int64Type, input, builder);
}
DataType::UInt8 => {
primtive_conversion!(UInt8Type, input, builder);
}
DataType::UInt16 => {
primtive_conversion!(UInt16Type, input, builder);
}
DataType::UInt32 => {
primtive_conversion!(UInt32Type, input, builder);
}
DataType::UInt64 => {
primtive_conversion!(UInt64Type, input, builder);
}
DataType::Float32 => {
primtive_conversion!(Float32Type, input, builder);
}
DataType::Float64 => {
primtive_conversion!(Float64Type, input, builder);
}
downcast_primitive_array! {
input => input.iter().for_each(|v| match v {
Some(value) => builder.append_variant(Variant::from(value)),
None => builder.append_null(),
}),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it does. However, when I make this change, I get the following errorrs (because we need to implement conversion to/from the other types (like Decimal128 --> i128)

Screenshot 2025-08-06 at 6 42 20 AM

I think we may have to handle those cases specially. I will file a follow on ticket to do so.

error[E0277]: the trait bound `parquet_variant::Variant<'_, '_>: std::convert::From<half::binary16::f16>` is not satisfied
  --> parquet-variant-compute/src/cast_to_variant.rs:72:52
   |
72 |                 Some(value) => builder.append_variant(Variant::from(value)),
   |                                                       ^^^^^^^ the trait `std::convert::From<half::binary16::f16>` is not implemented for `parquet_variant::Variant<'_, '_>`
   |
   = help: the following other types implement trait `std::convert::From<T>`:
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<&[u8]>`
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<&str>`
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<()>`
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<bool>`
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::datetime::DateTime<chrono::offset::utc::Utc>>`
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::naive::date::NaiveDate>`
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<chrono::naive::datetime::NaiveDateTime>`
             `parquet_variant::Variant<'_, '_>` implements `std::convert::From<f32>`
           and 12 others

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense.

@alamb
Copy link
Contributor Author

alamb commented Aug 6, 2025

I am going to merge this PR in as @Samyak2 has approved it and I want to file follow on tickets that refer to it. While going through the list it turns out there were quite a few!

@alamb
Copy link
Contributor Author

alamb commented Aug 6, 2025

@scovich and anyone else, I am happy to make follow on PRs to address anything else that might come up with review of this one.

@alamb alamb merged commit c25c5a7 into apache:main Aug 6, 2025
12 checks passed
@alamb alamb deleted the alamb/cast_to_variant branch August 6, 2025 11:12
@alamb
Copy link
Contributor Author

alamb commented Aug 6, 2025

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple nits/thoughts to consider, but nothing significant.

if let Ok(value) = i8::try_from(value) {
Variant::Int8(value)
} else {
Variant::Int16(value as i16)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value.into() makes clear that the conversion is infallible?

(more below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1166 to +1171
if let Ok(value) = i16::try_from(value) {
Variant::Int16(value)
} else {
Variant::Int32(value as i32)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also consider doing:

i16::try_from(value).map_or_else(|| Variant::Int32(value.into()), Variant::Int16)

tho it's a bit up in the air whether the compactness/readability trade-off is a net win

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is a tossup -- I personally prefer the if let Ok(...) syntax

Thank you for the comment

alamb added a commit that referenced this pull request Aug 6, 2025
# Which issue does this PR close?

- Follow on to  #8044

# Rationale for this change

@scovich had a good suggestion here
https://github.com/apache/arrow-rs/pull/8044/files#r2257256848:
> value.into() makes clear that the conversion is infallible?


# What changes are included in this PR?

Use `From` impl to make it clear the conversion is infallible and can
not lose precision

# Are these changes tested?

Covered by existing tests

# Are there any user-facing changes?

No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Variant] Implement cast_to_variant kernel
3 participants