
liamzwbao
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Implement DataType::Union for cast_to_variant

Are these changes tested?

Yes

Are there any user-facing changes?

New cast type supported

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Aug 21, 2025
}

/// Convert dictionary encoded arrays
fn convert_dictionary_encoded(
Contributor Author

This change just moves the code from the Dictionary branch into this helper function. Since all the impls use slightly different coding styles, we could do a follow-up PR to make them consistent once all the variant cast implementations are complete.
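The helper's body is not shown in this conversation. Below is a rough, std-only sketch of the idea, assuming dictionary conversion follows the common pattern of converting the dictionary values once and then expanding each key. `Variant`, `DictionaryModel`, and this `convert_dictionary_encoded` signature are hypothetical stand-ins, not the arrow-rs or parquet-variant APIs.

```rust
/// Simplified stand-in for a variant value (hypothetical; not the
/// parquet-variant crate's `Variant`).
#[derive(Debug, Clone, PartialEq)]
pub enum Variant {
    Null,
    Utf8(String),
}

/// Hypothetical model of a dictionary-encoded string array:
/// `keys[i]` indexes into `values`, and a `None` key marks a null row.
pub struct DictionaryModel {
    pub keys: Vec<Option<usize>>,
    pub values: Vec<String>,
}

/// Sketch of a `convert_dictionary_encoded`-style helper: convert the
/// (usually small) values array once, then resolve each row through its key.
/// This model assumes every key is in range and panics otherwise.
pub fn convert_dictionary_encoded(dict: &DictionaryModel) -> Vec<Variant> {
    // Convert each distinct dictionary value exactly once.
    let converted: Vec<Variant> = dict
        .values
        .iter()
        .map(|s| Variant::Utf8(s.clone()))
        .collect();

    // Expand keys to rows; a null key becomes Variant::Null.
    dict.keys
        .iter()
        .map(|key| match key {
            Some(k) => converted[*k].clone(),
            None => Variant::Null,
        })
        .collect()
}
```

Converting the values once means conversion work scales with the number of distinct values, and each row only pays for a lookup and a clone.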

@liamzwbao liamzwbao force-pushed the issue-8195-variant-union branch from f34bcd2 to 6dda6a2 Compare August 21, 2025 01:50
@liamzwbao liamzwbao marked this pull request as ready for review August 21, 2025 01:50

// Convert each child array to variant arrays
let mut child_variant_arrays = HashMap::new();
for (type_id, _) in fields.iter() {
Member

Should we merge the two passes into one?

Contributor

I don't understand what you are suggesting

Contributor Author

liamzwbao commented Aug 22, 2025

Do you mean using one loop instead of two? That way we would compute the child array for each type_id on demand.

But IIUC, we will use all the child arrays anyway if it's a valid union, and I think precomputing all the child arrays up front makes the lookups in the second loop cheaper.

Member

Ah, yes, we'll use the same child_variant_array many times. The current approach should perform better.
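To make the trade-off in this thread concrete, here is a std-only sketch of the two-pass shape being discussed, assuming a sparse union (every child is as long as the union, so row `i` reads slot `i` of its child). `Variant`, `ChildArray`, and `convert_sparse_union` are hypothetical stand-ins, not the arrow-rs or parquet-variant APIs. Pass 1 converts each child once into a `HashMap` keyed by `type_id`; pass 2 is then a hash lookup plus an index per row.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a variant value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Variant {
    Null,
    Int32(i32),
    Utf8(String),
}

/// Hypothetical child array of a union; `None` entries are nulls.
pub enum ChildArray {
    Int32(Vec<Option<i32>>),
    Utf8(Vec<Option<String>>),
}

impl ChildArray {
    /// Convert every slot of this child to a variant (null -> Variant::Null).
    pub fn to_variants(&self) -> Vec<Variant> {
        match self {
            ChildArray::Int32(values) => values
                .iter()
                .map(|v| match v {
                    Some(n) => Variant::Int32(*n),
                    None => Variant::Null,
                })
                .collect(),
            ChildArray::Utf8(values) => values
                .iter()
                .map(|v| match v {
                    Some(s) => Variant::Utf8(s.clone()),
                    None => Variant::Null,
                })
                .collect(),
        }
    }
}

/// Two-pass conversion of a sparse union: row `i` reads slot `i` of the
/// child selected by `type_ids[i]`.
pub fn convert_sparse_union(type_ids: &[i8], fields: &[(i8, ChildArray)]) -> Vec<Variant> {
    // Pass 1: convert each child exactly once, keyed by type_id.
    let mut child_variant_arrays: HashMap<i8, Vec<Variant>> = HashMap::new();
    for (type_id, child) in fields {
        child_variant_arrays.insert(*type_id, child.to_variants());
    }

    // Pass 2: each row is a cheap lookup into the precomputed children.
    type_ids
        .iter()
        .enumerate()
        .map(|(i, type_id)| match child_variant_arrays.get(type_id) {
            Some(variants) => variants[i].clone(),
            // Should not happen in a valid union; fall back to null.
            None => Variant::Null,
        })
        .collect()
}
```

Merging the passes would convert children lazily on first use, but because a valid union normally references every child, each child gets converted anyway, so the single loop saves no conversion work.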

Contributor

@alamb alamb left a comment

Thanks @liamzwbao -- this looks great. The only thing I think is missing is a test for nulls in the UnionArray.

Thank you @klion26 for the review

let value = child_variant_array.value(value_offset);
builder.append_variant(value);
} else {
// This should not happen in a valid union, but handle gracefully
Contributor

👍
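The `value_offset` in the snippet above is where dense and sparse unions differ. Below is a minimal model, assuming the standard Arrow union layouts: dense unions carry an offsets buffer, while sparse unions do not because every child has the same length as the union. `UnionLayout`, `value_offset`, and `resolve` are hypothetical helpers written for illustration. `resolve` returns `None` for an unknown `type_id`, mirroring the graceful-fallback branch.

```rust
/// Hypothetical model of the two union layouts.
pub enum UnionLayout {
    /// Dense: an explicit offset into the selected child for every row.
    Dense { offsets: Vec<i32> },
    /// Sparse: children are aligned with the union, so no offsets buffer.
    Sparse,
}

/// Position of union row `i` inside the child selected by its type_id.
pub fn value_offset(layout: &UnionLayout, i: usize) -> usize {
    match layout {
        // Dense: the offsets buffer says where in the child this row lives.
        UnionLayout::Dense { offsets } => offsets[i] as usize,
        // Sparse: the row index is also the index into the child.
        UnionLayout::Sparse => i,
    }
}

/// Resolve row `i` to `(type_id, offset)`, or `None` when the type_id has
/// no matching child (an invalid union), instead of panicking.
pub fn resolve(
    layout: &UnionLayout,
    type_ids: &[i8],
    known_type_ids: &[i8],
    i: usize,
) -> Option<(i8, usize)> {
    let type_id = type_ids[i];
    if known_type_ids.contains(&type_id) {
        Some((type_id, value_offset(layout, i)))
    } else {
        None
    }
}
```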


}

#[test]
fn test_cast_to_variant_union_sparse() {
Contributor

Could we please add a test for a UnionArray where a child element is null, so that the output VariantArray contains a null as well?
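One way to frame the requested test, sketched in std-only Rust with hypothetical names (`convert_dense_union`, children modeled as `Vec<Option<i64>>`), is that a dense union row whose selected child slot is null should produce a null output row. This sketch models an output null as `None`. The thread does not say whether the actual kernel appends an array-level null or a variant null value.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified dense-union conversion used only to illustrate
/// the requested null test. Child slots are `Option<i64>`, and the output
/// uses `None` to stand for a null row in the resulting array.
pub fn convert_dense_union(
    type_ids: &[i8],
    offsets: &[i32],
    children: &HashMap<i8, Vec<Option<i64>>>,
) -> Vec<Option<i64>> {
    type_ids
        .iter()
        .zip(offsets)
        .map(|(type_id, offset)| {
            children
                .get(type_id)
                // A null child slot (or an unknown type_id) yields a null row.
                .and_then(|child| child[*offset as usize])
        })
        .collect()
}
```

In the real test the same shape would be built with an arrow-rs `UnionArray` whose child contains a null, and the assertion would check the corresponding slot of the resulting `VariantArray`.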

Contributor

@alamb alamb left a comment

Thank you @liamzwbao

I also merged up from main to resolve a merge conflict with this PR

Contributor

alamb commented Aug 23, 2025

🚀

@alamb alamb merged commit 0c4e58f into apache:main Aug 23, 2025
12 checks passed
@liamzwbao liamzwbao deleted the issue-8195-variant-union branch August 28, 2025 00:05
Labels
parquet-variant parquet-variant* crates
Development

Successfully merging this pull request may close these issues.

[Variant]: Implement DataType::Union support for cast_to_variant kernel
3 participants