Conversation

liamzwbao
Contributor

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

What changes are included in this PR?

  • Support casting VariantDecimal32/64/128/256
  • Handle scaling and precision adjustments; downscaling may lose fractional precision (see the sketch below)
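
A minimal sketch of the downscale rounding this implies, in plain integer arithmetic (the half-away-from-zero rounding is an assumption based on convert_to_smaller_scale_decimal in arrow-cast, not the actual kernel code):

// Downscaling the variant decimal 1.235 (unscaled 1235, scale 3) to
// Decimal32(9, 2) divides by 10^(3-2) and rounds half away from zero,
// yielding unscaled 124, i.e. 1.24
let (v, div) = (1235i32, 10i32);
let (d, r) = (v / div, v % div);
let rounded = if 2 * r.abs() >= div { d + r.signum() } else { d };
assert_eq!(rounded, 124);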

Are these changes tested?

Yes

Are there any user-facing changes?

New cast types supported

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Oct 4, 2025
@liamzwbao liamzwbao marked this pull request as ready for review October 4, 2025 17:19
@liamzwbao
Contributor Author

cc @alamb @scovich

Contributor

@alamb alamb left a comment

Looks good to me -- thank you @liamzwbao 🙏

let result = variant_get(&variant_array, options).unwrap();
let result = result.as_any().downcast_ref::<Decimal32Array>().unwrap();

assert_eq!(result.value(0), 124);
Contributor

To match the test above, it would probably be good to assert result.precision() and result.scale() as well.
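
For instance (a sketch; the expected values assume the test above requested Decimal32(9, 2)):

assert_eq!(result.precision(), 9);
assert_eq!(result.scale(), 2);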

Contributor

Likewise for the Decimal64/128/256 cases too.

Contributor Author

Makes sense, added

Contributor

@scovich scovich left a comment

Made an initial pass. The high-level comments are more important than the low-level syntax nits.

Comment on lines 81 to 82
// scale_down means output has fewer fractional digits than input
// divide by 10^(input_scale - output_scale) with rounding
Contributor

I'm a bit nervous about rounding (the whole point of decimal is to be lossless, unlike floating point). But I guess in this case the user specifically asked for the narrower type, so the usual worries about lossy coercion don't apply?

Contributor Author

I think rounding makes sense here, as Arrow-to-Variant conversions could also cause precision loss due to rescaling. But we could also introduce a new option to fail on precision loss if needed.

let d = v.checked_div(div)?;
let r = v % div;

// round half away from zero, in the same way as convert_to_smaller_scale_decimal in arrow-cast
Contributor

Looking at convert_to_smaller_scale_decimal, virtually all the logic in that function is doing exactly what we want here... and then the last line just applies the calculation as an appropriate unary operation on the input array. Rather than duplicate the logic, is there some way we could factor it out or otherwise reuse it? Problem is, it's in a different crate, so the factored-out logic would have to be pub...

Contributor Author

Those are internal helper functions; we could refactor the logic, but I'm not sure it's good to expose that. WDYT @alamb?

Comment on lines +214 to +215
DataType::Decimal32(precision, scale) => Decimal32(
VariantToDecimalArrowRowBuilder::new(cast_options, capacity, *precision, *scale)?,
Contributor

To make sure I'm understanding correctly --

  • Here, the user has requested e.g. Decimal32, so we create a decimal32 row builder
  • The row builder invokes the VariantDecimalScaler trait, which eventually calls Variant::as_decimal4
  • If the actual variant value was a wider decimal type, the conversion will produce None unless the unscaled value fits in the narrower type and the scale is small enough to fit as well (without rounding)?

But in this case, the user specifically requested rounding, so it seems odd to fail some of the time and not others? In particular, going from Decimal32(9, 4) to Decimal32(9, 2) would succeed with rounding, but going from Decimal64(18, 4) to Decimal32(9, 2) would fail for a value like 1234567.8901, even though the rescaled result 1234567.89 is a valid Decimal32(9, 2)?

In order to correctly handle all valid narrowing conversions, we need to rescale+round first, using the original variant type, and then try to narrow the result to the requested type.

The converse hazard exists for widening, where we need to widen first, and then rescale+round:

  • Converting the Decimal32(9, 9) value 0.999999999 to Decimal64(*, 0) produces an intermediate value with ten decimal digits.
  • Converting the Decimal32(9, 0) value 999999999 to Decimal64(18, 9) produces an intermediate (and final) value with 18 digits.
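
A small numeric sketch of both hazards, using the concrete values from the examples above (plain division/multiplication stands in for the full rescale-with-rounding):

// Narrowing: Decimal64(18, 4) value 1234567.8901 -> Decimal32(9, 2).
// The unscaled input overflows i32, so narrowing first fails...
let v1: i64 = 12_345_678_901;
assert!(i32::try_from(v1).is_err());
// ...but rescaling first (scale 4 -> 2) yields a value that fits in Decimal32(9, 2)
assert_eq!(i32::try_from(v1 / 100), Ok(123_456_789));

// Widening: Decimal32(9, 0) value 999999999 -> Decimal64(18, 9).
// Rescaling in i32 overflows, since scale 0 -> 9 multiplies by 10^9...
let v2: i32 = 999_999_999;
assert!(v2.checked_mul(1_000_000_000).is_none());
// ...so widen to i64 first, then rescale: an 18-digit value that fits in i64
assert_eq!((v2 as i64) * 1_000_000_000, 999_999_999_000_000_000);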

Contributor

@scovich scovich Oct 6, 2025

Looking at all possible combinations:

  • We are converting unscaled value v1 of type Variant::DecimalXX(p1, s1) to datatypes::DecimalYY(p2, s2)
  • The variant decimals have implied precision, so p1 is always one of {9, 19, 38} based on decimal type
  • let n1 = p1-s1 and n2 = p2-s2 (the max number of integer digits before and after conversion)
  • if n2 < n1, there is an inherent risk of overflow regardless of what scales are involved
    • before even looking at scale and scale changes, we should first verify that v1 fits in n2+s1 digits. If not, flag overflow immediately. Otherwise, set n1=n2 and proceed to the next case.
    • NOTE: This check does NOT require changing the type of v1, because total precision decreased.
  • else if n2 = n1 and s2 < s1, there is an overflow risk when e.g. 0.999 rounds to 1.00
    • Rescale, and then verify that the rounded result fits in p2 digits.
    • NOTE: This check does NOT require changing the type of v1, because total precision decreased.
  • else, there is no risk of overflow
    • Convert v1 to the new native type first
    • Then rescale and round as needed
    • NOTE: Both operations are infallible

That would correspond to something like the following code:

fn variant_to_unscaled_decimal32(
    variant: Variant<'_, '_>, 
    precision: u8, 
    scale: u8,
) -> Result<i32> {
    match variant {
        Variant::Decimal4(d) => {
            let s1 = d.scale();
            let mut n1 = VariantDecimal4::MAX_PRECISION - s1;
            let n2 = precision - scale;
            let v1 = d.integer();
            if n2 < n1 {
                // integer digits pose an overflow risk, and n2+s1 could even be out of precision range
                let max_value = MAX_DECIMAL32_FOR_EACH_PRECISION.get(n2 + s1);
                if max_value.is_none_or(|n| v1.unsigned_abs() > n) {
                    return Err(... overflow ...);
                }
                // else the value fits in n2 digits and we can pretend n1=n2
                n1 = n2;
            }

            if n2 == n1 {
                let v2 = ... rescale v1 and round up ...;
                if v2.unsigned_abs() > MAX_DECIMAL32_FOR_EACH_PRECISION[precision] {
                    return Err(... overflow ...);
                }
                // else the value can safely convert to the target type
                return Ok(v2 as _);
            }

            // no overflow possible, but still have to rescale and round
            let v1 = v1 as _;
            let v2 = ... rescale v1 and round up ...;
            Ok(v2)
        }
        Variant::Decimal8(d) => {
            ... almost the same code as for Decimal4 case ...
            ... except we use VariantDecimal8::MAX_PRECISION ...
            ... and we index into MAX_DECIMAL64_FOR_EACH_PRECISION ...
        }
        Variant::Decimal16(d) => {
            ... almost the same code as for Decimal4 case ...
            ... except we use VariantDecimal16::MAX_PRECISION ...
            ... and we index into MAX_DECIMAL128_FOR_EACH_PRECISION ...
        }
        Variant::Int8(i) => { ... treat it like `Variant::Decimal4(i, 0)` ... }
        Variant::Int16(i) => { ... treat it like `Variant::Decimal4(i, 0)` ... }
        Variant::Int32(i) => { ... treat it like `Variant::Decimal8(i, 0)` ... }
        Variant::Int64(i) => { ... treat it like `Variant::Decimal16(i, 0)` ... }
        _ => return Err(... not exact numeric data ...),
    }
}

fn variant_to_unscaled_decimal64(
    variant: Variant<'_, '_>, 
    precision: u8, 
    scale: u8,
) -> Result<i64> {
    ... exactly the same code as for decimal32 case ...
    ... but the changed return type means the `as _` casts now produce i64 ...
}

fn variant_to_unscaled_decimal128(
    variant: Variant<'_, '_>, 
    precision: u8, 
    scale: u8,
) -> Result<i128> {
    ... exactly the same code as for decimal32 case ...
    ... but the changed return type means the `as _` casts now produce i128 ...
}

fn variant_to_unscaled_decimal256(
    variant: Variant<'_, '_>, 
    precision: u8, 
    scale: u8,
) -> Result<i256> {
    ... exactly the same code as for decimal32 case ...
    ... but the changed return type means the `as _` casts now produce i256 ...
}

So, we'd want two macros:

  • Outer macro that produces the body of variant_to_unscaled_decimalXX functions
  • Inner macro that produces the body of Variant::DecimalXX match arms

We need macros because integer types lack any helpful trait hierarchy that generics could take advantage of.

Update: Corrected a potential array out of bounds index in the n2 < n1 case.
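
A standalone sketch of that two-macro shape, reduced to a simple downscale-with-rounding helper so it compiles on its own (all names here are illustrative, not the proposed API):

// inner macro: the shared arithmetic that every match arm / width would reuse
macro_rules! rescale_body {
    ($v:expr, $div:expr) => {{
        let d = $v / $div;
        let r = $v % $div;
        // round half away from zero
        if 2 * r.abs() >= $div { d + r.signum() } else { d }
    }};
}

// outer macro: stamps out one function per native width, sidestepping the
// missing trait hierarchy over i32/i64/i128
macro_rules! define_downscale {
    ($name:ident, $native:ty) => {
        fn $name(v: $native, digits_dropped: u32) -> $native {
            let div = (10 as $native).pow(digits_dropped);
            rescale_body!(v, div)
        }
    };
}

define_downscale!(downscale32, i32);
define_downscale!(downscale64, i64);

fn main() {
    assert_eq!(downscale32(1235, 1), 124); // 1.235 -> 1.24
    assert_eq!(downscale64(-1235, 1), -124);
}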

Contributor

@scovich scovich Oct 6, 2025

Oh! Watch out, arrow decimals can have negative scale. My analysis above didn't necessarily account for that; I'm not sure if the original code in this PR does?

In particular, negative scale allows infallible conversions such as VariantDecimal16(38, 0) to Decimal4(9, -30) with rounding, and the n1 vs. n2 checks I proposed above might not accurately capture this nuance.
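
A quick numeric illustration of that infallible case (plain division stands in for the rescale):

// A Decimal128-backed variant value with 38 digits and scale 0, e.g. 10^37,
// rescaled to scale -30, divides by 10^(0 - (-30)) = 10^30...
let v1: i128 = 10i128.pow(37);
let rescaled = v1 / 10i128.pow(30);
// ...leaving 10^7, which fits within 9 digits of precision
assert_eq!(i32::try_from(rescaled), Ok(10_000_000));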

Contributor Author

Thanks for catching this! Let me dig a bit deeper and improve this conversion. Indeed it's possible to get a null for a valid decimal.

For negative scale, I think it's covered in this method; I will add more tests for it. Also, the validate function in the macro scale_variant_decimal will check and make sure the value fits into a decimal with the specified precision.

Contributor Author

I reviewed the implementation of cast_decimal_to_decimal<I, O> in arrow-cast, and it seems to already handle our cases quite well. Specifically:

  1. It checks is_infallible_cast, which covers case 3.
  2. For scale-up (s1 <= s2), it first converts I::Native to O::Native and then rescales. For scale-down (s1 > s2), it divides and rounds the result (I::Native) before converting to O::Native. This approach gracefully handles native-type overflow. The subsequent DecimalType::is_valid_decimal_precision call ensures precision validation, similar to our current MAX_DECIMAL32_FOR_EACH_PRECISION.get(n2 + s1) check, which effectively covers cases 1 & 2, where n2 < n1 or n2 == n1.
  3. That said, case 1 (n2 < n1) might present an optimization opportunity, since we could skip rescaling. Functionally, though, the results should be the same. This could be explored in a follow-up PR.

Given this overlap, instead of duplicating logic, I plan to refactor the decimal cast function by extracting the shared core logic into a helper like the one below and exposing it, though that adds a dependency on arrow-cast:

fn rescale_decimal<I, O>(
    integer: I::Native,
    input_precision: u8,
    input_scale: i8,
    output_precision: u8,
    output_scale: i8,
) -> Option<O::Native>
where
    I: DecimalType,
    O: DecimalType,
    I::Native: DecimalCast,
    O::Native: DecimalCast,

Then, in our case, we can simply wire the type conversions through this helper:

fn variant_to_unscaled_decimal32(
    variant: Variant<'_, '_>,
    precision: u8,
    scale: u8,
) -> Result<i32> {
    let rescaled = match variant {
        Variant::Decimal4(d) => rescale_decimal::<Decimal32, Decimal32>(
            d.integer(), VariantDecimal4::MAX_PRECISION, d.scale(), precision, scale),
        Variant::Decimal8(d) => rescale_decimal::<Decimal64, Decimal32>(
            d.integer(), VariantDecimal8::MAX_PRECISION, d.scale(), precision, scale),
        Variant::Decimal16(d) => rescale_decimal::<Decimal128, Decimal32>(
            d.integer(), VariantDecimal16::MAX_PRECISION, d.scale(), precision, scale),
        Variant::Int8(i) => rescale_decimal::<Decimal32, Decimal32>(
            i as i32, VariantDecimal4::MAX_PRECISION, 0, precision, scale),
        Variant::Int16(i) => rescale_decimal::<Decimal32, Decimal32>(
            i as i32, VariantDecimal4::MAX_PRECISION, 0, precision, scale),
        Variant::Int32(i) => rescale_decimal::<Decimal32, Decimal32>(
            i, VariantDecimal4::MAX_PRECISION, 0, precision, scale),
        Variant::Int64(i) => rescale_decimal::<Decimal64, Decimal32>(
            i, VariantDecimal8::MAX_PRECISION, 0, precision, scale),
        _ => return Err(... not exact numeric data ...),
    };
    // rescale_decimal returns an Option; map None to an overflow error here
    rescaled.ok_or_else(|| ... overflow ...)
}

Let me know if you see any potential risks or edge cases I might have overlooked.

@liamzwbao liamzwbao force-pushed the issue-8477-variant-to-arrow-decimal branch from 0f7665f to b43ac66 Compare October 7, 2025 00:28
@liamzwbao liamzwbao force-pushed the issue-8477-variant-to-arrow-decimal branch from b43ac66 to 9b6d0e1 Compare October 7, 2025 01:36
Contributor Author

@liamzwbao liamzwbao left a comment

Thanks for the thorough review, @scovich! Addressed most of the comments; I'll improve the type cast next.

@alamb
Contributor

alamb commented Oct 7, 2025

I think this may be related to #8562 🤔

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 8, 2025
@liamzwbao liamzwbao force-pushed the issue-8477-variant-to-arrow-decimal branch from 5dcb456 to a7cdd33 Compare October 8, 2025 19:36
@liamzwbao
Contributor Author

Hi @scovich @alamb, this is ready for another round of review. To make what I mentioned here happen, I refactored the code in arrow-cast, so I think we may need a benchmark run to ensure the change doesn't cause a regression.

@alamb
Contributor

alamb commented Oct 8, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-8477-variant-to-arrow-decimal (94d60c0) to 760b7b6 diff
BENCH_NAME=variant_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-8477-variant-to-arrow-decimal
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 8, 2025

🤖: Benchmark completed

group                                                                issue-8477-variant-to-arrow-decimal    main
-----                                                                -----------------------------------    ----
batch_json_string_to_variant json_list 8k string                     1.07     25.4±0.11ms        ? ?/sec    1.00     23.8±0.10ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.04    313.9±5.64ms        ? ?/sec    1.00    303.1±0.79ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00      7.3±0.02ms        ? ?/sec    1.05      7.6±0.04ms        ? ?/sec
variant_get_primitive                                                1.00    923.5±1.42ns        ? ?/sec    1.00    922.9±1.78ns        ? ?/sec

@alamb
Contributor

alamb commented Oct 8, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-8477-variant-to-arrow-decimal (94d60c0) to 760b7b6 diff
BENCH_NAME=variant_builder
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_builder
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-8477-variant-to-arrow-decimal
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 8, 2025

🤖: Benchmark completed

group                                       issue-8477-variant-to-arrow-decimal    main
-----                                       -----------------------------------    ----
bench_extend_metadata_builder               1.05     66.4±2.98ms        ? ?/sec    1.00     63.0±2.59ms        ? ?/sec
bench_object_field_names_reverse_order      1.00     19.6±0.97ms        ? ?/sec    1.09     21.3±0.55ms        ? ?/sec
bench_object_list_partially_same_schema     1.00  1276.7±15.32µs        ? ?/sec    1.00  1274.9±27.16µs        ? ?/sec
bench_object_list_same_schema               1.00     24.7±0.19ms        ? ?/sec    1.02     25.3±0.26ms        ? ?/sec
bench_object_list_unknown_schema            1.00     13.6±0.11ms        ? ?/sec    1.01     13.7±0.12ms        ? ?/sec
bench_object_partially_same_schema          1.01      3.3±0.01ms        ? ?/sec    1.00      3.3±0.01ms        ? ?/sec
bench_object_same_schema                    1.01     38.4±0.05ms        ? ?/sec    1.00     38.2±0.09ms        ? ?/sec
bench_object_unknown_schema                 1.00     16.4±0.04ms        ? ?/sec    1.00     16.3±0.04ms        ? ?/sec
iteration/unvalidated_fallible_iteration    1.00      3.0±0.01ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
iteration/validated_iteration               1.00     46.9±0.20µs        ? ?/sec    1.00     46.9±0.09µs        ? ?/sec
validation/unvalidated_construction         1.00      6.5±0.01µs        ? ?/sec    1.00      6.5±0.01µs        ? ?/sec
validation/validated_construction           1.00     59.8±0.16µs        ? ?/sec    1.01     60.6±0.12µs        ? ?/sec
validation/validation_cost                  1.00     53.0±0.20µs        ? ?/sec    1.02     53.9±0.18µs        ? ?/sec

@alamb
Contributor

alamb commented Oct 8, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-8477-variant-to-arrow-decimal (94d60c0) to 760b7b6 diff
BENCH_NAME=variant_validation
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_validation
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-8477-variant-to-arrow-decimal
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 8, 2025

🤖: Benchmark completed

group                               issue-8477-variant-to-arrow-decimal    main
-----                               -----------------------------------    ----
bench_validate_complex_object       1.11    252.7±0.31µs        ? ?/sec    1.00    226.8±0.36µs        ? ?/sec
bench_validate_large_nested_list    1.01     19.2±0.11ms        ? ?/sec    1.00     18.9±0.03ms        ? ?/sec
bench_validate_large_object         1.00     55.2±0.10ms        ? ?/sec    1.00     55.4±0.06ms        ? ?/sec

Contributor

@scovich scovich left a comment

I spent the last couple days trying to come up with a refactor that really makes sense, and failed. The biggest issue is that several parts of the array casting code are fallible, producing an Err result if e.g. a rescaling operation overflows, or if the precision and scale are invalid. And ArrowError allocates a string.

But the variant casting code really wants Option instead of Result, and ends up calling Result::ok and discarding the allocated string. And then later it might convert that None back to an error, if strict casting was enabled. Very annoying.

Overall, given the small amount of code we're trying to reuse, and the viral nature of Result vs. Option choices (see O::is_valid_decimal_precision vs O::validate_decimal_precision), I'm starting to suspect that we should just duplicate the logic and be done with it. Which is unfortunate, because duplication can diverge, and I already found one bug in the arrow cast code while playing around with this refactor.

I left my exploratory comments in place for others to look at and critique, though.

let d = x.div_wrapping(div);
let r = x.mod_wrapping(div);
// make sure we don't perform calculations that don't make sense w/o validation
validate_decimal_precision_and_scale::<O>(output_precision, output_scale)?;
Contributor

Why did this move? It used to get called only for infallible casts; now it gets called for all casts?

Contributor Author

@liamzwbao liamzwbao Oct 9, 2025

I think this could be an improvement since it validates the output precision and scale before performing operations on the array. However, these checks don’t affect correctness, because the same validation is performed again when creating the output decimal array here.

The main benefit is that it can fail early on invalid operations and avoid unnecessary work on the array, but it does add some overhead for valid operations since the conditions are checked twice.

So to be consistent, I think we should either add or remove this check across all branches.


/// Build a rescale function from (input_precision, input_scale) to (output_precision, output_scale)
/// returning a closure `Fn(I::Native) -> Option<O::Native>` that performs the conversion.
pub fn rescale_decimal<I, O>(
Contributor

This refactor seems a bit "backward" to me, which probably causes the benchmark regressions:

  • Original code was dispatching to two methods (convert_to_smaller_scale_decimal and convert_to_bigger_or_equal_scale_decimal) from two locations (cast_decimal_to_decimal and cast_decimal_to_decimal_same_type). This avoided some branching in the inner cast loop, because the branch on direction of scale change is taken outside the loop.
  • New code pushes everything down into this new rescale_decimal method, which not only requires the introduction of a new is_infallible_cast helper method, but also leaves the two convert_to_xxx_scale_decimal methods with virtually identical bodies. At that point we may as well eliminate those helpers entirely and avoid the code bloat... but the helpers probably existed for a reason (to hoist at least some branches out of the inner loop).
  • The new code also allocates errors that get downgraded to empty options, where the original code upgraded empty options to errors. Arrow errors allocate strings, so that's a meaningful difference.

I wonder if we should instead do:

  • rework convert_to_smaller_scale_decimal and convert_to_bigger_or_equal_scale_decimal
    • no longer take array or cast_options as input
    • return Ok((f, is_infallible_cast)), which corresponds to the return type
      Result<(impl Fn(I::Native) -> Option<O::Native>, bool), ArrowError>
  • define a new generic apply_decimal_cast function helper
    • it takes as input array, cast_options and the (impl Fn, bool) pair produced by a convert_to_xxx_scale_decimal helper
    • it handles the three ways of applying f to an array (sketched after the snippets below)
  • rework cast_decimal_to_decimal and cast_decimal_to_decimal_same_type to call those functions (see below)
  • rescale_decimal would be the single-row equivalent of cast_decimal_to_decimal, returning Option<O::Native>
  • The decimal builder's constructor calls validate_decimal_precision_and_scale and fails on error, so we don't need to validate on a per-row basis.
cast_decimal_to_decimal:

let array: PrimitiveArray<O> = if input_scale > output_scale {
    let (f, is_infallible_cast) = convert_to_smaller_scale_decimal(...)?;
    apply_decimal_cast(array, cast_options, f, is_infallible_cast)?
} else {
    let (f, is_infallible_cast) = convert_to_bigger_or_equal_scale_decimal(...)?;
    apply_decimal_cast(array, cast_options, f, is_infallible_cast)?
};

rescale_decimal:

if input_scale > output_scale {
    let (f, _) = convert_to_smaller_scale_decimal(...)?;
    f(integer)
} else {
    let (f, _) = convert_to_bigger_or_equal_scale_decimal(...)?;
    f(integer)
}
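
To make the "three ways" concrete, here is a hedged sketch of what the proposed apply_decimal_cast helper might look like (the name and shape come from the bullets above; applying the output precision/scale to the resulting array is omitted):

use arrow_array::{types::DecimalType, PrimitiveArray};
use arrow_cast::CastOptions;
use arrow_schema::ArrowError;

fn apply_decimal_cast<I: DecimalType, O: DecimalType>(
    array: &PrimitiveArray<I>,
    cast_options: &CastOptions,
    f: impl Fn(I::Native) -> Option<O::Native>,
    is_infallible_cast: bool,
) -> Result<PrimitiveArray<O>, ArrowError> {
    let result = if is_infallible_cast {
        // 1. infallible: every input value converts, so the unwrap never fires
        array.unary(|v| f(v).unwrap())
    } else if cast_options.safe {
        // 2. lenient: values that overflow the target become nulls
        array.unary_opt(|v| f(v))
    } else {
        // 3. strict: values that overflow the target raise an error
        array.try_unary(|v| {
            f(v).ok_or_else(|| ArrowError::CastError("value out of range".to_string()))
        })?
    };
    Ok(result)
}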

@liamzwbao liamzwbao force-pushed the issue-8477-variant-to-arrow-decimal branch from 6638d82 to a48bbf4 Compare October 10, 2025 00:31
@github-actions github-actions bot removed the arrow Changes to the arrow crate label Oct 10, 2025
@liamzwbao
Contributor Author

liamzwbao commented Oct 10, 2025

Hi @scovich, yeah, the core functionality we need is just rescale_decimal. I think it’s acceptable to duplicate the code for now, so I reverted the previous change and added only what we need in type_conversion. WDYT?

Once #8580 is merged, I will apply the same fix here. The downside is that if we find a similar bug in the future, we'll need to fix it in both places. But I think the refactor of arrow-cast would be better handled in a separate PR, and we can figure out how to reuse the code later.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 10, 2025