Skip to content

Conversation

Rich-T-kid
Copy link

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

  • Closes #NNN.

Rationale for this change

context
For whatever reason, I’m running into an authentication issue when trying to push to the branch I created on my Datadog account. I wanted to push some of the refactoring changes that Brancz mentioned on my previous PRs.

I’ve been very busy with school and work, so I don’t think I’ll be able to continue contributing to the project at the same level as I did during the summer. Hopefully, I can still help out with smaller tasks around the codebase, but I don’t have large chunks of time to devote to REE right now.

Sorry about that—I hope I can still make smaller contributions.

Implement casting between REE arrays and other Arrow types. REE-to-REE casting
validates run-end upcasts only (Int16→Int32, Int16→Int64, Int32→Int64) to prevent
invalid sequences.
Implement casting between REE arrays and other Arrow types. REE-to-REE casting
validates run-end upcasts only (Int16→Int32, Int16→Int64, Int32→Int64) to prevent
invalid sequences.

rebased changes
@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 18, 2025
@vegarsti
Copy link

@Rich-T-kid I made some comments on the other PR #7713

Comment on lines +179 to +181
// Both non-null - use slice comparison as a basic approach
// This is a simplified implementation
cast_array.slice(i, 1).to_data() == cast_array.slice(i - 1, 1).to_data()
Copy link

@vegarsti vegarsti Sep 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing @brancz's comments about not copying #7713 (comment)

Suggested change
// Both non-null - use slice comparison as a basic approach
// This is a simplified implementation
cast_array.slice(i, 1).to_data() == cast_array.slice(i - 1, 1).to_data()
// Primitive types
DataType::Boolean => {
cast_array.as_boolean().value(i) == cast_array.as_boolean().value(i - 1)
}
DataType::Int8 => {
cast_array.as_primitive::<Int8Type>().value(i)
== cast_array.as_primitive::<Int8Type>().value(i - 1)
}
DataType::Int16 => {
cast_array.as_primitive::<Int16Type>().value(i)
== cast_array.as_primitive::<Int16Type>().value(i - 1)
}
DataType::Int32 => {
cast_array.as_primitive::<Int32Type>().value(i)
== cast_array.as_primitive::<Int32Type>().value(i - 1)
}
DataType::Int64 => {
cast_array.as_primitive::<Int64Type>().value(i)
== cast_array.as_primitive::<Int64Type>().value(i - 1)
}
DataType::UInt8 => {
cast_array.as_primitive::<UInt8Type>().value(i)
== cast_array.as_primitive::<UInt8Type>().value(i - 1)
}
DataType::UInt16 => {
cast_array.as_primitive::<UInt16Type>().value(i)
== cast_array.as_primitive::<UInt16Type>().value(i - 1)
}
DataType::UInt32 => {
cast_array.as_primitive::<UInt32Type>().value(i)
== cast_array.as_primitive::<UInt32Type>().value(i - 1)
}
DataType::UInt64 => {
cast_array.as_primitive::<UInt64Type>().value(i)
== cast_array.as_primitive::<UInt64Type>().value(i - 1)
}
DataType::Float16 => {
cast_array.as_primitive::<Float16Type>().value(i)
== cast_array.as_primitive::<Float16Type>().value(i - 1)
}
DataType::Float32 => {
cast_array.as_primitive::<Float32Type>().value(i)
== cast_array.as_primitive::<Float32Type>().value(i - 1)
}
DataType::Float64 => {
cast_array.as_primitive::<Float64Type>().value(i)
== cast_array.as_primitive::<Float64Type>().value(i - 1)
}
// String types
DataType::Utf8 => {
cast_array.as_string::<i32>().value(i)
== cast_array.as_string::<i32>().value(i - 1)
}
DataType::LargeUtf8 => {
cast_array.as_string::<i64>().value(i)
== cast_array.as_string::<i64>().value(i - 1)
}
DataType::Utf8View => {
cast_array.as_string_view().value(i) == cast_array.as_string_view().value(i - 1)
}
// Binary types
DataType::Binary => {
cast_array.as_binary::<i32>().value(i)
== cast_array.as_binary::<i32>().value(i - 1)
}
DataType::LargeBinary => {
cast_array.as_binary::<i64>().value(i)
== cast_array.as_binary::<i64>().value(i - 1)
}
DataType::BinaryView => {
cast_array.as_binary_view().value(i) == cast_array.as_binary_view().value(i - 1)
}
DataType::FixedSizeBinary(_) => {
cast_array.as_fixed_size_binary().value(i)
== cast_array.as_fixed_size_binary().value(i - 1)
}
// Temporal types
DataType::Date32 => {
cast_array.as_primitive::<Date32Type>().value(i)
== cast_array.as_primitive::<Date32Type>().value(i - 1)
}
DataType::Date64 => {
cast_array.as_primitive::<Date64Type>().value(i)
== cast_array.as_primitive::<Date64Type>().value(i - 1)
}
DataType::Timestamp(time_unit, _) => match time_unit {
TimeUnit::Second => {
cast_array.as_primitive::<TimestampSecondType>().value(i)
== cast_array
.as_primitive::<TimestampSecondType>()
.value(i - 1)
}
TimeUnit::Millisecond => {
cast_array
.as_primitive::<TimestampMillisecondType>()
.value(i)
== cast_array
.as_primitive::<TimestampMillisecondType>()
.value(i - 1)
}
TimeUnit::Microsecond => {
cast_array
.as_primitive::<TimestampMicrosecondType>()
.value(i)
== cast_array
.as_primitive::<TimestampMicrosecondType>()
.value(i - 1)
}
TimeUnit::Nanosecond => {
cast_array
.as_primitive::<TimestampNanosecondType>()
.value(i)
== cast_array
.as_primitive::<TimestampNanosecondType>()
.value(i - 1)
}
},
DataType::Time32(time_unit) => match time_unit {
TimeUnit::Second => {
cast_array.as_primitive::<Time32SecondType>().value(i)
== cast_array.as_primitive::<Time32SecondType>().value(i - 1)
}
TimeUnit::Millisecond => {
cast_array.as_primitive::<Time32MillisecondType>().value(i)
== cast_array
.as_primitive::<Time32MillisecondType>()
.value(i - 1)
}
TimeUnit::Microsecond | TimeUnit::Nanosecond => {
panic!("Time32 must have a TimeUnit of either seconds or milliseconds")
}
},
DataType::Time64(time_unit) => match time_unit {
TimeUnit::Second | TimeUnit::Millisecond => {
panic!("Time64 must have a TimeUnit of either microseconds or nanoseconds")
}
TimeUnit::Microsecond => {
cast_array.as_primitive::<Time64MicrosecondType>().value(i)
== cast_array
.as_primitive::<Time64MicrosecondType>()
.value(i - 1)
}
TimeUnit::Nanosecond => {
cast_array.as_primitive::<Time64NanosecondType>().value(i)
== cast_array
.as_primitive::<Time64NanosecondType>()
.value(i - 1)
}
},
DataType::Duration(time_unit) => match time_unit {
TimeUnit::Second => {
cast_array.as_primitive::<DurationSecondType>().value(i)
== cast_array.as_primitive::<DurationSecondType>().value(i - 1)
}
TimeUnit::Millisecond => {
cast_array
.as_primitive::<DurationMillisecondType>()
.value(i)
== cast_array
.as_primitive::<DurationMillisecondType>()
.value(i - 1)
}
TimeUnit::Microsecond => {
cast_array
.as_primitive::<DurationMicrosecondType>()
.value(i)
== cast_array
.as_primitive::<DurationMicrosecondType>()
.value(i - 1)
}
TimeUnit::Nanosecond => {
cast_array.as_primitive::<DurationNanosecondType>().value(i)
== cast_array
.as_primitive::<DurationNanosecondType>()
.value(i - 1)
}
},
DataType::Interval(interval_unit) => match interval_unit {
IntervalUnit::YearMonth => {
cast_array.as_primitive::<IntervalYearMonthType>().value(i)
== cast_array
.as_primitive::<IntervalYearMonthType>()
.value(i - 1)
}
IntervalUnit::DayTime => {
cast_array.as_primitive::<IntervalDayTimeType>().value(i)
== cast_array
.as_primitive::<IntervalDayTimeType>()
.value(i - 1)
}
IntervalUnit::MonthDayNano => {
cast_array
.as_primitive::<IntervalMonthDayNanoType>()
.value(i)
== cast_array
.as_primitive::<IntervalMonthDayNanoType>()
.value(i - 1)
}
},
// Decimal types
DataType::Decimal32(_, _) => {
cast_array.as_primitive::<Decimal32Type>().value(i)
== cast_array.as_primitive::<Decimal32Type>().value(i - 1)
}
DataType::Decimal64(_, _) => {
cast_array.as_primitive::<Decimal64Type>().value(i)
== cast_array.as_primitive::<Decimal64Type>().value(i - 1)
}
DataType::Decimal128(_, _) => {
cast_array.as_primitive::<Decimal128Type>().value(i)
== cast_array.as_primitive::<Decimal128Type>().value(i - 1)
}
DataType::Decimal256(_, _) => {
cast_array.as_primitive::<Decimal256Type>().value(i)
== cast_array.as_primitive::<Decimal256Type>().value(i - 1)
}
// TODO: How to handle REE?
DataType::RunEndEncoded(_, _) => todo!(),
DataType::Null
| DataType::List(_)
| DataType::ListView(_)
| DataType::FixedSizeList(_, _)
| DataType::LargeList(_)
| DataType::LargeListView(_)
| DataType::Struct(_)
| DataType::Union(_, _)
| DataType::Dictionary(_, _)
| DataType::Map(_, _) => false,

Copy link

@vegarsti vegarsti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two changes together address @brancz's comments here #7713 (comment)

Comment on lines +144 to +150
let cast_array = if array.data_type() == value_type {
// No casting needed, use the array as-is
make_array(array.to_data())
} else {
// Cast to the target value type
cast_with_options(array, value_type, cast_options)?
};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let cast_array = if array.data_type() == value_type {
// No casting needed, use the array as-is
make_array(array.to_data())
} else {
// Cast to the target value type
cast_with_options(array, value_type, cast_options)?
};
let cast_array = if array.data_type() == value_type {
array
} else {
&cast_with_options(array, value_type, cast_options)?
};

let indices = PrimitiveArray::<UInt32Type>::from_iter_values(
values_indices.iter().map(|&idx| idx as u32),
);
let values_array = take(&cast_array, &indices, None)?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let values_array = take(&cast_array, &indices, None)?;
let values_array = take(cast_array, &indices, None)?;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants