Conversation

timsaucer
Member

@timsaucer timsaucer commented May 20, 2025

Which issue does this PR close?

Rationale for this change

With the switch from DataType to Field, some plans may perform many copy operations. This PR switches from Field to FieldRef for the user-defined scalar, window, and aggregate functions.
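
To illustrate the difference in copy cost, here is a minimal sketch using only the standard library. The `Field` struct below is a simplified stand-in written for this example, not the real arrow type (which also carries a `DataType` and more), and `deep_copy` / `shallow_copy` are hypothetical helper names. The only assumption carried over from arrow is that `FieldRef` is an alias for `Arc<Field>`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for arrow's `Field` (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    name: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Same shape as arrow's alias: a reference-counted pointer to a `Field`.
pub type FieldRef = Arc<Field>;

impl Field {
    pub fn new(name: &str, nullable: bool) -> Self {
        Self {
            name: name.to_string(),
            nullable,
            metadata: HashMap::new(),
        }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Cloning a `Field` allocates a new `String` and a new metadata map.
pub fn deep_copy(field: &Field) -> Field {
    field.clone()
}

/// Cloning a `FieldRef` only increments a reference count;
/// both handles point at the same `Field`.
pub fn shallow_copy(field: &FieldRef) -> FieldRef {
    Arc::clone(field)
}
```

With this stand-in, `deep_copy` yields an equal value backed by separate heap storage, while `shallow_copy` yields a second handle for which `Arc::ptr_eq` holds.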

What changes are included in this PR?

Changes to the API for the user defined functions and similar supporting operations throughout the code base.

Are these changes tested?

Tested via existing unit tests.

Are there any user-facing changes?

None (the APIs that are changed are not yet released)

TODO:

  • Update migration guide to describe changes necessary for end users

@timsaucer timsaucer self-assigned this May 21, 2025
@timsaucer timsaucer added the logical-expr, physical-expr, core, common, execution, proto, ffi, and spark labels May 21, 2025
@github-actions github-actions bot added the optimizer, functions, and physical-plan labels and removed the common and execution labels May 21, 2025
@timsaucer timsaucer force-pushed the feat/reduce-field-copy branch from add60f1 to 3939a9e Compare May 21, 2025 12:04
@timsaucer timsaucer force-pushed the feat/reduce-field-copy branch from 8154e01 to c361436 Compare May 22, 2025 11:42
@timsaucer timsaucer marked this pull request as ready for review May 22, 2025 12:24
Contributor

@alamb alamb left a comment

Thank you @timsaucer -- I had hoped this would be a bigger improvement, but I think it at least sets us up for being more efficient / less String cloning going forward

FieldRef will also allow metadata to be copied through hopefully without requiring a deep copy

cc @andygrove as I think Comet already had to update to a pre-release version. This might be disruptive again

}

- fn state_fields(&self, _args: StateFieldsArgs) -> Result<Vec<Field>> {
+ fn state_fields(&self, _args: StateFieldsArgs) -> Result<Vec<FieldRef>> {
Contributor

It is nice that this is now avoiding a deep copy of a bunch of Fields 👍
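
A rough sketch of why returning `Vec<FieldRef>` can avoid the deep copy: fields that already exist as `FieldRef` can be passed through by cloning the `Arc`, and only genuinely new fields need an allocation. The `Field` type and the free function `state_fields` below are simplified stand-ins written for this example; the real DataFusion method takes a `StateFieldsArgs` and returns a `Result`, which is omitted here.

```rust
use std::sync::Arc;

/// Simplified stand-in for arrow's `Field` (hypothetical, for illustration only).
#[derive(Debug)]
pub struct Field {
    name: String,
}

/// Same shape as arrow's alias.
pub type FieldRef = Arc<Field>;

impl Field {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Hypothetical state layout: the input fields passed through unchanged,
/// followed by one extra "count" field.
pub fn state_fields(input_fields: &[FieldRef]) -> Vec<FieldRef> {
    // Pass-through fields share the caller's allocation (refcount bump only).
    let mut fields: Vec<FieldRef> = input_fields.iter().map(Arc::clone).collect();
    // Only the new field allocates.
    fields.push(Arc::new(Field::new("count")));
    fields
}
```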


- fn return_field_from_args(&self, _args: ReturnFieldArgs) -> Result<Field> {
+ fn return_field_from_args(&self, _args: ReturnFieldArgs) -> Result<FieldRef> {
Ok(Field::new(
Contributor

So in theory we could update this code to create the FieldRef once on creation, and then return an Arc::clone rather than re-creating the Field each time -- perhaps we can do that as some follow on PRs.
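
The follow-on idea could look roughly like the sketch below. It is not taken from the DataFusion code base: `MyUdf` is a hypothetical type, `Field` is a simplified stand-in for the arrow type, and the method drops the `ReturnFieldArgs` parameter and uses a plain `String` error to stay self-contained. It also assumes the return field does not depend on the arguments; where it does (for example, nullability derived from the inputs), caching at construction would not apply directly.

```rust
use std::sync::Arc;

/// Simplified stand-in for arrow's `Field` (hypothetical, for illustration only).
#[derive(Debug)]
pub struct Field {
    name: String,
    nullable: bool,
}

/// Same shape as arrow's alias.
pub type FieldRef = Arc<Field>;

impl Field {
    pub fn new(name: &str, nullable: bool) -> Self {
        Self { name: name.to_string(), nullable }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Hypothetical UDF that builds its return field once, at construction.
pub struct MyUdf {
    return_field: FieldRef,
}

impl MyUdf {
    pub fn new() -> Self {
        Self {
            // `From<T> for Arc<T>` lets `.into()` wrap the `Field` in an `Arc`.
            return_field: Field::new("my_udf", true).into(),
        }
    }

    /// Returns the cached field; each call is a refcount increment,
    /// not a fresh `Field::new`.
    pub fn return_field_from_args(&self) -> Result<FieldRef, String> {
        Ok(Arc::clone(&self.return_field))
    }
}
```

Repeated calls then hand out handles to the same allocation, which can be checked with `Arc::ptr_eq`.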

@alamb alamb added the api change label May 22, 2025
@timsaucer
Member Author

> I had hoped this would be a bigger improvement, but I think it at least sets us up for being more efficient / less String cloning going forward

Can you expand on this a little? Was there a specific metric you were watching to see performance improvements or is it looking at the code that we roughly have the same number of allocation operations? Or something else?

@github-actions github-actions bot added the documentation label May 23, 2025
@alamb
Contributor

alamb commented May 23, 2025

> Can you expand on this a little? Was there a specific metric you were watching to see performance improvements or is it looking at the code that we roughly have the same number of allocation operations? Or something else?

It was the latter -- mostly I was thinking we'd be able to reuse FieldRef more than Field. There are some improvements for sure but for some reason I expected more. I don't think this is a big deal but wanted to mention it

@alamb alamb merged commit 00132da into apache:main May 28, 2025
30 checks passed
@alamb
Contributor

alamb commented May 28, 2025

Thanks @timsaucer

Labels
api change, core, documentation, ffi, functions, logical-expr, optimizer, physical-expr, physical-plan, proto
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce Field Copy operations before releasing 48.0.0
2 participants