Skip to content

Prevent FlightSQL server panics for do_put when stream is empty or 1st stream element is an Err #7492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 26, 2025

Conversation

superserious-dev
Copy link
Contributor

Which issue does this PR close?

Closes #7329 .

Rationale for this change

When executing a do_put call, there are currently 2 cases that will cause the FlightSQL server to panic:

  • the request stream is empty
  • the first element of the request stream is an Err

This change stops the FlightSQL server from panicking in both cases.

Are there any user-facing changes?

  • If the client sends an empty request stream, the server reports that 0 records were affected.
  • If the client sends a stream where the first element is an Err, the Err is reported back by the client and stream processing stops. This is the same as the current behavior when the Err is not the first element.

The server does not panic in either case.

@github-actions github-actions bot added arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels May 12, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @superserious-dev -- this is a nice contribution; I left some suggestions, let me know what you think

let output = futures::stream::iter(vec![Ok(PutResult {
app_metadata: result.encode_to_vec().into(),
})]);
return Ok(Response::new(Box::pin(output)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

II the client is not sending a command, it probably is making a mistake, and returning an Ok will mask the problem rather than failing fast

Rather than return an Ok, I recommend invoking do_put_fallback in this case which will error by default, and lets users override the behavior if they wish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. It looks like do_put_fallback requires providing a message of type Any. What do you recommend passing in for the message?

Here are a few potential options that pop out to me:

  • Make the message an Option. Unfortunately this would be user-facing modification of the callback function signature.
  • Create a new variant on the Command enum like Command::Missing and add a corresponding protobuf message type.
  • Create a new callback for unrecoverable errors, eg. do_put_error, which doesn't require passing in a message. It could take some kind of error enum with different types of errors, eg DoPutError::MissingCommand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend a new callback for unrecoverable errors as that would be a backwards compatible API change that we could release in the next release (not have to wait for the next major release)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this.

.unwrap_err();
assert_eq!(
err.to_string(),
"External error: Not yet implemented: Client error message"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error never makes it to the server -- it is redirected client side in https://github.com/apache/arrow-rs/blob/87ff531f0b4e109c2166c548d20c850e6b2c8fc5/arrow-flight/src/sql/client.rs#L269-L268

I think you'll have to create a manual FlightClient and send the error directly to exercise the server code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to update the test's comment to something like "make sure the request-side client error is redirected"?
My intent with this test is to verify that the client error in the batch gets surfaced. The current behavior is that a tonic error is being surfaced from the server panic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to update the test's comment to something like "make sure the request-side client error is redirected"?

perhaps -- I am pretty sure that behavior was already covered by an existing test (though I didn't try and find it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent with this test is to verify that the client error in the batch gets surfaced. The current behavior is that a tonic error is being surfaced from the server panic.

I guess I am saying that the error redirection is only happening in the rust client. If the actual server got an Error (maybe a network error or something) this test doesn't cover it, nor does the code int his PR cover it.

I don't think this PR needs to handle that case (as it isn't handled today) but I thought this test was misleading as it implies (at least to me) that the server is handling an error sent from the client, when that is not the case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the additional context. I modified the tests to call do_put directly and verified that the new custom errors are emitted instead of the server panicking.
The caveat is that it required changing the visibility of FallibleRequestStream from pub(crate) to pub in order to construct a problematic stream for a test. Please let me know if there is a better way to do this.


let message =
Any::decode(&*cmd.flight_descriptor.unwrap().cmd).map_err(decode_error_to_status)?;
Any::decode(&*cmd?.flight_descriptor.unwrap().cmd).map_err(decode_error_to_status)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not introduced by this PR, but I think this other unwrap will still panic if a client sends a malformed input (no flight_descriptor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks like it. Should this also get routed through do_put_fallback like the other 2 server panic cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible, yes

@@ -32,15 +32,16 @@ use std::task::{ready, Poll};
///
/// This can be used to accept a stream of `Result<_>` from a client API and send
/// them to the remote server that wants only the successful results.
pub(crate) struct FallibleRequestStream<T, E> {
pub struct FallibleRequestStream<T, E> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the visibility of this in order to create the test that validates the bug fix.

/// sender to notify error
sender: Option<Sender<E>>,
/// fallible stream
fallible_stream: Pin<Box<dyn Stream<Item = std::result::Result<T, E>> + Send + 'static>>,
}

impl<T, E> FallibleRequestStream<T, E> {
pub(crate) fn new(
/// Create a FallibleRequestStream
pub fn new(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: visibility change.

assert_eq!(ingested_batches, vec![]);
.unwrap_err();
assert!(
err.to_string().contains("Unhandled Error: Command is missing."),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only checking that the string contains the expected error. The full error appears to be something like an ArrowError::IpcError that holds a stringified Tonic Status::Unimplemented error.

.unwrap_err();
assert_eq!(
err.to_string(),
"External error: Not yet implemented: Client error message"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the additional context. I modified the tests to call do_put directly and verified that the new custom errors are emitted instead of the server panicking.
The caveat is that it required changing the visibility of FallibleRequestStream from pub(crate) to pub in order to construct a problematic stream for a test. Please let me know if there is a better way to do this.

let output = futures::stream::iter(vec![Ok(PutResult {
app_metadata: result.encode_to_vec().into(),
})]);
return Ok(Response::new(Box::pin(output)));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @superserious-dev -- this looks really nice to me now. 👍

@alamb
Copy link
Contributor

alamb commented May 21, 2025

I merged up from main and ran cargo fmt to hopefully get a clean CI run

@alamb alamb merged commit 88d59f8 into apache:main May 26, 2025
12 checks passed
@alamb
Copy link
Contributor

alamb commented May 26, 2025

Thanks again @superserious-dev

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Ingestion with Arrow Flight Sql panic when the input stream is empty or fallible
2 participants