GH-47740: [C++][Parquet] Fix undefined behavior when reading invalid Parquet data #47741
base: main
@github-actions crossbow submit -g cpp
@AntoinePrv Would you like to take a look?
// There may be remaining null if they are not greedily filled by either decoder calls
check_and_handle_fully_null_remaining();

ARROW_DCHECK(batch.is_done() || exhausted());
This check could trigger if the RLE-bit-packed data is invalid (for example a run of invalid size). @AntoinePrv
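To illustrate the concern, here is a minimal sketch of how run sizes could be validated at runtime instead of relying on a debug-only assertion. It is not the code in this PR or in Arrow: `ReadVarint` and `CountValues` are hypothetical names, and the strictness choices (rejecting zero-length runs, requiring every bit-packed group to be fully present) are assumptions made for the example. The only `assert` left guards a caller contract, not untrusted input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper, not part of Arrow: reads a ULEB128 varint from `data`
// starting at `pos`. Returns std::nullopt on truncated or over-long input.
std::optional<uint32_t> ReadVarint(const std::vector<uint8_t>& data, size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= data.size()) return std::nullopt;  // truncated input
    const uint8_t byte = data[pos++];
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  return std::nullopt;  // more than 5 bytes: rejected as over-long
}

// Hypothetical validator: walks the run headers of an RLE/bit-packed hybrid
// buffer and returns the total number of encoded values, or std::nullopt if
// any run is invalid. Invalid data is reported, never asserted on.
std::optional<int64_t> CountValues(const std::vector<uint8_t>& data, int bit_width) {
  assert(bit_width > 0 && bit_width <= 32);  // caller contract, not input data
  const size_t value_bytes = (static_cast<size_t>(bit_width) + 7) / 8;
  size_t pos = 0;
  int64_t total = 0;
  while (pos < data.size()) {
    const std::optional<uint32_t> header = ReadVarint(data, pos);
    if (!header) return std::nullopt;
    const uint64_t count = *header >> 1;
    if (count == 0) return std::nullopt;  // assumption: zero-length runs are invalid
    const size_t remaining = data.size() - pos;
    if (*header & 1) {
      // Bit-packed run: `count` groups of 8 values, i.e. bit_width bytes per group.
      const uint64_t needed = count * static_cast<uint64_t>(bit_width);
      if (needed > remaining) return std::nullopt;  // run larger than the buffer
      pos += static_cast<size_t>(needed);
      total += static_cast<int64_t>(count) * 8;
    } else {
      // RLE run: a single repeated value stored in ceil(bit_width / 8) bytes.
      if (value_bytes > remaining) return std::nullopt;  // missing run value
      pos += value_bytes;
      total += static_cast<int64_t>(count);
    }
  }
  return total;
}
```

With a check of this kind, a caller can compare the returned count against the number of values it expects and return an error status on mismatch, so the condition behind the `ARROW_DCHECK` would only be reachable through a programming error.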
Revision: d620685
Submitted crossbow builds: ursacomputing/crossbow @ actions-0059c16459
Valgrind failure is unrelated and will be fixed by #47743
lgtm - nice work!
Rationale for this change
Fix issues found by OSS-Fuzz when invalid Parquet data is fed to the Parquet reader:
Are these changes tested?
Yes, using the updated fuzz regression files from apache/arrow-testing#115
Are there any user-facing changes?
No.
This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)