Systematic fuzz testing for parquet predicate pushdown

### Is your feature request related to a problem or challenge?

We have several forms of predicate pushdown in DataFusion's Parquet reader. The code path taken depends on the exact data layout and the predicates in the query.

@itsjunetime is working on https://github.com/apache/datafusion/issues/4028 to improve performance by evaluating some of these predicates more cleverly.

The code path currently taken depends on:
1. Row group size
2. Sort order of the data within the file
3. File repartitioning size (how many partitions are read)
4. Number of row groups
5. Data page size
6. Whether predicate pushdown is enabled
7. Whether predicate reordering is enabled


### Describe the solution you'd like

I would like additional correctness test coverage for reading Parquet files with each of the various forms of pushdown enabled.

### Describe alternatives you've considered

I would like to have a test that:
1. Creates multiple Parquet files containing the same data but with different orderings, row group distributions, etc.
2. Runs the same query against each file
3. Compares the results of the different runs and ensures they are the same
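The three steps above can be illustrated with a toy model that uses only the Rust standard library. This is a sketch of the differential-testing idea, not DataFusion code: `RowGroups`, `write_file`, `scan`, and `check_all_layouts_agree` are hypothetical names, a "file" is just a vector of row groups, and "pushdown" is modeled as skipping row groups whose min/max range cannot match a range predicate.

```rust
/// Toy stand-in for a Parquet file: each inner Vec is one "row group".
/// (Hypothetical model for illustration, not a DataFusion or parquet type.)
type RowGroups = Vec<Vec<i64>>;

/// Step 1: lay out the same values with a given row group size and ordering.
/// `row_group_size` must be greater than zero.
fn write_file(values: &[i64], row_group_size: usize, sorted: bool) -> RowGroups {
    let mut v = values.to_vec();
    if sorted {
        v.sort();
    }
    v.chunks(row_group_size).map(|c| c.to_vec()).collect()
}

/// Step 2: evaluate `lo <= x <= hi` over the file.
/// With `pushdown` enabled, row groups whose min/max statistics rule out
/// any match are skipped without looking at their rows.
fn scan(file: &RowGroups, lo: i64, hi: i64, pushdown: bool) -> Vec<i64> {
    let mut out = Vec::new();
    for rg in file {
        if pushdown {
            if let (Some(&min), Some(&max)) = (rg.iter().min(), rg.iter().max()) {
                if max < lo || min > hi {
                    continue;
                }
            }
        }
        out.extend(rg.iter().copied().filter(|x| lo <= *x && *x <= hi));
    }
    // Row order differs between layouts, so compare sorted results.
    out.sort();
    out
}

/// Step 3: every layout and pushdown setting must produce the same answer
/// as a plain filter over the original values.
fn check_all_layouts_agree(values: &[i64], lo: i64, hi: i64) -> bool {
    let mut expected: Vec<i64> = values
        .iter()
        .copied()
        .filter(|x| lo <= *x && *x <= hi)
        .collect();
    expected.sort();

    for &row_group_size in &[1usize, 3, 8] {
        for &sorted in &[false, true] {
            let file = write_file(values, row_group_size, sorted);
            for &pushdown in &[false, true] {
                if scan(&file, lo, hi, pushdown) != expected {
                    return false;
                }
            }
        }
    }
    true
}
```

A fuzz harness would call `check_all_layouts_agree` with randomly generated values and predicate bounds. In a real test the baseline would be the query run with all pushdown disabled, and the other runs would use actual Parquet files and DataFusion's reader.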


Parameters to check:
1. Row group size
2. Sort order
3. Number of row groups
4. Data page size
5. Use predicate pushdown
6. Use predicate reordering
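One way to cover these parameters systematically is to enumerate their full cartesian product and run the comparison once per combination. The sketch below assumes a hypothetical `Scenario` struct and `scenarios` function; the field names are illustrative and are not DataFusion configuration keys, and how each field maps to real writer or reader options is left to the test harness.

```rust
/// One combination of the parameters listed above.
/// (Hypothetical type for illustration; not part of DataFusion.)
#[derive(Debug, Clone, PartialEq)]
struct Scenario {
    row_group_size: usize,
    sorted: bool,
    num_row_groups: usize,
    data_page_size: usize,
    pushdown: bool,
    reorder: bool,
}

/// Build every combination of the supplied sizes with both values of
/// each boolean option.
fn scenarios(
    row_group_sizes: &[usize],
    num_row_groups: &[usize],
    data_page_sizes: &[usize],
) -> Vec<Scenario> {
    let mut out = Vec::new();
    for &row_group_size in row_group_sizes {
        for &sorted in &[false, true] {
            for &n in num_row_groups {
                for &data_page_size in data_page_sizes {
                    for &pushdown in &[false, true] {
                        for &reorder in &[false, true] {
                            out.push(Scenario {
                                row_group_size,
                                sorted,
                                num_row_groups: n,
                                data_page_size,
                                pushdown,
                                reorder,
                            });
                        }
                    }
                }
            }
        }
    }
    out
}
```

The number of scenarios grows multiplicatively, so a fuzz test may prefer to sample a random subset of this list on each run rather than execute all of it.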


### Additional context

_No response_

Systematic fuzz testing for parquet predicate pushdown #12115
