Avoid evaluating filters when they can be discarded purely from statistics

### Is your feature request related to a problem or challenge?

Currently, statistics-based filter pruning (at both the row group and page level) has one of two outcomes per container:
1. This container cannot possibly match the filter (discard it).
2. This container *may* match the filter, but which rows to include or exclude needs to be confirmed by evaluating each row of the data.

There is a significant optimization available here: *if we know that every row in the container matches the filter, we don't need to evaluate the filter at all*.
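As a rough sketch of what that third outcome could look like, the two-way result can be widened into a three-way verdict that the scan then acts on. The names `StatsVerdict`, `ScanAction`, and `plan_scan` are hypothetical and chosen only for illustration; they are not part of the existing pruning API.

```rust
/// Hypothetical three-way result of checking a filter against the
/// statistics of one container (row group or page).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StatsVerdict {
    /// No row can possibly match: the container can be discarded.
    NoRowsMatch,
    /// Every row is known to match: the filter itself can be discarded.
    AllRowsMatch,
    /// Statistics are inconclusive: each row must be checked.
    Inconclusive,
}

/// Hypothetical action the scan takes for one container.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanAction {
    /// Do not read the container at all.
    Skip,
    /// Read the projected columns but do not evaluate the filter.
    ReadWithoutFilter,
    /// Read the container and evaluate the filter on every row.
    ReadAndFilter,
}

/// Map a statistics verdict onto the cheapest safe scan action.
fn plan_scan(verdict: StatsVerdict) -> ScanAction {
    match verdict {
        StatsVerdict::NoRowsMatch => ScanAction::Skip,
        StatsVerdict::AllRowsMatch => ScanAction::ReadWithoutFilter,
        StatsVerdict::Inconclusive => ScanAction::ReadAndFilter,
    }
}
```

Today only the `Skip` and `ReadAndFilter` paths exist; the proposal amounts to adding the middle one.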

Consider a column `name` with values `["Adrian", "Adrian", "Adrian"]`. The min/max stats are `"Adrian"/"Adrian"`. A query with the filter `name = 'Adrian'` should never need to read the column to know that all rows match the filter.

Another relevant case is a `ts` column with values `["2025-01-01T00:00:00Z", ..., "2025-01-01T00:01:32Z"]`. The values need not be sorted or ordered, but let's say that the min/max stats are `"2025-01-01T00:00:00Z"/"2025-01-01T00:01:32Z"`. For a filter `ts > '2024-12-31T00:00:00Z'` there should be no need to evaluate the filter on every row: we know just from stats that every row matches.
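The two examples above can be checked with a small sketch like the following. It is illustrative only: `MinMax`, `eq_verdict`, and `gt_verdict` are hypothetical names, values are compared as plain strings (which happens to order ISO 8601 timestamps of identical format correctly, whereas a real implementation would compare typed values), and it assumes a null count is available, since a null row does not satisfy either filter.

```rust
/// Hypothetical min/max statistics for one container of a string-like column.
struct MinMax<'a> {
    min: &'a str,
    max: &'a str,
    null_count: u64,
}

/// Verdict for `col = literal`.
/// `Some(true)`: every row matches. `Some(false)`: no row can match.
/// `None`: statistics are inconclusive.
fn eq_verdict(stats: &MinMax<'_>, literal: &str) -> Option<bool> {
    if literal < stats.min || literal > stats.max {
        // The literal lies outside [min, max], so no row can equal it.
        Some(false)
    } else if stats.min == literal && stats.max == literal && stats.null_count == 0 {
        // min == max == literal and there are no nulls: all rows equal it.
        Some(true)
    } else {
        None
    }
}

/// Verdict for `col > literal`, with the same return convention.
fn gt_verdict(stats: &MinMax<'_>, literal: &str) -> Option<bool> {
    if stats.max <= literal {
        // Even the largest value is not greater than the literal.
        Some(false)
    } else if stats.min > literal && stats.null_count == 0 {
        // Even the smallest value is greater, and there are no nulls.
        Some(true)
    } else {
        None
    }
}
```

With stats `"Adrian"/"Adrian"` and no nulls, `eq_verdict(&stats, "Adrian")` yields `Some(true)`; with the `ts` stats above, `gt_verdict(&stats, "2024-12-31T00:00:00Z")` also yields `Some(true)`, so neither filter would need to be evaluated row by row.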

We could incorporate this change, but it would require some refactoring of https://github.com/apache/datafusion/blob/main/datafusion/physical-optimizer/src/pruning.rs and its consumers.

Avoid evaluating filters when they can be discarded purely from statistics #15425
