Description
It would be useful if exploratory plots came with a visual indicator of “discarded data”.
This would make Plot better suited to exploratory data analysis by alerting users to anomalous values that violate their assumptions about the data.
For example, I changed a scale from log to symlog and discovered a bunch of negative values where I wasn’t expecting any.
The data was supposed to be strictly positive, and the negative values indicated a processing error. Since the default log scale filtered those data points out, I noticed only because I went out of my way to do additional spot checks.
Plot could have made this evident immediately, e.g. with a legend saying something like “100 data points not shown”. Possibly even more useful would be a view of the "data pipeline" showing how many points are filtered out at each stage.
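To illustrate the kind of count I have in mind, here is a minimal sketch in plain JavaScript. It assumes a log scale can only draw finite, strictly positive numbers; `countHiddenByLog` and the sample `values` are hypothetical and not part of Plot's API.

```javascript
// Hypothetical helper (not part of Plot): counts the values a log scale
// could not place, assuming only finite, strictly positive numbers are drawable.
function countHiddenByLog(values) {
  let hidden = 0;
  for (const v of values) {
    if (typeof v !== "number" || !Number.isFinite(v) || v <= 0) hidden++;
  }
  return hidden;
}

// Illustrative data: supposed to be strictly positive, but it is not.
const values = [12, 0.5, -3, 0, 7, NaN, -0.2];
const hidden = countHiddenByLog(values);
if (hidden > 0) console.log(`${hidden} data points not shown`);
// prints "4 data points not shown"
```

This is exactly the manual spot check I would like to avoid having to write for every chart.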
@Fil observes that some filters rely on discarding data as the basic mechanism by which they do their work, so there are subtle questions about what to communicate for this to be a useful signal.
For the exploratory use case I think it makes sense for this to be on by default, since spot-checking every individual assumption manually can get onerous (e.g. checking for null/undefined, zeros where there shouldn’t be any, negative numbers where there shouldn’t be any, values outside of the x/y/color domain, NaN, etc.).
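For a sense of how much manual work this is, the checks listed above look roughly like the following for a single channel. This is only a sketch: `auditValues`, its category names, and the `[min, max]` form of `domain` are my own assumptions, not anything Plot provides.

```javascript
// Hypothetical helper (not part of Plot): tallies suspicious values in one channel.
// `domain` is an assumed [min, max] pair; omit it to skip the range check.
function auditValues(values, domain) {
  const counts = {missing: 0, nan: 0, zero: 0, negative: 0, outOfDomain: 0};
  for (const v of values) {
    if (v === null || v === undefined) counts.missing++;
    else if (Number.isNaN(v)) counts.nan++;
    else {
      // A value can fall into several categories, e.g. negative and out of domain.
      if (v === 0) counts.zero++;
      if (v < 0) counts.negative++;
      if (domain && (v < domain[0] || v > domain[1])) counts.outOfDomain++;
    }
  }
  return counts;
}

const report = auditValues([5, null, 0, -2, NaN, 150, undefined, 42], [0, 100]);
console.log(report);
// { missing: 2, nan: 1, zero: 1, negative: 1, outOfDomain: 2 }
```

Repeating this for every channel of every mark is the onerous part, which is why a built-in indicator would help.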
A separate tool such as a summary table could be used to learn about missing/pathological data in a dataset, but it would still be useful for Plot to flag these issues since they can creep in during downstream processing and plot transformations.