
Request: A quantile function for ordinal data only #27367

@pdeffebach


There are a variety of data types for which an order is defined but other mathematical operations are not. It seems to be the consensus, for instance, that adding or dividing two values of the Date type (+ or /) should not be defined.

However, if you have a vector of dates, you might still want to know the "quantiles" of those dates. Because < is defined, a vector of dates can be sorted, so it is meaningful to ask "What is the 25th percentile of the dates in my vector?"

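You can't do this with the current quantile function: when the requested quantile falls between two observations, it linearly interpolates between them (taking their mean when the quantile sits exactly halfway), and that interpolation requires arithmetic on the element type.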
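To make the failure concrete, here is a minimal Python sketch (the issue contains no code, so the language is our choice) of linear-interpolation quantiles, i.e. R's default type 7, which is also the default of Julia's Statistics.quantile. The names `interpolated_quantile` and `Grade` are hypothetical; `Grade` stands in for any ordinal type that defines `<` but no arithmetic.

```python
import math


def interpolated_quantile(data, p):
    """Linear-interpolation quantile (R type 7), sketched for illustration."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)                 # sorting needs only `<`
    h = (len(xs) - 1) * p             # fractional 0-based position
    j = math.floor(h)
    gamma = h - j
    if gamma == 0:
        return xs[j]                  # lands exactly on an observation
    # Between two observations: needs subtraction, scaling and addition.
    return xs[j] + gamma * (xs[j + 1] - xs[j])


class Grade:
    """Hypothetical ordinal value: ordered, but with no arithmetic defined."""

    def __init__(self, rank, label):
        self.rank, self.label = rank, label

    def __lt__(self, other):
        return self.rank < other.rank

    def __repr__(self):
        return f"Grade({self.label!r})"


grades = [Grade(r, l) for r, l in [(3, "C"), (1, "A"), (4, "D"), (2, "B")]]

print(interpolated_quantile([1, 2, 3, 4], 0.5))  # 2.5
try:
    interpolated_quantile(grades, 0.5)           # falls between "B" and "C"
except TypeError as err:
    print("TypeError:", err)                     # Grade - Grade is undefined
```

Sorting the grades works fine; only the interpolation step breaks, which is exactly the part an ordinal quantile would need to replace.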

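R's quantile function has a type argument, and calling quantile(x, ..., type = 1) never interpolates: it returns an actual observation, namely the smallest value whose empirical cumulative proportion is at least the requested probability (the inverse of the empirical CDF).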
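A minimal Python sketch of what R's `type = 1` computes, assuming the standard definition Q(p) = inf{x : F_n(x) >= p}, which for p > 0 is the order statistic of rank ceil(n * p). The function name `quantile_type1` is hypothetical, and the small tolerance mirrors (but does not exactly reproduce) the fuzz R applies to guard against floating-point error.

```python
import math
from datetime import date


def quantile_type1(data, p):
    """Sketch of R's quantile(type = 1): the smallest observation x with
    F_n(x) >= p. Only `<` is used (via sorted), so it works for dates,
    ordered categories, or anything else that can be sorted."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("empty data")
    # 1-based rank ceil(n * p); the tiny tolerance keeps e.g. 10 * 0.7
    # (7.000000000000001 in floating point) from rounding up to 8.
    # p = 0 maps to the minimum.
    k = max(1, math.ceil(n * p - 1e-12))
    return xs[k - 1]


dates = [date(2018, 3, 1), date(2018, 1, 15), date(2018, 6, 30), date(2018, 2, 10)]
print(quantile_type1(dates, 0.25))  # 2018-01-15
print(quantile_type1(dates, 0.5))   # 2018-02-10
```

Because the result is always one of the inputs, no arithmetic on the element type is ever needed.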

I am currently working on a better describe function for returning summary statistics of a DataFrame, and I think it would be useful for it to return a quantile-like value for ordinal data. Unfortunately, no such function is defined either here or in StatsBase.

quantile is a super well-written function in Base: it is clever enough to sort only the values between the smallest and largest requested percentiles. Writing an ordinal quantile function in StatsBase would essentially mean rewriting quantile from scratch. Instead, I think it makes sense to add a new function, called ordquantile or something similar, that keeps everything in the current quantile function except the step that takes the mean of the two neighbouring values, and returns the lower value instead.
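The proposed split can be sketched in Python: the position computation and the partial sort are shared, and only the final step differs. This is an illustration under assumptions, not Julia's actual code: `quantiles` and its `ordinal` flag are hypothetical names, and since Python has no direct counterpart of Julia's `partialsort!` over an index range, the sketch uses `heapq.nsmallest`, which orders only the data up to the highest index it needs.

```python
import heapq
import math
from datetime import date


def quantiles(data, ps, ordinal=False):
    """Shared machinery, different last step (hypothetical API).
    ordinal=False -> interpolate between neighbours (needs arithmetic);
    ordinal=True  -> return the lower neighbour (needs only `<`)."""
    n = len(data)
    if n == 0:
        raise ValueError("empty data")
    if any(not 0 <= p <= 1 for p in ps):
        raise ValueError("probabilities must lie in [0, 1]")
    # Shared step: 0-based fractional position, as in R type 7.
    positions = []
    for p in ps:
        h = (n - 1) * p
        j = math.floor(h)
        positions.append((j, h - j))
    # Shared step: partially sort, keeping only the smallest elements
    # up to (and including) the upper neighbour of the highest position.
    needed = min(max(j for j, _ in positions) + 2, n)
    xs = heapq.nsmallest(needed, data)
    out = []
    for j, gamma in positions:
        if ordinal or gamma == 0:
            out.append(xs[j])  # lower neighbour (or exact hit)
        else:
            out.append(xs[j] + gamma * (xs[j + 1] - xs[j]))
    return out


dates = [date(2018, 3, 1), date(2018, 1, 15), date(2018, 6, 30), date(2018, 2, 10)]
print(quantiles([1, 2, 3, 4], [0.25, 0.5]))        # [1.75, 2.5]
print(quantiles(dates, [0.25, 0.5], ordinal=True))
# [datetime.date(2018, 1, 15), datetime.date(2018, 2, 10)]
```

Note that this "lower neighbour" rule is not identical to R's type 1 for every n and p: with four observations and p = 0.3, type 1 returns the second-smallest value, while the lower neighbour of the type-7 position is the smallest.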

Given this reasoning, does it make sense for such a function to live here?

Metadata

Assignees: no one assigned

Labels: feature (indicates new feature / enhancement requests)