Description
There are a variety of types of data for which order is defined, but not other mathematical operations. It seems to be the consensus, for instance, that the `Date` type should not have `+` or `/` defined for it.
However, if you have a vector of dates, you might still want to know the "quantiles" of those dates. If you can sort a vector of dates, because `<` is defined, you can ask "What is the 25th percentile of dates in my vector?"
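You can't do this with the current `quantile` function, because in the case of a tie (that is, when the requested quantile falls between two elements), it finds a point between the two values by interpolating, which requires arithmetic on the elements.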
R's `quantile` function has the keyword argument `type`, and when you call `quantile(x, ..., type = 1)` it returns the lower of the two values in the case of a tie.
I am currently working on a better `describe` function for returning summary statistics of a `DataFrame`, and I think it would be useful to return a quantile-like value for ordinal data. Unfortunately, such a function is not defined either here or in `StatsBase`.
`quantile` is a super well-written function in `Base`, being clever enough to only sort values between the minimum and maximum percentiles asked for. Writing an ordinal quantile function in `StatsBase` would essentially mean re-writing the `quantile` function entirely. Rather, I think it makes sense to add a new method, call it `ordquantile` or something, that keeps everything in the current `quantile` function except for the part that takes the mean of ties, and returns the lower value instead.
Does this reasoning make sense for it to live here?
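
To make the proposal concrete, here is a minimal sketch of what I have in mind. It is not the existing `quantile` code: `ordquantile` is just the hypothetical name floated above, the signature is an assumption, and it uses a single `partialsort` call rather than the range-limited sorting that `quantile` does. It returns the smallest element such that at least a fraction `p` of the elements are less than or equal to it, and it only relies on the elements being orderable.

```julia
using Dates

# Hypothetical sketch: `ordquantile` is not an existing function in Base or StatsBase.
# Returns an actual element of `v`, so no arithmetic on the elements is needed.
function ordquantile(v::AbstractVector, p::Real)
    isempty(v) && throw(ArgumentError("empty collection"))
    0 <= p <= 1 || throw(ArgumentError("p must be in [0, 1]"))
    n = length(v)
    # Position of the requested order statistic; p = 0 maps to the minimum.
    k = clamp(ceil(Int, n * p), 1, n)
    # `partialsort` works on a copy of `v` and returns the k-th smallest element.
    return partialsort(v, k)
end

dates = [Date(2017, 1, 10), Date(2017, 1, 1), Date(2017, 1, 20), Date(2017, 1, 5)]

ordquantile(dates, 0.25)  # Date(2017, 1, 1)
ordquantile(dates, 0.5)   # Date(2017, 1, 5), the lower of the two middle dates
```

A real implementation would presumably reuse the internals of `quantile` so that several percentiles can be computed with one partial sort; the sketch only shows the tie-handling rule.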