Description
This is a feature request for the TensorBoard plugin `pr_curves`!
There is currently very little control over the thresholds used by the plugin, and I wonder whether this restriction is necessary. Specifically, I would like to pass an explicit list of thresholds (the same length as precision, recall, TP, TN, FP, FN) to the `raw_data_op`, to be used instead of thresholds evenly spaced between 0 and 1. I imagine this could work like the `bins` argument in matplotlib or numpy, taking either an int for the default behaviour or a 1-D tensor specifying the thresholds individually.
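To make the proposal concrete, here is a minimal NumPy sketch of the `bins`-style dispatch I have in mind. The helper name `resolve_thresholds` is purely hypothetical, not part of any existing API:

```python
import numpy as np

def resolve_thresholds(thresholds):
    """Hypothetical helper mirroring the matplotlib/numpy `bins` convention:
    an int yields evenly spaced thresholds in [0, 1] (today's behaviour),
    while a 1-D array is used verbatim."""
    if np.isscalar(thresholds):
        # Current behaviour: evenly spaced grid over [0, 1].
        return np.linspace(0.0, 1.0, int(thresholds))
    t = np.asarray(thresholds, dtype=np.float64)
    if t.ndim != 1:
        raise ValueError("thresholds must be an int or a 1-D array")
    # Proposed behaviour: caller-supplied thresholds, used as-is.
    return t

default_grid = resolve_thresholds(5)            # evenly spaced in [0, 1]
custom_grid = resolve_thresholds([0.1, 0.33, 0.9])  # taken verbatim
```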
Where I am coming from:
I do object detection, and we have quite a lot of objects in the validation set and even more false predictions (on the order of 1.5M). In this setting it makes sense to have a very fine-grained PR curve (1k-100k threshold values). However, the default op does not handle this efficiently (it tries to allocate a tensor of shape [10k, 1.5M]). So I am currently computing precision and recall in TF myself:
```python
import tensorflow as tf

# pr_labels: binary labels (0/1); pr_predictions: scores in [0, 1].
n = tf.size(pr_labels)
# Number of predicted positives at each threshold: n, n-1, ..., 1.
pr_counts = tf.range(n, 0, delta=-1, dtype=tf.int32)
# Sort ascending by score; position i then corresponds to the
# threshold pr_predictions_sort[i].
pr_sort_idxs = tf.argsort(pr_predictions)
pr_labels_sort = tf.gather(pr_labels, pr_sort_idxs)
pr_predictions_sort = tf.gather(pr_predictions, pr_sort_idxs)
pr_gt_pos_count = tf.reduce_sum(pr_labels_sort)
pr_gt_neg_count = n - pr_gt_pos_count
# Reverse cumulative sum: true positives among all samples whose
# score is >= the threshold at position i.
pr_tp = tf.cumsum(pr_labels_sort, reverse=True)
pr_fp = pr_counts - pr_tp
pr_tn = pr_gt_neg_count - pr_fp
pr_fn = pr_gt_pos_count - pr_tp
pr_precision = pr_tp / pr_counts
pr_recall = pr_tp / pr_gt_pos_count
```
Now I could use `pr_predictions_sort` to provide exactly these thresholds; sadly this is not possible, so I am currently ignoring them and plotting against linearly spaced (wrong/pseudo) thresholds.
I am telling this story because it seems like a pretty standard object detection setup, and I do not really understand why it is so hard to make it work with the current API of `pr_curve`.
Thanks in advance for considering/discussing this feature request. I would gladly provide more context if needed.