
Specify custom thresholds for pr_curves #2064

Open
@demmerichs


This is a feature request for the TensorBoard plugin pr_curves!

There is very little control over the thresholds used by the plugin, and I was wondering whether this is necessary. Specifically, I would like to pass an explicit list of thresholds to raw_data_op (of the same length as the precision, recall, TP, TN, FP and FN tensors), to be used instead of thresholds evenly spaced between 0 and 1. I imagine this could work like the bins argument in matplotlib or NumPy: an int would keep the default behaviour, while a 1-D tensor would specify the thresholds individually.
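A minimal NumPy sketch of how such an argument could behave, assuming a hypothetical thresholds parameter that accepts either an int (evenly spaced thresholds in [0, 1]) or a sorted 1-D array, and assuming a sample counts as predicted positive when its score is >= the threshold. resolve_thresholds and counts_at_thresholds are illustrative names only, not part of the TensorBoard API.

```python
import numpy as np


def resolve_thresholds(thresholds):
    """Hypothetical helper, loosely modelled on numpy's `bins` argument.

    An int k gives k evenly spaced thresholds in [0, 1] (assumed default);
    a 1-D array is used as given after checking it is sorted.
    """
    if isinstance(thresholds, (int, np.integer)):
        return np.linspace(0.0, 1.0, thresholds)
    thresholds = np.asarray(thresholds, dtype=np.float64)
    if thresholds.ndim != 1 or np.any(np.diff(thresholds) < 0):
        raise ValueError("thresholds must be a sorted 1-D array")
    return thresholds


def counts_at_thresholds(labels, predictions, thresholds):
    """TP/FP/TN/FN per threshold, treating score >= t as predicted positive.

    One sort plus a binary search per threshold keeps memory at O(n + k)
    instead of materialising a [k, n] comparison matrix.
    """
    t = resolve_thresholds(thresholds)
    labels = np.asarray(labels, dtype=bool)
    preds = np.asarray(predictions, dtype=np.float64)
    order = np.argsort(preds)
    preds_sorted = preds[order]
    labels_sorted = labels[order]
    n = preds_sorted.size
    # first[j] = index of the first sorted score that is >= t[j].
    first = np.searchsorted(preds_sorted, t, side="left")
    # pos_suffix[i] = positives among sorted samples i..n-1 (0 at i == n).
    pos_suffix = np.concatenate([np.cumsum(labels_sorted[::-1])[::-1], [0]])
    tp = pos_suffix[first]
    fp = (n - first) - tp
    total_pos = int(labels.sum())
    fn = total_pos - tp
    tn = (n - total_pos) - fp
    return t, tp, fp, tn, fn
```

Precision and recall then follow as tp / (tp + fp) and tp / (tp + fn), guarding against empty denominators at the extreme thresholds. The binary search over the sorted scores is what avoids the large intermediate tensor.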

Where I am coming from:
I work on object detection, and our validation set contains quite a lot of objects and even more false predictions (on the order of 1.5M). In this setting a very fine-grained PR curve (1k to 100k threshold values) makes sense. However, the default op does not handle this efficiently: with 10k thresholds it tries to allocate a tensor of shape [10k, 1.5M]. So I am currently computing the counts, precision and recall directly in TF:

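        import tensorflow as tf

        # Assumed inputs: pr_labels (1-D, 0/1 ground truth) and
        # pr_predictions (1-D scores of the same length).
        pr_labels = tf.cast(pr_labels, tf.int32)
        n = tf.size(pr_labels)
        # After an ascending sort, point i treats the n - i samples with
        # score >= pr_predictions_sort[i] as predicted positive.
        pr_counts = tf.range(n, 0, delta=-1, dtype=tf.int32)
        pr_sort_idxs = tf.argsort(pr_predictions)
        pr_labels_sort = tf.gather(pr_labels, pr_sort_idxs)
        pr_predictions_sort = tf.gather(pr_predictions, pr_sort_idxs)
        pr_gt_pos_count = tf.reduce_sum(pr_labels_sort)
        pr_gt_neg_count = n - pr_gt_pos_count
        # Reverse cumulative sum: positives among samples i..n-1.
        pr_tp = tf.cumsum(pr_labels_sort, reverse=True)
        pr_fp = pr_counts - pr_tp
        pr_tn = pr_gt_neg_count - pr_fp
        pr_fn = pr_gt_pos_count - pr_tp
        # int32 / int32 is true division in TF and yields float64.
        pr_precision = pr_tp / pr_counts
        pr_recall = pr_tp / pr_gt_pos_count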
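For scale: a [k, n] tensor with k = 10k thresholds and n = 1.5M predictions has 1.5 × 10^10 entries, roughly 60 GB if stored as float32, whereas the sort-based snippet above needs only O(n) memory and yields one curve point per prediction. The NumPy transcription below can help check the TF version on small inputs; sorted_pr_curve is a hypothetical name, and the sketch assumes 0/1 labels.

```python
import numpy as np


def sorted_pr_curve(labels, predictions):
    """NumPy transcription of the sort-based TF snippet (ascending sort).

    Point i uses threshold predictions_sorted[i]: the n - i samples from
    index i onward are treated as predicted positive.
    """
    labels = np.asarray(labels, dtype=np.int64)
    order = np.argsort(predictions, kind="stable")
    labels_sorted = labels[order]
    predictions_sorted = np.asarray(predictions)[order]
    n = labels.size
    counts = np.arange(n, 0, -1)  # predicted-positive count per point
    tp = np.cumsum(labels_sorted[::-1])[::-1]  # positives from i onward
    fp = counts - tp
    pos = labels_sorted.sum()
    tn = (n - pos) - fp
    fn = pos - tp
    precision = tp / counts
    recall = tp / pos
    return predictions_sorted, tp, fp, tn, fn, precision, recall
```

With tied scores, the points inside a tie split it arbitrarily; deduplicating thresholds (keeping the first index of each run of equal scores) would remove those intermediate points.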

In principle, pr_predictions_sort holds exactly the thresholds that belong to these points, but there is currently no way to pass them to the plugin, so I ignore them and let it use the linearly spaced thresholds, which are wrong (they are only pseudo-thresholds).
I mention all this because it seems like a fairly standard object detection setup, and I do not really understand why it is so hard to make it work with the current pr_curve API.
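If explicit thresholds were accepted, one way to keep the curve at 1k to 100k points instead of one point per prediction would be to subsample the sorted arrays evenly in rank, so that every kept point still carries its true score as threshold. A minimal sketch; subsample_curve is a hypothetical helper, and it assumes all arrays are aligned with the sorted predictions.

```python
import numpy as np


def subsample_curve(thresholds_sorted, *curves, num_points):
    """Keep about num_points points, evenly spaced in rank order.

    thresholds_sorted and every array in `curves` must be aligned
    element-wise (one entry per sorted prediction).
    """
    thresholds_sorted = np.asarray(thresholds_sorted)
    n = thresholds_sorted.size
    k = min(num_points, n)
    # Evenly spaced ranks; includes first and last point when k >= 2.
    idx = np.unique(np.round(np.linspace(0, n - 1, k)).astype(np.int64))
    return (thresholds_sorted[idx],) + tuple(np.asarray(c)[idx] for c in curves)
```

For example, subsampling ~1.5M sorted predictions with num_points=10000 keeps 10k points whose thresholds are actual scores rather than linearly spaced pseudo-thresholds.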

Thanks in advance for considering/discussing this feature request. I would gladly provide more context if needed.
