Weird num_thresholds behaviour in pr_curve plugin. #1536

Open
@ruro


TensorBoard: v1.11.0 according to pip, also 1.11.0 when running.
TensorFlow: v1.11.0
Python: 3.7.0

The num_thresholds parameter of pr_curve_raw_data_pb from the pr_curve plugin behaves inconsistently and in ways that are hard to predict.

  1. If num_thresholds < len(precision), then instead of approximating the PR curve with num_thresholds points, as one might expect, only the first num_thresholds points are taken. This cuts off the PR curve. There seems to be no reason to ever supply num_thresholds < len(precision), because this results in broken PR curves.

  2. The solution to (1) would be passing num_thresholds=len(precision). However, if pr_curve_raw_data_pb is called multiple times with the same name (e.g. to get the steps slider), it seems that only the first num_thresholds is respected and all subsequent PR curves are cut off to that length.
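
To make point 1 concrete, here is a minimal sketch that mimics the reported behaviour on plain Python lists. It is not the plugin's code: `truncate_like_reported` is a hypothetical helper that only reproduces the symptom described above (keeping the first num_thresholds entries).

```python
def truncate_like_reported(values, num_thresholds):
    """Mimic the symptom reported in this issue (hypothetical helper, not plugin code).

    Keeps only the first num_thresholds entries instead of resampling.
    """
    return list(values)[:num_thresholds]


# A 5-point curve, as in the example in this issue.
precision = [0, .7, .8, .9, 1]
recall = [1, .75, .5, .25, 0]

# Asking for 3 thresholds drops the last two points, including the
# (precision=1, recall=0) endpoint, so the drawn curve stops early.
shown_precision = truncate_like_reported(precision, 3)
shown_recall = truncate_like_reported(recall, 3)
print(shown_precision)  # [0, 0.7, 0.8]
print(shown_recall)     # [1, 0.75, 0.5]
```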

Example:

from tensorboard.summary import pr_curve_raw_data_pb
import tensorflow as tf
import numpy as np

# [1] Curve 1 (3 data points)
pre1 = np.asarray([0, .75, 1])
rec1 = np.asarray([1, .75, 0])
tp1 = fp1 = tn1 = fn1 = np.asarray([1] * 3)

# [2] Curve 2 (5 data points)
pre2 = np.asarray([0, .7, .8, .9, 1])
rec2 = np.asarray([1, .75, .5, .25, 0])
tp2 = fp2 = tn2 = fn2 = np.asarray([1] * 5)

with tf.summary.FileWriter('./logs') as writer:
    # [3] Plotting curve 1 with proper num_thresholds = len(precision) = 3
    problem2_1 = pr_curve_raw_data_pb('problem2', tp1, fp1, tn1, fn1, pre1, rec1, num_thresholds=len(pre1))
    writer.add_summary(problem2_1, 1)

    # [4] Plotting curve 2 with the same name as [3] with proper num_thresholds = len(precision) = 5
    problem2_2 = pr_curve_raw_data_pb('problem2', tp2, fp2, tn2, fn2, pre2, rec2, num_thresholds=len(pre2))
    writer.add_summary(problem2_2, 2)

The above code results in this PR curve for step 2 (only the first 3 points are drawn):
[image: PR curve for step 2, cut off after the first 3 points]

The full curve should look like this:
[image: full PR curve for step 2 with all 5 points]

A solution would be either to allow different steps of the same curve to have different num_thresholds or to pick num_thresholds evenly spread samples from precision and recall arrays.
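
The second option (evenly spread samples) could look roughly like the sketch below. This is only an illustration on plain Python lists: `resample_evenly` is a hypothetical helper, it picks the nearest existing points rather than interpolating, and the same indices would also have to be applied to the count arrays.

```python
def resample_evenly(values, num_thresholds):
    """Pick num_thresholds entries spread evenly over values (hypothetical helper).

    The first and last entries are always kept, so the curve endpoints survive.
    """
    values = list(values)
    if num_thresholds < 2:
        raise ValueError("num_thresholds must be an int >= 2")
    if not values:
        raise ValueError("values must not be empty")
    last = len(values) - 1
    # Nearest-index sampling; indices are non-decreasing and stay in [0, last].
    indices = [round(i * last / (num_thresholds - 1)) for i in range(num_thresholds)]
    return [values[j] for j in indices]


precision = [0, .7, .8, .9, 1]
recall = [1, .75, .5, .25, 0]

sampled_precision = resample_evenly(precision, 3)
sampled_recall = resample_evenly(recall, 3)
print(sampled_precision)  # [0, 0.8, 1]
print(sampled_recall)     # [1, 0.5, 0]
```

If num_thresholds is larger than the input length, this sketch repeats points instead of inventing new ones, which keeps the curve shape unchanged.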

P.S. It would also be great if the true_negative parameter were optional. The concept of a True Negative doesn't exist in some tasks, such as detection.

stephanwlee (Contributor) commented on Oct 22, 2018

only the first num_thresholds points are taken. This cuts off the PR curve

From the code, it looks like this behavior is working as expected:
https://github.com/tensorflow/tensorboard/blob/master/tensorboard/backend/event_processing/event_accumulator.py#L371-L377

@nfelt I am not too familiar with this threshold argument to the tb.summary. Can you please TAL? Thanks!

ruro (Author) commented on Oct 22, 2018

Hi. I think the behaviour of metadata being kept only from the first call is fine. The problem is that the arrays passed into pr_curve_raw_data_pb are silently cropped to that length.

If num_thresholds is supposed to be the number of separate threshold levels, where threshold=0 is the point with maximum recall and threshold=1 is the point with maximum precision, then it doesn't make sense for it to cut off the curve.

Possible fixes:

  1. Don't pass num_thresholds as metadata (I don't know if this is possible), so that it can change between steps.
  2. Sample the arrays at num_thresholds evenly spread points.
  3. At least document that the parameter is only valid for num_thresholds >= len(precision), and raise an error otherwise.
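
The last option (failing loudly instead of cropping) could be sketched as a small check run before building the summary. `check_num_thresholds` is a hypothetical helper, not part of TensorBoard; it only encodes the constraint described in this thread.

```python
def check_num_thresholds(num_thresholds, *arrays):
    """Raise instead of silently cropping (hypothetical helper, not a TensorBoard API).

    All arrays must share one length, and num_thresholds must cover that length.
    """
    lengths = {len(a) for a in arrays}
    if len(lengths) != 1:
        raise ValueError("expected at least one array, all of the same length")
    (n,) = lengths
    if num_thresholds < n:
        raise ValueError(
            "num_thresholds=%d is smaller than the array length %d; "
            "the curve would be cut off" % (num_thresholds, n)
        )
    return num_thresholds


pre2 = [0, .7, .8, .9, 1]
rec2 = [1, .75, .5, .25, 0]

checked = check_num_thresholds(len(pre2), pre2, rec2)  # passes

try:
    check_num_thresholds(3, pre2, rec2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```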

Currently the docstring claims that

num_thresholds: Number of thresholds, evenly distributed in `[0, 1]`, to
        compute PR metrics for. Should be an int `>= 2`.
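
For comparison, this is roughly what "thresholds evenly distributed in [0, 1]" means if taken literally. The sketch computes the counts from labels and scores in plain Python; the function name, the `score >= threshold` rule, and returning 0.0 for an empty denominator are assumptions made for illustration, not the plugin's documented conventions.

```python
def pr_counts_at_even_thresholds(labels, scores, num_thresholds):
    """Compute PR data at thresholds spread evenly over [0, 1] (illustrative only)."""
    if num_thresholds < 2:
        raise ValueError("num_thresholds must be an int >= 2")
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rows = []
    for t in thresholds:
        # Assumption: a score >= threshold counts as a positive prediction.
        predicted = [s >= t for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y and p)
        fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
        fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
        tn = sum(1 for y, p in zip(labels, predicted) if not y and not p)
        # Assumption: report 0.0 when the denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        rows.append({"threshold": t, "tp": tp, "fp": fp, "tn": tn, "fn": fn,
                     "precision": precision, "recall": recall})
    return rows


rows = pr_counts_at_even_thresholds(
    labels=[1, 0, 1, 1], scores=[0.9, 0.6, 0.4, 0.1], num_thresholds=3)
for row in rows:
    print(row)
```

Under this reading every step has exactly num_thresholds points by construction, which is why passing arrays of a different length is ambiguous.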

ruro (Author) commented on Nov 13, 2018

Hi. Any updates on this issue?

stephanwlee (Contributor) commented on Nov 13, 2018

@chihuahua Hello Chi! @nfelt said you may have a little more context around the thresholding behavior. Can you please enlighten us? Thanks!

ucesfpa commented on Jul 9, 2020

Hi, I want to use the example given here, but it is not working.

tensorboard.__version__
'2.2.1'

tensorflow.__version__
'2.2.0-rc3'

from tensorboard.summary import pr_curve_raw_data_pb

I get the following error:

from tensorboard.summary import pr_curve_raw_data_pb
ImportError: cannot import name 'pr_curve_raw_data_pb'
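
One way to find out where the function lives in a given install is to probe a few import paths in turn. The sketch below is an untested guess for 2.2.1: only the first path comes from this thread, the other two module paths are assumptions about the 2.x layout, and `find_pr_curve_raw_data_pb` is a hypothetical helper.

```python
import importlib

# Only the first entry is taken from this thread (TensorBoard 1.11).
# The other two are assumed locations and may not match your version.
_CANDIDATES = [
    ("tensorboard.summary", "pr_curve_raw_data_pb"),
    ("tensorboard.summary.v1", "pr_curve_raw_data_pb"),
    ("tensorboard.plugins.pr_curve.summary", "raw_data_pb"),
]


def find_pr_curve_raw_data_pb(candidates=_CANDIDATES):
    """Return the first callable found among the candidate paths, else None."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this install, try the next one
        func = getattr(module, attr, None)
        if callable(func):
            return func
    return None


pr_curve_raw_data_pb = find_pr_curve_raw_data_pb()
if pr_curve_raw_data_pb is None:
    print("pr_curve_raw_data_pb was not found in any candidate module")
```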