Description
TensorBoard: v1.11.0 according to pip, also 1.11.0 when running.
TensorFlow: v1.11.0
Python: 3.7.0
The num_thresholds parameter of pr_curve_raw_data_pb from the pr_curve plugin has inconsistent and surprising behaviour.
1. If num_thresholds < len(precision), then instead of approximating the PR curve in num_thresholds points, as one might expect, only the first num_thresholds points are taken. This cuts off the PR curve. There seems to be no reason to ever supply num_thresholds < len(precision), because this results in broken PR curves.
2. The solution to (1) would be passing num_thresholds=len(precision). However, if pr_curve_raw_data_pb is called multiple times (i.e. to get the steps slider), it seems that only the first num_thresholds is respected and all the subsequent PR curves are cut off.
Example:
from tensorboard.summary import pr_curve_raw_data_pb
import tensorflow as tf
import numpy as np
# [1] Curve 1 (3 data points)
pre1 = np.asarray([0, .75, 1])
rec1 = np.asarray([1, .75, 0])
tp1 = fp1 = tn1 = fn1 = np.asarray([1] * 3)
# [2] Curve 2 (5 data points)
pre2 = np.asarray([0, .7, .8, .9, 1])
rec2 = np.asarray([1, .75, .5, .25, 0])
tp2 = fp2 = tn2 = fn2 = np.asarray([1] * 5)
with tf.summary.FileWriter('./logs') as writer:
    # [3] Plotting curve 1 with proper num_thresholds = len(precision) = 3
    problem2_1 = pr_curve_raw_data_pb('problem2', tp1, fp1, tn1, fn1, pre1, rec1, num_thresholds=len(pre1))
    writer.add_summary(problem2_1, 1)
    # [4] Plotting curve 2 with the same name as [3] with proper num_thresholds = len(precision) = 5
    problem2_2 = pr_curve_raw_data_pb('problem2', tp2, fp2, tn2, fn2, pre2, rec2, num_thresholds=len(pre2))
    writer.add_summary(problem2_2, 2)
The above code results in this PR curve for step 2 (only the first 3 points are drawn).
The full curve should look like this:
A solution would be either to allow different steps of the same curve to have different num_thresholds or to pick num_thresholds evenly spread samples from the precision and recall arrays.
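The second option can be sketched in plain NumPy. This is only an illustration of the idea, not TensorBoard's implementation: resample_evenly is a hypothetical helper name, and the choice to always keep both endpoints of the curve is an assumption.

```python
import numpy as np


def resample_evenly(values, num_thresholds):
    """Pick num_thresholds evenly spread entries, keeping both endpoints.

    Hypothetical helper for illustration; not part of TensorBoard.
    """
    values = np.asarray(values)
    if len(values) == 0:
        raise ValueError("values must not be empty")
    if num_thresholds < 2:
        raise ValueError("num_thresholds must be at least 2")
    # Evenly spaced positions between the first and last index,
    # rounded to the nearest valid integer index.
    idx = np.round(np.linspace(0, len(values) - 1, num_thresholds)).astype(int)
    return values[idx]


# Curve 2 from the example above (5 data points).
pre2 = np.asarray([0, .7, .8, .9, 1])
rec2 = np.asarray([1, .75, .5, .25, 0])

# Reported behaviour: the tail of the curve is dropped.
truncated_pre = pre2[:3]

# Proposed behaviour: the whole curve is kept, at a coarser resolution.
resampled_pre = resample_evenly(pre2, 3)
resampled_rec = resample_evenly(rec2, 3)
```

With 3 thresholds, truncation keeps only the first three precision values, while the resampled arrays still run from one end of the curve to the other. The same index selection would have to be applied to the TP/FP/TN/FN count arrays so that all six arrays stay aligned.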
P.S. It would also be great if the true_negative parameter were optional. The concept of a true negative doesn't exist in some tasks, such as detection.
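Until then, a possible caller-side workaround is to pass a placeholder array for the true negatives. The sketch below assumes that zero-filled counts are acceptable to pr_curve_raw_data_pb and to the dashboard, which has not been verified here; placeholder_true_negatives is a hypothetical helper, not part of TensorBoard.

```python
import numpy as np


def placeholder_true_negatives(true_positive_counts):
    """Return an all-zero array shaped like the other count arrays.

    Hypothetical helper: stands in for true-negative counts in tasks
    (such as detection) where the concept does not apply.
    """
    return np.zeros_like(np.asarray(true_positive_counts))


tp = np.asarray([5, 3, 1])
tn = placeholder_true_negatives(tp)  # same shape and dtype as tp, all zeros
```

The result could then be passed in the true-negative position of pr_curve_raw_data_pb, with the caveat that any true-negative values shown in the UI would be meaningless.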
Activity
stephanwlee commented on Oct 22, 2018
From the code, it looks like this behavior is working as expected:
https://github.com/tensorflow/tensorboard/blob/master/tensorboard/backend/event_processing/event_accumulator.py#L371-L377
@nfelt I am not too familiar with this threshold argument to the tb.summary. Can you please TAL? Thanks!
ruro commented on Oct 22, 2018
Hi. I think the behaviour of metadata being kept only from the first call is fine. The problem is that arrays passed into pr_curve_raw_data_pb are silently cropped to that length.
If num_thresholds is supposed to be the number of separate threshold levels, where threshold=0 is the point with maximum recall and threshold=1 is the point with maximum precision, then it doesn't make sense for it to cut off the curve.
- Either don't pass num_thresholds as metadata (idk if this is possible), so that it can be changed.
- Or sample the arrays at num_thresholds evenly spread points.
- Or at least document the fact that it is only valid for num_thresholds>=len(precision) and raise an error otherwise.
Currently the docstring claims that
ruro commented on Nov 13, 2018
Hi. Any updates on this issue?
stephanwlee commented on Nov 13, 2018
@chihuahua Hello Chi! @nfelt said you may have a little more context around the thresholding behavior. Can you please enlighten us? Thanks!
ucesfpa commented on Jul 9, 2020
Hi, I want to use the example given here, but it is not working.
tensorboard.__version__
'2.2.1'
tensorflow.__version__
'2.2.0-rc3'
from tensorboard.summary import pr_curve_raw_data_pb
I get the following error: