Description
I'm currently tracking several discrete values, e.g. the number of iterations a loop has gone through, over several training steps and dumping them to a tf.summary.histogram at the end of each epoch.
My current approach amounts to storing counts of the values seen in a counts = tf.Variable(tf.zeros([max_counter_value], dtype=tf.int32), trainable=False) and then constructing a "fake" tf.summary.histogram, e.g.
    # Expand the per-value counts back into one sample per observation.
    data = tf.repeat(
        tf.range(max_counter_value, dtype=tf.float32),
        repeats=counts)
    tf.summary.histogram(name, data, step=step, buckets=2 * max_counter_value - 1)
This is a bit clunky for several reasons:
- I need to inflate the counts into the data Tensor, which can contain several thousands/millions of values, even though tf.summary.histogram probably re-computes counts internally,
- Plotting the data in TensorBoard smears the discrete values over several buckets, so I never actually see the total count,
- (Somewhat tangentially) I have to handle accumulating and flushing the counters myself.
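The redundancy in the first point can be illustrated with a small plain-Python sketch. It is a stand-in for the TensorFlow ops, not the actual TensorBoard code path: the helper names inflate and recount are hypothetical, and the assumption is only that a histogram over integer-valued samples has to re-derive per-value counts from the expanded data.

```python
def inflate(counts):
    """Expand per-value counts into one sample per observation.

    Plain-Python analogue of tf.repeat(tf.range(n), repeats=counts).
    """
    return [value for value, n in enumerate(counts) for _ in range(n)]


def recount(samples, num_values):
    """Re-derive per-value counts from the expanded samples."""
    counts = [0] * num_values
    for value in samples:
        counts[value] += 1
    return counts


counts = [3, 0, 2, 5]          # counts[i] = how often value i was seen
data = inflate(counts)         # one entry per observation, 10 in total
roundtrip = recount(data, len(counts))
```

The expanded list grows with the total number of observations, while the counts it encodes stay at a fixed size of max_counter_value.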
What would be really super cool would be a function named something like tf.summary.counter(value: tf.Tensor, name: str, flush_every_n_epochs: int = 1) where I can just dump in Tensors of integer types and get the discrete (unsmoothed) histograms every 'n' epochs.
I'm guessing the third part (accumulating values across steps) is probably a bit iffy since it would require maintaining some kind of state, but I'm hoping something like calling tf.summary.histogram with a set of pre-bucketed counts should be possible?
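To make the requested accumulate-and-flush behavior concrete, here is a minimal plain-Python sketch. The class CounterAccumulator and its methods are hypothetical and not part of TensorFlow or TensorBoard; only the flush_every_n_epochs parameter name is taken from the proposal, and what happens to the flushed counts (e.g. writing a summary) is left out.

```python
class CounterAccumulator:
    """Hypothetical stateful counter: accumulate integer values, flush every n epochs."""

    def __init__(self, max_counter_value, flush_every_n_epochs=1):
        if flush_every_n_epochs < 1:
            raise ValueError("flush_every_n_epochs must be >= 1")
        self.counts = [0] * max_counter_value
        self.flush_every_n_epochs = flush_every_n_epochs
        self._epochs_since_flush = 0

    def update(self, values):
        """Add one observation per entry in values (integers in [0, max_counter_value))."""
        for value in values:
            if not 0 <= value < len(self.counts):
                raise ValueError(f"value {value} is out of range")
            self.counts[value] += 1

    def end_epoch(self):
        """Return the accumulated counts and reset them if a flush is due, else None."""
        self._epochs_since_flush += 1
        if self._epochs_since_flush < self.flush_every_n_epochs:
            return None
        flushed = self.counts
        self.counts = [0] * len(flushed)
        self._epochs_since_flush = 0
        return flushed


acc = CounterAccumulator(max_counter_value=4, flush_every_n_epochs=2)
acc.update([0, 1, 1, 3])
first = acc.end_epoch()    # no flush yet after one epoch
acc.update([3, 3])
second = acc.end_epoch()   # flush after the second epoch
```

The state that makes this "a bit iffy" is visible here: the counts and the epoch counter both have to live somewhere between calls.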
Cheers, Pedro
Activity
wchargin commented on Mar 6, 2021
Hi @gonnet! Thanks for reaching out (and sorry for the delayed response;
meant to post this message earlier). See #1015, #1803, et al. for prior
discussions about how the histogram summary recording and visualization
could be improved.
The request for “stateful histograms” (put in tensors across multiple
calls/steps, flush out a histogram on demand) is new, as far as I’m
aware, but there is some prior art in the TF 1.x PR curves summary.
Making that kind of thing TF2-compatible is not simple, but histograms
may be simpler than PR curves. I’ll keep this issue open for that
request. Can’t promise that we’ll implement it any time soon, but we’ll
keep it on the backlog.
nfelt commented on Mar 15, 2021
I think this is basically #900. They aren't quite the same, since that issue asks to use the visualization with direct probabilities rather than a true "histogram" that represents many observations, so the semantics are a bit different, but presumably implementation-wise it more or less amounts to the same thing: a bar chart where you specify a set of values to plot and each one is associated with some range of the x-axis. (This is distinct from cases where each value is associated with just a single point on the x-axis, or with some categorical value, which is more like #2145.)
So if we do provide the ability to pass pre-bucketed counts, we may want to consider whether to stick with a fairly restrictive option (i.e. the values have to be integers and should semantically represent counts, e.g. be non-negative) or allow a more general case (e.g. values can be floating point, or possibly even negative).
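The two options can be sketched as a validation step on the pre-bucketed values. This is only an illustration of the trade-off: the function check_bucket_values and its allow_general flag are hypothetical names, not an existing or planned TensorBoard API.

```python
import math


def check_bucket_values(values, allow_general=False):
    """Validate pre-bucketed values under a restrictive or a general policy.

    Restrictive (default): every value must be a non-negative integer count.
    General: any finite number is accepted, including floats and negatives.
    """
    checked = []
    for value in values:
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if allow_general:
            if not is_number or not math.isfinite(value):
                raise ValueError(f"expected a finite number, got {value!r}")
        else:
            if not is_number or isinstance(value, float) or value < 0:
                raise ValueError(f"expected a non-negative integer count, got {value!r}")
        checked.append(value)
    return checked


strict_ok = check_bucket_values([3, 0, 2])
general_ok = check_bucket_values([0.25, -1.5], allow_general=True)
```

Under the restrictive policy the probabilities from the #900 use case would be rejected, which is the semantic difference described above.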
canbakiskan commented on Jul 1, 2021
I tried to change histogram plots to look like bar graph plots: link
Also, about the second point, if you have a predetermined number of buckets for all of your plots, you can edit this line and recompile. Then the counts will not be smeared if you pass buckets=<nb_bins> to tf.summary.histogram() or bins=<nb_bins> to torch.utils.tensorboard.writer.SummaryWriter().add_histogram(). That being said, it's a stopgap measure.