
GradientAccumulator wrapper not working as expected #2

@andreped

Description

In gradient accumulation, we only update the weights after a given number of iterations (k batches), in an ensemble-like manner: for instance, by averaging the gradients calculated over the k batches and only then updating the weights, thereby simulating regular training with a larger batch size.
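
To make the expected behaviour concrete, here is a minimal sketch (the toy model, random data, and shapes are made up purely for illustration) of why gradient averaging should reproduce large-batch training for a mean-reduced loss: the average of the gradients over 4 batches of 8 equals the gradient of a single batch of 32.

```python
import numpy as np
import tensorflow as tf

# Illustrative only: a toy model and random data, not part of the benchmark.
tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
loss_fn = tf.keras.losses.MeanSquaredError()  # mean-reduced loss

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

def grads_for(xb, yb):
    with tf.GradientTape() as tape:
        loss = loss_fn(yb, model(xb, training=True))
    return tape.gradient(loss, model.trainable_variables)

# Gradient of one batch of 32.
full = grads_for(x, y)

# Average of the gradients over 4 batches of 8.
parts = [grads_for(x[i:i + 8], y[i:i + 8]) for i in range(0, 32, 8)]
accum = [tf.add_n(list(g)) / 4.0 for g in zip(*parts)]

for g_full, g_accum in zip(full, accum):
    print(np.allclose(g_full.numpy(), g_accum.numpy(), atol=1e-5))  # expect True
```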

After running the benchmark described here, using:

  1. batch_size=32, accum_steps=1, epochs=3
  2. batch_size=8, accum_steps=4, epochs=12

We do not get the same results. It seems like the weights are updated for every batch, even though we use accum_steps = 4.
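
One way to check this suspicion is sketched below, under assumptions: a toy model, random data, and train_on_batch as the driver (the plain SGD optimizer is a placeholder; swap in the wrapper under test). With a working wrapper and accum_steps=4, the weights should only change on every 4th batch, whereas a plain optimizer changes them every batch.

```python
import numpy as np
import tensorflow as tf

def make_model(optimizer):
    # Toy model purely for observing when the weights change.
    m = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    m.compile(optimizer=optimizer, loss="mse")
    return m

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# Placeholder optimizer: replace with the accumulation wrapper under test.
model = make_model(tf.keras.optimizers.SGD(1e-2))

prev = [w.copy() for w in model.get_weights()]
for step in range(8):
    xb, yb = x[step * 8:(step + 1) * 8], y[step * 8:(step + 1) * 8]
    model.train_on_batch(xb, yb)
    curr = model.get_weights()
    changed = any(not np.allclose(p, c) for p, c in zip(prev, curr))
    print(f"batch {step}: weights changed = {changed}")
    prev = [w.copy() for w in curr]
```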

Both the original wrapper implementation GradientAccumulator and the Adam-based wrapper AdamAccumulate suffer from this.

Are we actually able to control, through the optimizer, when the weights are updated, or can we only compute the gradients and enforce the update ourselves?

Obviously, we could write our own training loop, but the whole point is to have a simple wrapper class that handles all of this for us.
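
For completeness, a rough sketch of what such a custom loop could look like (the model, data, and hyperparameters are placeholders, not the wrapper's actual implementation): per-variable buffers accumulate the gradients, and apply_gradients() is only called every accum_steps batches.

```python
import tensorflow as tf

accum_steps = 4
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

# One non-trainable buffer per trainable variable, initialised to zero.
accum = [tf.Variable(tf.zeros_like(v), trainable=False)
         for v in model.trainable_variables]

def on_batch(xb, yb, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(yb, model(xb, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    for a, g in zip(accum, grads):
        a.assign_add(g / accum_steps)        # running mean of the gradients
    if (step + 1) % accum_steps == 0:        # weight update only every k-th batch
        optimizer.apply_gradients(
            zip([a.read_value() for a in accum], model.trainable_variables))
        for a in accum:
            a.assign(tf.zeros_like(a))       # reset buffers for the next cycle
    return loss

# Example driver: 4 batches of 8 trigger exactly one weight update.
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
for step in range(4):
    on_batch(x[step * 8:(step + 1) * 8], y[step * 8:(step + 1) * 8], step)
```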
