
GradientAccumulator wrapper not working as expected #2

@andreped

Description

In gradient accumulation, we only update the weights after a given number of iterations (k batches), in an ensemble-like manner: for instance, by averaging the gradients calculated over the k batches and only then updating the weights, thereby simulating regular training with a larger batch size.
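
To make the expected behaviour concrete, here is a minimal sketch (the toy model, random data, and shapes are made up purely for illustration) of why gradient averaging should reproduce large-batch training for a mean-reduced loss: the average of the gradients over 4 batches of 8 equals the gradient of a single batch of 32.

```python
import numpy as np
import tensorflow as tf

# Illustrative only: a toy model and random data, not part of the benchmark.
tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
loss_fn = tf.keras.losses.MeanSquaredError()  # mean-reduced loss

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

def grads_for(xb, yb):
    with tf.GradientTape() as tape:
        loss = loss_fn(yb, model(xb, training=True))
    return tape.gradient(loss, model.trainable_variables)

# Gradient of one batch of 32.
full = grads_for(x, y)

# Average of the gradients over 4 batches of 8.
parts = [grads_for(x[i:i + 8], y[i:i + 8]) for i in range(0, 32, 8)]
accum = [tf.add_n(list(g)) / 4.0 for g in zip(*parts)]

for g_full, g_accum in zip(full, accum):
    print(np.allclose(g_full.numpy(), g_accum.numpy(), atol=1e-5))  # expect True
```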

After running the benchmark described here, using:

  1. batch_size=32, accum_steps=1, epochs=3
  2. batch_size=8, accum_steps=4, epochs=12

We do not get the same results. It seems like the weights are updated for every batch, even though we use accum_steps = 4.
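
One way to check this suspicion is sketched below, under assumptions: a toy model, random data, and train_on_batch as the driver (the plain SGD optimizer is a placeholder; swap in the wrapper under test). With a working wrapper and accum_steps=4, the weights should only change on every 4th batch, whereas a plain optimizer changes them every batch.

```python
import numpy as np
import tensorflow as tf

def make_model(optimizer):
    # Toy model purely for observing when the weights change.
    m = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    m.compile(optimizer=optimizer, loss="mse")
    return m

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# Placeholder optimizer: replace with the accumulation wrapper under test.
model = make_model(tf.keras.optimizers.SGD(1e-2))

prev = [w.copy() for w in model.get_weights()]
for step in range(8):
    xb, yb = x[step * 8:(step + 1) * 8], y[step * 8:(step + 1) * 8]
    model.train_on_batch(xb, yb)
    curr = model.get_weights()
    changed = any(not np.allclose(p, c) for p, c in zip(prev, curr))
    print(f"batch {step}: weights changed = {changed}")
    prev = [w.copy() for w in curr]
```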

Both the original wrapper implementation GradientAccumulator and the Adam-based wrapper AdamAccumulate suffer from this.

Are we actually able to control, through the optimizer, when the weights are updated, or can we only compute the gradients and enforce the update ourselves?

Obviously, we could write our own training loop, but the whole point is to have a simple wrapper class that handles all of this for us.
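
For completeness, a rough sketch of what such a custom loop could look like (the model, data, and hyperparameters are placeholders, not the wrapper's actual implementation): per-variable buffers accumulate the gradients, and apply_gradients() is only called every accum_steps batches.

```python
import tensorflow as tf

accum_steps = 4
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()

# One non-trainable buffer per trainable variable, initialised to zero.
accum = [tf.Variable(tf.zeros_like(v), trainable=False)
         for v in model.trainable_variables]

def on_batch(xb, yb, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(yb, model(xb, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    for a, g in zip(accum, grads):
        a.assign_add(g / accum_steps)        # running mean of the gradients
    if (step + 1) % accum_steps == 0:        # weight update only every k-th batch
        optimizer.apply_gradients(
            zip([a.read_value() for a in accum], model.trainable_variables))
        for a in accum:
            a.assign(tf.zeros_like(a))       # reset buffers for the next cycle
    return loss

# Example driver: 4 batches of 8 trigger exactly one weight update.
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
for step in range(4):
    on_batch(x[step * 8:(step + 1) * 8], y[step * 8:(step + 1) * 8], step)
```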
