float8: remove unneeded kernel for scale generation (#616)
Summary:
The code that creates a float8 scale unnecessarily triggers an extra GPU
kernel launch by calling `torch.empty`. This change removes that call.
There is no performance impact, but debugging becomes easier: logs are smaller and GPU traces are simpler.
Test Plan:
```
# extract a trace of a linear fwd+bwd with
python benchmarks/float8/profile_linear_float8.py ~/local/tmp/test
# verify that the GPU kernel that created an empty scale tensor is no longer present
# run the unit tests and confirm they pass
./test/float8/test_everything.sh
```