TL;DR
Add operation converters in Dynamo to support full compilation of GPT2.
Goal(s)
Run GPT2 on multiple GPUs with only one TensorRT engine.
Tasks
- [ ] https://github.com/pytorch/TensorRT/issues/2544
- [ ] https://github.com/pytorch/TensorRT/issues/2434
- [ ] https://github.com/pytorch/TensorRT/pull/2654
- [ ] `torch.ops._c10d_functional.all_reduce.default`
- [ ] `torch.ops._c10d_functional.wait_tensor.default`
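
For context on the last two items, here is a minimal pure-Python sketch of the semantics a converter for these ops would need to preserve. It assumes a sum reduction and models ranks as plain lists. The helper names `simulate_all_reduce` and `simulate_wait_tensor` are hypothetical and are not part of PyTorch or Torch-TensorRT. The sketch only illustrates the behavior: `all_reduce` leaves every rank with the same reduced value, and `wait_tensor` is a synchronization point that hands back the completed result.

```python
# Pure-Python stand-in for the two collectives; no torch or TensorRT needed.
# simulate_all_reduce and simulate_wait_tensor are hypothetical names used
# only for illustration.

def simulate_all_reduce(per_rank_values, reduce_op="sum"):
    """Return what each rank holds after an all-reduce over equal-length lists."""
    if reduce_op != "sum":
        raise NotImplementedError("this sketch only models the 'sum' reduction")
    # Reduce element-wise across ranks.
    reduced = [sum(column) for column in zip(*per_rank_values)]
    # Every rank receives its own copy of the same reduced result.
    return [list(reduced) for _ in per_rank_values]


def simulate_wait_tensor(value):
    """Model wait_tensor as a synchronization point that returns its input."""
    return value


# Two ranks, each holding a partial result.
partials = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
results = [simulate_wait_tensor(r) for r in simulate_all_reduce(partials)]
print(results)  # [[11.0, 22.0, 33.0], [11.0, 22.0, 33.0]]
```

In the real functional collectives the reduction is launched asynchronously and `wait_tensor` blocks until it has finished, so the two ops appear as a pair in the traced graph. Leaving either one unconverted would presumably split the graph and prevent the single-engine goal stated above.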