Description
#564 🐛 Bug
I get the following error every time I try to train with APEX/fp16 while finetuning BERT. It happens with my own scripts, and I also see it with the repository's standard finetune_on_pregenerated.py, which was recently updated. The diagnostics point to an issue with FusedLayerNorm. To confirm this, I made a local modification that replaces the definition of BertLayerNorm with

```python
BertLayerNorm = torch.nn.LayerNorm
```

and the change resolves the issue (in my case, without noticeably changing performance). The Apex docs are still a bit raw, but the most recent set does not suggest manually manipulating optimizers or layer definitions, so perhaps we should just stick to the BertLayerNorm definition above?
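For anyone who wants to try the same workaround without editing the installed package, here is a minimal sketch. It assumes pytorch_transformers 1.1.0, where BertLayerNorm is resolved from the module globals at model-construction time, so the patch must run before the model is built; BertForPreTraining and the Adam optimizer are only illustrative stand-ins for whatever the training script actually constructs.

```python
import torch
import pytorch_transformers.modeling_bert as modeling_bert

# Swap apex's FusedLayerNorm alias for the stock PyTorch implementation
# *before* any model is instantiated; BertLayerNorm is looked up from the
# module globals when each layer is constructed, so a later patch has no effect.
modeling_bert.BertLayerNorm = torch.nn.LayerNorm

from pytorch_transformers import BertForPreTraining

# Illustrative model/optimizer; substitute whatever the training script builds.
model = BertForPreTraining.from_pretrained("bert-base-uncased").cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)

# The amp API that recent Apex docs recommend, instead of manually calling
# .half() on the model or managing an FP32 master copy by hand. With
# opt_level="O1", weights stay FP32 and inputs are cast at op boundaries,
# which sidesteps the Half/Float mismatch in the traceback below.
from apex import amp
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
```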
Full traceback:

```
Traceback (most recent call last):
  File "ash3/tune_bert.py", line 101, in <module>
    main(sys.argv[1:])
  File "ash3/tune_bert.py", line 47, in main
    pregenerate(init)
  File "ash3/tune_bert.py", line 85, in pregenerate
    finetune_on_pregenerated(tune_args)
  File "/home/madvillain/gitlab/ai/ash3/ash3/finetuning/finetune_on_pregenerated.py", line 292, in main
    outputs = model(input_ids, segment_ids, input_mask, lm_label_ids, is_next)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 785, in forward
    prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 533, in forward
    prediction_scores = self.predictions(sequence_output)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 501, in forward
    hidden_states = self.transform(hidden_states)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 483, in forward
    hidden_states = self.LayerNorm(hidden_states)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 159, in forward
    input, self.weight, self.bias, self.normalized_shape, self.eps)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 25, in forward
    input_, ctx.normalized_shape, weight_, bias_, ctx.eps)
RuntimeError: expected scalar type Half but found Float (data<c10::Half> at /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/include/ATen/core/TensorMethods.h:1386)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f6af587edc5 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Half* at::Tensor::data<c10::Half>() const + 0x2c6 (0x7f6abeb8aa36 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: cuda_layer_norm(at::Tensor*, at::Tensor*, at::Tensor*, at::Tensor*, int, int, c10::ArrayRef<long>, at::Tensor*, at::Tensor*, double) + 0x3ed (0x7f6abeb87dcd in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x27a (0x7f6abeb7985a in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x196c4 (0x7f6abeb866c4 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x16e0a (0x7f6abeb83e0a in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #12: THPFunction_apply(_object*, _object*) + 0x691 (0x7f6b24b0a081 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
```
Model I am using (Bert, XLNet...): BERT
Language I am using the model on (English, Chinese...): English
The problem arises when using:
- [x] the official example scripts: finetune_on_pregenerated.py
- [x] my own modified scripts: my own BERT finetuning scripts (see above)
The task I am working on is:
- [x] an official GLUE/SQuAD task: finetune_on_pregenerated.py
- [ ] my own task or dataset
Expected behavior
No failures; training runs to completion.
Environment
- OS: Ubuntu 18.04
- Python version: 3.6
- PyTorch version: 1.1.0, 1.2.0
- PyTorch Transformers version (or branch): 1.1.0
- Using GPU? yes
- Distributed or parallel setup? no
- Any other relevant information: cudatoolkit 10.0, APEX git commit: 53eae1986320d016ee7b347d78839dd5e96e7e93