apex fp16 FusedLayerNorm type issues #1172

@mksenzov

Description

🐛 Bug

I get the following error each time I try to fine-tune BERT with APEX/fp16. It happens with my own scripts, and I also see it with the repository's standard finetune_on_pregenerated.py, which was recently updated. The error diagnostics point to an issue with FusedLayerNorm. To confirm, I made a local modification replacing the definition of BertLayerNorm with

BertLayerNorm = torch.nn.LayerNorm

This change resolves the issue (and, in my case, does not noticeably change performance). The Apex docs are a bit raw, but the most recent version does not suggest manually manipulating optimizers or layer definitions, so perhaps we should just stick to the BertLayerNorm definition above?
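
For context, here is roughly what my local patch does. pytorch_transformers.modeling_bert prefers Apex's fused kernel when it can be imported and falls back to torch.nn.LayerNorm otherwise (paraphrased below, not copied verbatim); the workaround simply takes the fallback unconditionally. My guess is that Apex AMP knows how to handle the stock layer norm under fp16 but not this custom fused op at the pinned commit.

import torch

# What modeling_bert currently does (paraphrased):
# try:
#     from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
# except ImportError:
#     BertLayerNorm = torch.nn.LayerNorm

# Local workaround: always use the stock PyTorch implementation.
BertLayerNorm = torch.nn.LayerNorm

The full traceback of the original failure: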

Traceback (most recent call last):
  File "ash3/tune_bert.py", line 101, in <module>
    main(sys.argv[1:])
  File "ash3/tune_bert.py", line 47, in main
    pregenerate(init)
  File "ash3/tune_bert.py", line 85, in pregenerate
    finetune_on_pregenerated(tune_args)
  File "/home/madvillain/gitlab/ai/ash3/ash3/finetuning/finetune_on_pregenerated.py", line 292, in main
    outputs = model(input_ids, segment_ids, input_mask, lm_label_ids, is_next)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 785, in forward
    prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 533, in forward
    prediction_scores = self.predictions(sequence_output)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 501, in forward
    hidden_states = self.transform(hidden_states)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 483, in forward
    hidden_states = self.LayerNorm(hidden_states)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 159, in forward
    input, self.weight, self.bias, self.normalized_shape,self.eps)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 25, in forward
    input_, ctx.normalized_shape, weight_, bias_, ctx.eps)
RuntimeError: expected scalar type Half but found Float (data<c10::Half> at /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/include/ATen/core/TensorMethods.h:1386)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f6af587edc5 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Half* at::Tensor::data<c10::Half>() const + 0x2c6 (0x7f6abeb8aa36 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: cuda_layer_norm(at::Tensor*, at::Tensor*, at::Tensor*, at::Tensor*, int, int, c10::ArrayRef<long>, at::Tensor*, at::Tensor*, double) + 0x3ed (0x7f6abeb87dcd in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x27a (0x7f6abeb7985a in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x196c4 (0x7f6abeb866c4 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x16e0a (0x7f6abeb83e0a in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #12: THPFunction_apply(_object*, _object*) + 0x691 (0x7f6b24b0a081 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
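
The mismatch also reproduces in isolation, which points at the fused kernel rather than the training script: the fused CUDA op appears to require input, weight, and bias to share one dtype, so fp16 activations hitting the module's default fp32 parameters raise the same error. A minimal sketch, assuming Apex is installed and a CUDA device is available:

import torch
from apex.normalization import FusedLayerNorm

norm = FusedLayerNorm(768).cuda()              # weight and bias default to float32
x = torch.randn(8, 768, device="cuda").half()  # fp16 activations, as in fp16 training

# norm(x)  # RuntimeError: expected scalar type Half but found Float

norm.half()    # casting the parameters to match the input avoids the error
out = norm(x)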

Model I am using (Bert, XLNet, ...): BERT

Language I am using the model on (English, Chinese, ...): English

The problem arises when using:

  • [x] the official example scripts: (give details)
  • [ ] my own modified scripts: (give details)

The task I am working on is:

  • [x] an official GLUE/SQuAD task: (give the name) finetune_on_pregenerated.py
  • [ ] my own task or dataset: (give details)

Expected behavior

No failures.

Environment

  • OS: Ubuntu 18.04
  • Python version: 3.6
  • PyTorch version: 1.1.0, 1.2.0
  • PyTorch Transformers version (or branch): 1.1.0
  • Using GPU? yes
  • Distributed or parallel setup? no
  • Any other relevant information: cudatoolkit 10.0, Apex git hash: 53eae1986320d016ee7b347d78839dd5e96e7e93
