apex fp16 FusedLayerNorm type issues #1172

@mksenzov

Description

🐛 Bug

I get the following error each time I try to fine-tune BERT with APEX/fp16. It happens with my own scripts, and I also see it with the repository's standard finetune_on_pregenerated.py, which was recently updated. The error diagnostics point to an issue with FusedLayerNorm. To confirm, I made a local modification replacing the definition of BertLayerNorm with

BertLayerNorm = torch.nn.LayerNorm

This change resolves the issue (and, in my case, does not noticeably change performance). The Apex docs are a bit raw, but the most recent version does not suggest manually manipulating optimizers or layer definitions, so perhaps we should just stick to the BertLayerNorm definition above?
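
For context, here is roughly what my local patch does. pytorch_transformers.modeling_bert prefers Apex's fused kernel when it can be imported and falls back to torch.nn.LayerNorm otherwise (paraphrased below, not copied verbatim); the workaround simply takes the fallback unconditionally. My guess is that Apex AMP knows how to handle the stock layer norm under fp16 but not this custom fused op at the pinned commit.

import torch

# What modeling_bert currently does (paraphrased):
# try:
#     from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
# except ImportError:
#     BertLayerNorm = torch.nn.LayerNorm

# Local workaround: always use the stock PyTorch implementation.
BertLayerNorm = torch.nn.LayerNorm

The full traceback of the original failure: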

Traceback (most recent call last):
  File "ash3/tune_bert.py", line 101, in <module>
    main(sys.argv[1:])
  File "ash3/tune_bert.py", line 47, in main
    pregenerate(init)
  File "ash3/tune_bert.py", line 85, in pregenerate
    finetune_on_pregenerated(tune_args)
  File "/home/madvillain/gitlab/ai/ash3/ash3/finetuning/finetune_on_pregenerated.py", line 292, in main
    outputs = model(input_ids, segment_ids, input_mask, lm_label_ids, is_next)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 785, in forward
    prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 533, in forward
    prediction_scores = self.predictions(sequence_output)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 501, in forward
    hidden_states = self.transform(hidden_states)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/pytorch_transformers/modeling_bert.py", line 483, in forward
    hidden_states = self.LayerNorm(hidden_states)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 159, in forward
    input, self.weight, self.bias, self.normalized_shape,self.eps)
  File "/home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 25, in forward
    input_, ctx.normalized_shape, weight_, bias_, ctx.eps)
RuntimeError: expected scalar type Half but found Float (data<c10::Half> at /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/include/ATen/core/TensorMethods.h:1386)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f6af587edc5 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Half* at::Tensor::data<c10::Half>() const + 0x2c6 (0x7f6abeb8aa36 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: cuda_layer_norm(at::Tensor*, at::Tensor*, at::Tensor*, at::Tensor*, int, int, c10::ArrayRef<long>, at::Tensor*, at::Tensor*, double) + 0x3ed (0x7f6abeb87dcd in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x27a (0x7f6abeb7985a in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x196c4 (0x7f6abeb866c4 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x16e0a (0x7f6abeb83e0a in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #12: THPFunction_apply(_object*, _object*) + 0x691 (0x7f6b24b0a081 in /home/madvillain/miniconda3/envs/ash3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
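
The mismatch also reproduces in isolation, which points at the fused kernel rather than the training script: the fused CUDA op appears to require input, weight, and bias to share one dtype, so fp16 activations hitting the module's default fp32 parameters raise the same error. A minimal sketch, assuming Apex is installed and a CUDA device is available:

import torch
from apex.normalization import FusedLayerNorm

norm = FusedLayerNorm(768).cuda()              # weight and bias default to float32
x = torch.randn(8, 768, device="cuda").half()  # fp16 activations, as in fp16 training

# norm(x)  # RuntimeError: expected scalar type Half but found Float

norm.half()    # casting the parameters to match the input avoids the error
out = norm(x)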

Model I am using (Bert, XLNet, ...): BERT

Language I am using the model on (English, Chinese, ...): English

The problem arises when using:

  • [x] the official example scripts: (give details)
  • [ ] my own modified scripts: (give details)

The task I am working on is:

  • [x] an official GLUE/SQuAD task: (give the name) finetune_on_pregenerated.py
  • [ ] my own task or dataset: (give details)

Expected behavior

No failures.

Environment

  • OS: Ubuntu 18.04
  • Python version: 3.6
  • PyTorch version: 1.1.0, 1.2.0
  • PyTorch Transformers version (or branch): 1.1.0
  • Using GPU? yes
  • Distributed or parallel setup? no
  • Any other relevant information: cudatoolkit 10.0, Apex git hash: 53eae1986320d016ee7b347d78839dd5e96e7e93
