NaN error when using a GPU with no support for igemmlt #165

I get `RuntimeError: probability tensor contains either inf, nan or element < 0` on most language models when I try to run them in 8-bit.

I adapted a script by lorr1 (https://github.com/TimDettmers/bitsandbytes/issues/42#issue-1384920163) into a [small script](https://gist.github.com/0cc4m/a753b6a16a618cfbe747a74920dc50f6) that first runs the model in 8-bit with `igemmlt`, then disables `igemmlt` support and runs it again. I tested this on an RTX 3060, and the run without `igemmlt` fails with the RuntimeError. I suspect there is a bug in the code that replaces `igemmlt` on older GPUs.
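For readers unsure whether their card falls into the "no `igemmlt`" group, here is a minimal sketch of such a check. It assumes the int8 tensor-core path needs compute capability 7.5 or newer and that GTX 16xx cards lack tensor cores despite reporting 7.5. The helper name `likely_supports_igemmlt` and the model list are illustrative, not part of the bitsandbytes API.

```python
# Hypothetical helper for illustration only; not a bitsandbytes function.
# Assumption: igemmlt needs int8 tensor cores, i.e. compute capability >= 7.5,
# and GTX 16xx cards report 7.5 but ship without tensor cores.
TENSOR_CORE_LESS_MODELS = ("GTX 1630", "GTX 1650", "GTX 1660")


def likely_supports_igemmlt(capability, device_name):
    """Guess whether a GPU can use the igemmlt int8 kernel.

    capability  -- (major, minor) compute capability tuple
    device_name -- device name string as reported by the driver
    """
    if tuple(capability) < (7, 5):
        return False
    # Exclude cards that meet the capability threshold but have no tensor cores.
    return not any(model in device_name for model in TENSOR_CORE_LESS_MODELS)


if __name__ == "__main__":
    print(likely_supports_igemmlt((8, 6), "NVIDIA GeForce RTX 3060"))
    print(likely_supports_igemmlt((6, 1), "NVIDIA GeForce GTX 1080"))
```

With PyTorch installed, the two inputs could come from `torch.cuda.get_device_capability()` and `torch.cuda.get_device_name()`. By this check an RTX 3060 does support `igemmlt`, which is why the test script has to disable it explicitly to reproduce the error.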

Interestingly, it works on some models, such as `EleutherAI/pythia-70m-deduped`, `EleutherAI/gpt-neo-125M`, and `facebook/opt-6.7b`, but on most others it fails with the RuntimeError. With `EleutherAI/pythia-410m-deduped`, the output is:

```
» python 8bit_test.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
8bit-reg:
Q: On average Joe throws 25 punches per minute. A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?

A: Let’s think step by step.

First, Joe threw a baseball cap.
Next, he threw a bat in the air.
Joe threw a bat in the air.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Traceback (most recent call last):
  File "/media/veryhighspeed/koboldai/client/8bit_test.py", line 57, in <module>
    generated_ids_8bit = model_8bit.generate(input_ids, max_length=len(input_ids[0]) + MAX_NEW_TOKENS, do_sample=True)
  File "/media/veryhighspeed/koboldai/client/8bit-venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/media/veryhighspeed/koboldai/client/8bit-venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1437, in generate
    return self.sample(
  File "/media/veryhighspeed/koboldai/client/8bit-venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2479, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```

@Ph0rk0z in https://github.com/TimDettmers/bitsandbytes/issues/131#issuecomment-1418274961 also ran into this issue.
