Skip to content

NaN error when using a GPU with no support for igemmlt #165

Closed
@0cc4m

Description

@0cc4m

I get RuntimeError: probability tensor contains either inf, nan or element < 0 on most language models when trying to run them in 8bit.

I adapted a script made by lorr1 #42 (comment) into a small script that first runs the model using 8bit with igemmlt and then disables the support for igemmlt and runs it again. I tested this on an RTX 3060 and the result is the RuntimeError when running without igemmlt. I think there is a bug in the code that replaces igemmlt on older GPUs.

Interestingly, it works on some models, like EleutherAI/pythia-70m-deduped, EleutherAI/gpt-neo-125M, facebook/opt-6.7b, but on most others it fails with the RuntimeError. When run with EleutherAI/pythia-410m-deduped it outputs the following:

» python 8bit_test.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
8bit-reg:
Q: On average Joe throws 25 punches per minute. A fight lasts 5 rounds of 3 minutes.
How many punches did he throw?

A: Let’s think step by step.

First, Joe threw a baseball cap.
Next, he threw a bat in the air.
Joe threw a bat in the air.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
Traceback (most recent call last):
  File "/media/veryhighspeed/koboldai/client/8bit_test.py", line 57, in <module>
    generated_ids_8bit = model_8bit.generate(input_ids, max_length=len(input_ids[0]) + MAX_NEW_TOKENS, do_sample=True)
  File "/media/veryhighspeed/koboldai/client/8bit-venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/media/veryhighspeed/koboldai/client/8bit-venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1437, in generate
    return self.sample(
  File "/media/veryhighspeed/koboldai/client/8bit-venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2479, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

@Ph0rk0z in #131 (comment) also ran into this issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions