Error: AutoTokenizer.from_pretrained, UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment #25848

@duweidongzju

System Info

  • transformers version: 4.32.1
  • Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.8.17
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.2
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

My code:

import torch
from transformers import AutoTokenizer

model_name_or_path = 'llama-2-7b-hf'
use_fast_tokenizer = False
padding_side = "left"
config_kwargs = {'trust_remote_code': True, 'cache_dir': None, 'revision': 'main', 'use_auth_token': None}
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=use_fast_tokenizer, padding_side=padding_side, **config_kwargs)

The error is:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 727, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 156, in __init__
    self.sp_model = self.get_spm_processor()
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 164, in get_spm_processor
    model_pb2 = import_protobuf()
  File "/root/anaconda3/envs/llama_etuning/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 40, in import_protobuf
    return sentencepiece_model_pb2
UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment
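
The last frame of the traceback shows import_protobuf returning a local name that was never bound. One plausible reading, which is an assumption and not something confirmed in this report, is that the name is only assigned inside a conditional import (for example, one guarded by whether the protobuf package is installed) and the condition was false. The sketch below reproduces that general pattern with hypothetical helper names; it is not the transformers source code.

```python
import importlib
import importlib.util


def load_optional_buggy(module_name):
    """Hypothetical helper: binds `loaded` only when the module is found."""
    if importlib.util.find_spec(module_name) is not None:
        loaded = importlib.import_module(module_name)
    # If the branch above was skipped, `loaded` was never assigned and this
    # line raises UnboundLocalError instead of a clear ImportError.
    return loaded


def load_optional_checked(module_name):
    """Hypothetical helper: fails with an explicit message when missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"Optional dependency {module_name!r} is not installed")
    return importlib.import_module(module_name)


def protobuf_is_importable():
    """Report whether `google.protobuf` can be found in this environment."""
    try:
        return importlib.util.find_spec("google.protobuf") is not None
    except ModuleNotFoundError:
        # Raised when the parent package `google` itself is absent.
        return False


if __name__ == "__main__":
    print("google.protobuf importable:", protobuf_is_importable())
```

If protobuf_is_importable() reports False in the environment from this report, installing the protobuf package may be worth trying; that is a guess based on the function name in the traceback, not a verified fix.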

Expected behavior

What do I need to do to solve this problem?
