@moonrunnerkc commented on Sep 16, 2025

Summary

What / Why

This PR makes generation/utils.py::_prepare_special_tokens meta-safe.

In assisted decoding, special-token tensors could be created on the meta device and then accessed via .item() or .cpu().numpy(), which triggers:

RuntimeError: Tensor.item() cannot be called on meta tensors
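In isolation, the failing pattern reduces to something like this (the token ID 2 is arbitrary, chosen for illustration):

```python
import torch

# A special-token tensor that ends up on the meta device, e.g. when weights
# are initialized on meta before being dispatched to a real device.
eos = torch.tensor(2, device="meta")

eos.item()  # RuntimeError: Tensor.item() cannot be called on meta tensors
```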

The patch avoids these unsafe operations by rebuilding fresh scalar tensors on the requested device from the Python IDs stored in GenerationConfig, and introduces a dedicated error type for the cases that cannot be converted safely.

  • ✅ Adds MetaSafeTensorError for explicit failures instead of opaque framework errors.
  • ✅ Hardens special-token setup so assisted decoding succeeds under concurrency and in meta-aware pipelines.

Scope

src/transformers/generation/utils.py

  • Patch _prepare_special_tokens to be meta-safe.
  • Fix the internal helper for ID → tensor conversion so it never calls .item() or .cpu().numpy() on meta tensors.
  • Add MetaSafeTensorError (a subclass of RuntimeError) for unsupported meta ops; a sketch follows this list.
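In spirit, the new error type amounts to the following (the exact docstring and module placement are the PR's; this is only a sketch):

```python
class MetaSafeTensorError(RuntimeError):
    """Raised when a special-token tensor lives on the meta device and has
    no safe conversion path (e.g. a non-scalar meta tensor with no Python-ID
    fallback in GenerationConfig)."""
```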

tests/test_generation_meta.py

  • Add regression tests covering CPU path, meta path, output consistency, and no config drift.

Note: No changes to public APIs. No behavioral change for non-meta paths.


Details of the Fix

  • Special token IDs provided as tensors on meta are not moved or read directly.
  • Instead, fresh scalar tensors are reconstructed on the requested device using the underlying Python IDs from config.
  • .item() / .cpu().numpy() are never called on meta tensors.
  • If a non-scalar meta tensor is encountered without a safe conversion path, we raise MetaSafeTensorError with a descriptive message (see the sketch after this list).
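A condensed sketch of that conversion strategy; the helper name `_id_to_tensor` and the exact control flow are illustrative, not the literal patch:

```python
import torch

class MetaSafeTensorError(RuntimeError):
    """Stand-in for the error type this PR adds, so the sketch is self-contained."""

def _id_to_tensor(token_id, maybe_tensor, device):
    """Normalize a special-token ID to a scalar tensor on `device` without
    ever reading data from a meta tensor."""
    if isinstance(maybe_tensor, torch.Tensor) and maybe_tensor.device.type == "meta":
        if maybe_tensor.dim() != 0 or token_id is None:
            # Non-scalar meta tensors carry no recoverable IDs, and without a
            # Python ID from GenerationConfig there is nothing to rebuild from.
            raise MetaSafeTensorError(
                "Cannot safely convert a meta special-token tensor without a "
                "Python ID fallback in GenerationConfig."
            )
        # Rebuild a fresh scalar on the requested device from the Python ID;
        # .item() / .cpu().numpy() never touch the meta tensor.
        return torch.tensor(token_id, dtype=torch.long, device=device)
    if isinstance(maybe_tensor, torch.Tensor):
        return maybe_tensor.to(device)
    if token_id is None:
        return None
    return torch.tensor(token_id, dtype=torch.long, device=device)
```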

Regression Tests

New tests in tests/test_generation_meta.py:

  • test_prepare_special_tokens_cpu – CPU tensors work as before.
  • test_prepare_special_tokens_meta – Meta tensors no longer raise; the function completes (see the sketch after this list).
  • test_prepare_special_tokens_consistency – Outputs match between CPU and meta paths.
  • test_no_drift_after_prepare – Confirms GenerationConfig is not mutated.
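The core regression idea reduces to something like the following (not the PR's literal tests, which exercise _prepare_special_tokens directly):

```python
import pytest
import torch

def test_meta_safe_rebuild():
    # Old behavior: reading a meta tensor crashes.
    meta_eos = torch.tensor(2, device="meta")
    with pytest.raises(RuntimeError):
        meta_eos.item()

    # New behavior: rebuild a fresh scalar from the Python ID instead, and
    # check it matches what the plain CPU path would have produced.
    rebuilt = torch.tensor(2, dtype=torch.long, device="cpu")
    assert torch.equal(rebuilt, torch.tensor(2, dtype=torch.long))
```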

✅ All tests pass locally and in CI (ubuntu-latest, Python 3.10 & 3.12).


Related


Backward Compatibility

  • No user-visible change for non-meta execution.
  • Meta-aware execution paths are now robust: assisted decoding no longer crashes on .item() from meta tensors.

Performance

  • Negligible overhead: only touches scalar special-token handling during generation setup.
  • No extra allocations beyond tiny scalar tensors when needed.

Validation

  • Local

    pytest -q tests/test_generation_meta.py # PASS

  • CI (GitHub Actions, ubuntu-latest, Py3.10/3.12)

    Full test suite including new meta safety tests → PASS

  • Concurrency probes

    Assisted decoding succeeds with no config drift (a sketch of the probe follows).
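The probe was shaped roughly like this; the harness below is a sketch (model, assistant, and inputs are placeholders the caller supplies), not the exact script:

```python
from concurrent.futures import ThreadPoolExecutor

def probe_no_config_drift(model, assistant, input_ids, generation_config):
    """Run assisted decoding from several threads and assert the shared
    GenerationConfig is left untouched."""
    before = generation_config.to_dict()
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(
                model.generate,
                input_ids,
                assistant_model=assistant,
                generation_config=generation_config,
                max_new_tokens=8,
            )
            for _ in range(8)
        ]
        for future in futures:
            future.result()  # surface any RuntimeError raised in a worker
    assert generation_config.to_dict() == before, "GenerationConfig drifted"
```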

Checklist

  • Existing tests pass
  • New tests added
  • Ran make fixup (format/quality) locally
  • No API changes / docs not required
  • Minimal, well-scoped patch with regression coverage

Notes for Reviewers

  • Change is intentionally minimal and defensive only where necessary.
  • MetaSafeTensorError makes failures explicit; happy to relocate to a shared errors module if preferred.
  • Can also add a doc comment in GenerationConfig noting that special token IDs may be passed as ints or tensors (including meta), and are normalized during generation.

@moonrunnerkc (Author) commented:

The Check tiny models CI job is failing because no tokenizers build compatible with Python 3.8 satisfies the pinned range (>=0.22,<0.23). This is an environment issue in the CI setup, not related to the changes in this PR. All tests pass locally and in CI on Python 3.10 and 3.12.
