generation: meta-safe _prepare_special_tokens + regression tests #40900
+1,040
−20
Summary
What / Why
This PR makes `generation/utils.py::_prepare_special_tokens` meta-safe. In assisted decoding, special-token tensors could be created on the `meta` device and then accessed via `.item()` or `.cpu().numpy()`, which triggers `RuntimeError: Tensor.item() cannot be called on meta tensors`.
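For reviewers who have not hit this before, the failure mode can be reproduced in isolation. This is a minimal sketch that assumes only that PyTorch is installed; it does not go through `generate()` or assisted decoding, and the variable names are illustrative.

```python
import torch

# A meta tensor carries a shape and dtype but no underlying data.
eos = torch.empty((), dtype=torch.long, device="meta")

try:
    eos.item()  # there is no value to read back
    message = None
except RuntimeError as exc:
    message = str(exc)

print(message)
```

Any code path that reads a special token's value out of such a tensor fails the same way, regardless of which device the caller eventually wants.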
This patch avoids the unsafe operations by rebuilding the tensors on the requested device from the Python IDs in `GenerationConfig`, and introduces a dedicated error type, `MetaSafeTensorError`, so that unsupported cases fail explicitly instead of with opaque framework errors.
Scope
- `src/transformers/generation/utils.py`
  - `_prepare_special_tokens` changed to be meta-safe (no `.item()` or `.cpu().numpy()` on meta).
  - `MetaSafeTensorError` (subclass of `RuntimeError`) for unsupported meta ops.
- `tests/test_generation_meta.py`
Details of the Fix
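As a rough illustration of the approach in this section, the sketch below rebuilds a special-token tensor from its Python ID and refuses to read from a meta tensor. It is not the code in this PR: the helper name `to_safe_tensor` and its signature are hypothetical, and only `MetaSafeTensorError` and its `RuntimeError` base come from the description.

```python
import torch


class MetaSafeTensorError(RuntimeError):
    """Raised when a value would have to be read out of a meta tensor."""


def to_safe_tensor(token, device):
    """Hypothetical helper: return a special-token tensor on `device`."""
    if token is None:
        return None
    if isinstance(token, torch.Tensor):
        if token.is_meta:
            # A meta tensor has no data, so it cannot be moved or read.
            raise MetaSafeTensorError(
                "special token is a meta tensor; pass the integer ID instead"
            )
        return token.to(device)
    # Python int (or list of ints): build the tensor directly on the target device.
    return torch.tensor(token, dtype=torch.long, device=device)


eos = to_safe_tensor(2, "cpu")
print(eos.item())  # 2
```

Because the error subclasses `RuntimeError`, callers that already catch `RuntimeError` keep working, while new callers can catch the narrower type.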
- Tensors on `meta` are not moved or read directly.
- `.item()` / `.cpu().numpy()` are never called on meta tensors.
- Unsupported cases raise `MetaSafeTensorError` with a descriptive message.
Regression Tests
New tests in `tests/test_generation_meta.py`:
- `test_prepare_special_tokens_cpu`: CPU tensors work as before.
- `test_prepare_special_tokens_meta`: Meta tensors no longer raise; the function completes.
- `test_prepare_special_tokens_consistency`: Outputs match between CPU and meta paths.
- `test_no_drift_after_prepare`: Confirms `GenerationConfig` is not mutated.
✅ All tests pass locally and in CI (`ubuntu-latest`, Python 3.10 & 3.12).
Related
- `num_assistant_tokens` and assisted decoding flows.
Backward Compatibility
`.item()` from meta tensors.
Performance
Validation
Local
pytest -q tests/test_generation_meta.py # PASS
CI (GitHub Actions, ubuntu-latest, Py3.10/3.12)
Full test suite including new meta safety tests → PASS
Concurrency probes
Assisted decoding succeeds with no config drift.
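A drift check of this kind can be sketched as a snapshot comparison around the call under test. The description does not say how the probes were implemented, so everything here is an assumption: `assert_no_drift`, `StandInConfig`, and the two example functions are hypothetical and use only the standard library.

```python
import copy


def assert_no_drift(config, fn):
    """Call fn(config) and fail if any attribute of config changed."""
    before = copy.deepcopy(vars(config))
    result = fn(config)
    after = vars(config)
    assert after == before, f"config drifted: {before} -> {after}"
    return result


class StandInConfig:
    """Hypothetical stand-in for a generation config holding Python IDs."""

    def __init__(self, eos_token_id=2, pad_token_id=None):
        self.eos_token_id = eos_token_id
        self.pad_token_id = pad_token_id


def well_behaved(config):
    # Derives the pad fallback locally instead of writing it back.
    pad = config.pad_token_id if config.pad_token_id is not None else config.eos_token_id
    return {"eos": config.eos_token_id, "pad": pad}


def drifting(config):
    # Writes the fallback into the shared config: the bug the check catches.
    if config.pad_token_id is None:
        config.pad_token_id = config.eos_token_id
    return {"eos": config.eos_token_id, "pad": config.pad_token_id}


assert_no_drift(StandInConfig(), well_behaved)  # passes silently
```

Passing `drifting` instead raises `AssertionError`, which is the signal a shared config was modified between calls.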
Checklist
Notes for Reviewers