Description
Motivation.
Users want logits processor extensibility, i.e. the ability to specify logits processors beyond those such as min-p which are hard-coded into the engine. See for example:
- [Bug]: V1 engine ignores logits processors and min-p sampling #12678
- https://github.com/NVIDIA/logits-processor-zoo - library of logits processor extensions
The purpose of this RFC is to establish the interface for extending the vLLM V1 engine with additional logits processors during engine instantiation.
vLLM V0 supports logits processor configuration at request level (SamplingParams attribute). For V0 running in server mode, PR #11150 makes it possible for a request to dynamically import one or more logits processor modules, assuming that the necessary modules are available/installed. The logits_processors argument (available in the completion, chat completion and transcription API endpoints) allows the custom logits processors' constructors to be specified as a list of (1) qualified names, or (2) LogitsProcessorConstructor data structures (which include the qualified name along with constructor arguments). For security purposes (prevention of arbitrary code execution), --logits-processor-pattern whitelists specific logits processor libraries via regex.
We expect V1 will add logits processor support, with logits processors instantiated at server init time. See RFC #13360, PR #16728. (Note that, although the logits processors are instantiated at server init time, the behavior of the logits processors, including but not limited to enabling a given logits processor, can still be controlled using SamplingParams.extra_args on a per-request basis. #16862 allows SamplingParams.extra_args to be configured via the vllm_xargs REST API argument.) #16728 adds a logits processor base class and migrates several hard-coded logits processors (min P, min tokens, logits bias) to be subclasses of this base class. However, #16728 does not make the list of logits processors in a given engine instance extensible beyond the builtins; this RFC therefore focuses on implementing extensibility as a follow-on task.
Support for logits processor extensibility in V1 is desirable, both for server mode and also for direct instantiation of LLM and AsyncLLM in Python.
Proposed Change.
Interface
- For the purpose of this workstream, which solely considers the vLLM V1 engine, a "logits processor" is a subclass of the LogitsProcessor class (as defined in vllm/v1/sample/logits_processor.py in https://github.com/vllm-project/vllm/pull/16728/files ). Note that at time of writing, third-party libraries such as logits-processor-zoo are not directly compatible with this programming model.
- In server mode, the vLLM V1 engine will have a new CLI argument, --logits-processors, which passes in a list of logits processor constructors. It will be necessary to find a clean way to represent this on the command line; here I propose that --logits-processors expects a string representation of a JSON-formatted list, in which each element is one of the following: (1) the qualified name of a logits processor, or (2) a "constructor" JSON object with a key/value pair for the qualified name of the logits processor as well as optional positional arguments (args) and keyword arguments (kwargs); for reference see "[Frontend] Add logits_processors as an extra completion argument" #11150 (comment).
- Each logits processor module specified via the command line will be imported.
- Each imported logits processor will be instantiated in the persistent batch; see the InputBatch implementation in "[V1] LogitsProcessor programming model" #16728.
- Note that there will be no need for a logits-processor-pattern CLI argument (unlike in V0): whitelisting module names is unnecessary because the user specifies logits processor names explicitly via the CLI.
- Example:
# CLI logits processor example (the list must be valid JSON, so strings use double quotes inside the single-quoted shell argument)
vllm serve ... --logits-processors '["logits_processor_zoo.vllm.GenLengthLogitsProcessor", {"qualname": "vllm.v1.sample.MinPLogitsProcessor", "args": [0, "argument_value"], "kwargs": {"arg_name": "arg_value"}}]'
- The LLM and AsyncLLM engines will support a logits_processors constructor argument. The argument still accepts a list of logits processor specifications; however, unlike the CLI interface, each list element may be one of (1) a logits processor subclass, (2) the qualified name of a logits processor subclass, or (3) an instance of vllm.entrypoints.openai.protocol.LogitsProcessorConstructor (the underlying Python class for the aforementioned constructor JSON objects in the CLI interface):
from typing import Any, Dict, List, Optional
from pydantic import BaseModel

class LogitsProcessorConstructor(BaseModel):
    qualname: str
    args: Optional[List[Any]] = None
    kwargs: Optional[Dict[str, Any]] = None
- Users instantiating the engine directly in Python are responsible for security concerns as regards logits processors (i.e. validating that they are not importing third-party logits processor libraries in an unsafe way).
- Examples:
from vllm.entrypoints.openai.protocol import LogitsProcessorConstructor
from logits_processor_zoo.vllm import CiteFromPromptLogitsProcessor
...
logitprocs_list = [CiteFromPromptLogitsProcessor,
                   'logits_processor_zoo.vllm.GenLengthLogitsProcessor',
                   LogitsProcessorConstructor(qualname='vllm.v1.sample.MinPLogitsProcessor',
                                              args=[0, 'argument_value'],
                                              kwargs={'arg_name': 'arg_value'})]
# Sync engine example
llm = LLM(model="facebook/opt-125m",
          logits_processors=logitprocs_list)
# Async engine example
async_llm = AsyncLLM(model="facebook/opt-125m",
                     logits_processors=logitprocs_list)
- In server mode, the logits processors specified by the --logits-processors CLI argument will be passed to the logits_processors argument of the engine constructor.
- For V0 back-compatibility, V0 continues supporting the logits_processors request argument in the REST API until V0 is removed.
- The vLLM V1 engine raises an exception when the V0 logits processor interfaces (SamplingParams logits_processors, REST API logits_processors) are used. It probably makes sense to implement this check in protocol.py even though protocol.py is not V1-specific, because we want to skip importing logits processor modules if the user is mistakenly using the V0 logits processor interface with the V1 engine.
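The JSON list format proposed for the CLI argument could be parsed roughly as follows. This is a minimal sketch, not the vLLM implementation: ConstructorSpec and parse_logits_processors_arg are hypothetical names introduced only for illustration, and the only assumption taken from the proposal is that each element is either a qualified-name string or an object with qualname, args and kwargs keys.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConstructorSpec:
    """Hypothetical normalized form of one logits processor entry."""
    qualname: str
    args: list[Any] = field(default_factory=list)
    kwargs: dict[str, Any] = field(default_factory=dict)


def parse_logits_processors_arg(raw: str) -> list[ConstructorSpec]:
    """Parse the JSON-formatted list into normalized specs (illustrative only)."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list of logits processor entries")
    specs = []
    for entry in entries:
        if isinstance(entry, str):
            # Form (1): a bare qualified name, no constructor arguments.
            specs.append(ConstructorSpec(qualname=entry))
        elif isinstance(entry, dict) and isinstance(entry.get("qualname"), str):
            # Form (2): a "constructor" object; args and kwargs are optional.
            specs.append(ConstructorSpec(
                qualname=entry["qualname"],
                args=list(entry.get("args") or []),
                kwargs=dict(entry.get("kwargs") or {}),
            ))
        else:
            raise ValueError(f"invalid logits processor entry: {entry!r}")
    return specs
```

Normalizing both forms into one structure early means the rest of the engine only has to handle a single representation, whichever way the user wrote the entry.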
Implementation
- Logits processors are passed in through the server CLI interface or through the LLM/AsyncLLM constructors.
- For logits processors which were specified by qualified names, the qualified names are resolved during engine initialization. The end result is a list of logits processor classes.
- The list of logits processors is passed to the InputBatch constructor, which instantiates each logits processor.
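The name-resolution step could look something like the sketch below, which uses only the standard library's importlib. The helper names resolve_qualname and resolve_all are hypothetical, and a real implementation would presumably also verify that each resolved class subclasses LogitsProcessor; that check is omitted here to keep the example self-contained.

```python
import importlib
from typing import Union


def resolve_qualname(qualname: str) -> type:
    """Import the module part of a qualified name and return the named class."""
    module_name, _, class_name = qualname.rpartition(".")
    if not module_name:
        raise ValueError(f"not a qualified name: {qualname!r}")
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name)  # AttributeError if the name is missing
    if not isinstance(obj, type):
        raise TypeError(f"{qualname!r} does not name a class")
    return obj


def resolve_all(specs: list[Union[type, str]]) -> list[type]:
    """Turn a mixed list of classes and qualified names into a list of classes."""
    # Classes pass through unchanged; strings are resolved by import.
    return [s if isinstance(s, type) else resolve_qualname(s) for s in specs]
```

For instance, resolve_all(["collections.OrderedDict", dict]) yields the two classes in order, so the InputBatch constructor would only ever receive classes and never needs to perform imports itself.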
Feedback Period.
2 weeks
CC List.
@njhill @russellb @simon-mo @WoosukKwon
Any Other Things.
No response