
[RFC]: Logits processor extensibility #17799

@afeldman-nm

Description


Motivation.

Users want logits processor extensibility, i.e. the ability to specify logits processors beyond those (such as min-p) that are hard-coded into the engine. See for example:

The purpose of this RFC is to establish the interface for extending the vLLM V1 engine with additional logits processors during engine instantiation.

vLLM V0 supports logits processor configuration at the request level (a SamplingParams attribute). For V0 running in server mode, PR #11150 makes it possible for a request to dynamically import one or more logits processor modules, assuming that the necessary modules are installed. The logits_processors argument (available in the completion, chat completion and transcription API endpoints) allows the custom logits processors' constructors to be specified as a list of (1) qualified names, or (2) LogitsProcessorConstructor data structures (which include the qualified name along with constructor arguments). For security purposes (prevention of arbitrary code execution), --logits-processor-pattern whitelists specific logits processor libraries via regex.
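To make the V0 whitelist concrete, the sketch below checks request-supplied logits processor specifications (bare qualified names or constructor-style dicts) against a regex before anything would be imported. The helper name `check_logits_processor_specs` is hypothetical, and matching with `re.match` (anchored at the start of the qualified name) is an assumption about how the pattern is applied, not a description of vLLM's actual code.

```python
import re
from typing import Any, Dict, List, Union

# A spec is either a qualified name or a constructor-style dict with a "qualname" key.
Spec = Union[str, Dict[str, Any]]


def check_logits_processor_specs(specs: List[Spec], pattern: str) -> List[str]:
    """Return the qualified names in `specs`, raising if any is not whitelisted.

    Hypothetical helper: assumes the whitelist regex must match starting at
    the beginning of the qualified name (re.match semantics).
    """
    qualnames = []
    for spec in specs:
        qualname = spec if isinstance(spec, str) else spec["qualname"]
        if re.match(pattern, qualname) is None:
            raise ValueError(f"logits processor {qualname!r} is not allowed by {pattern!r}")
        qualnames.append(qualname)
    return qualnames


# Example: only allow processors from the logits_processor_zoo package.
allowed = check_logits_processor_specs(
    [
        "logits_processor_zoo.vllm.GenLengthLogitsProcessor",
        {"qualname": "logits_processor_zoo.vllm.CiteFromPromptLogitsProcessor", "args": []},
    ],
    pattern=r"logits_processor_zoo\.",
)
print(allowed)
```

Because the check runs on names only, a rejected request never reaches the import step.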

We expect V1 to add logits processor support, with logits processors instantiated at server init time; see RFC #13360 and PR #16728. (Although the logits processors are instantiated at server init time, their behavior, including but not limited to whether a given logits processor is enabled, can still be controlled on a per-request basis using SamplingParams.extra_args. #16862 allows SamplingParams.extra_args to be configured via the vllm_xargs REST API argument.) #16728 adds a logits processor base class and migrates several hard-coded logits processors (min-p, min tokens, logit bias) to be subclasses of this base class. However, #16728 does not make the list of logits processors in a given engine instance extensible beyond the built-ins; this RFC therefore focuses on implementing extensibility as a follow-on task.
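The split between init-time instantiation and per-request control can be illustrated with a toy processor that is constructed once while each request opts in through its own extra_args dict (in server mode, that dict would come from vllm_xargs). This is a simplified sketch, not the #16728 base class: the class name, the `apply` signature, the `ban_token_id` key, and the use of plain Python lists (one row of logits per request) instead of batched tensors are all assumptions made for illustration.

```python
import math
from typing import Any, Dict, List, Optional


class HypotheticalBanTokenProcessor:
    """Toy logits processor: instantiated once, enabled per request via extra_args."""

    def __init__(self, penalty: float = -math.inf) -> None:
        # Engine-wide configuration, fixed at instantiation time.
        self.penalty = penalty

    def apply(self, logits: List[float], extra_args: Optional[Dict[str, Any]]) -> List[float]:
        # Per-request control: a request that does not set the (hypothetical)
        # "ban_token_id" key leaves its logits untouched.
        token_id = (extra_args or {}).get("ban_token_id")
        if token_id is None:
            return logits
        out = list(logits)
        out[token_id] = self.penalty
        return out


processor = HypotheticalBanTokenProcessor()  # created once, e.g. at engine init
batch = [
    ([0.5, 1.0, 2.0], {"ban_token_id": 2}),  # request opts in
    ([0.5, 1.0, 2.0], None),                 # request without extra_args
]
results = [processor.apply(logits, extra) for logits, extra in batch]
print(results)  # [[0.5, 1.0, -inf], [0.5, 1.0, 2.0]]
```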

Support for logits processor extensibility in V1 is desirable both in server mode and when instantiating LLM and AsyncLLM directly in Python.

Proposed Change.

Interface

  • For the purpose of this workstream, which considers only the vLLM V1 engine, a "logits processor" is a subclass of the LogitsProcessor class (as defined in vllm/v1/sample/logits_processor.py in https://github.com/vllm-project/vllm/pull/16728/files). Note that, at the time of writing, third-party libraries such as logits-processor-zoo are not directly compatible with this programming model.

  • In server mode, the vLLM V1 engine will have a new CLI argument, --logits-processors, which passes in a list of logits processor constructors. This needs a clean command-line representation; I propose that --logits-processors accept a string containing a JSON-formatted list in which each element is either (1) the qualified name of a logits processor, or (2) a "constructor" JSON object with a key/value pair for the qualified name of the logits processor (qualname) as well as optional positional arguments (args) and keyword arguments (kwargs). For reference, see [Frontend] Add logits_processors as an extra completion argument #11150 (comment).

    • Each logits processor module specified via the command line will be imported.
    • Each imported logits processor will be instantiated in the persistent batch; see the InputBatch implementation in [V1] LogitsProcessor programming model #16728.
    • Unlike V0, there will be no need for a --logits-processor-pattern CLI argument: the user names the logits processors explicitly via the CLI, so there is no need to whitelist module names.
    • Example:
# CLI logits processor example (the list must be valid JSON, i.e. double-quoted strings, wrapped in single quotes for the shell)
vllm serve ... --logits-processors '["logits_processor_zoo.vllm.GenLengthLogitsProcessor", {"qualname": "vllm.v1.sample.MinPLogitsProcessor", "args": [0, "argument_value"], "kwargs": {"arg_name": "arg_value"}}]'
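One way the server could turn the --logits-processors string into structured specifications (before handing them to the engine constructor's logits_processors argument) is sketched below with `argparse` and `json`. The `ProcessorSpec` dataclass and `parse_logits_processors` function are hypothetical stand-ins (the RFC proposes LogitsProcessorConstructor for this role), and the validation rules are assumptions.

```python
import argparse
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ProcessorSpec:
    """Hypothetical stand-in for LogitsProcessorConstructor."""
    qualname: str
    args: List[Any] = field(default_factory=list)
    kwargs: Dict[str, Any] = field(default_factory=dict)


def parse_logits_processors(value: str) -> List[ProcessorSpec]:
    """Parse a JSON list whose elements are qualnames or constructor objects."""
    items = json.loads(value)  # a JSONDecodeError (a ValueError) is reported by argparse
    if not isinstance(items, list):
        raise argparse.ArgumentTypeError("--logits-processors expects a JSON list")
    specs = []
    for item in items:
        if isinstance(item, str):  # form (1): bare qualified name
            specs.append(ProcessorSpec(qualname=item))
        elif isinstance(item, dict) and isinstance(item.get("qualname"), str):  # form (2)
            specs.append(ProcessorSpec(
                qualname=item["qualname"],
                args=list(item.get("args") or []),
                kwargs=dict(item.get("kwargs") or {}),
            ))
        else:
            raise argparse.ArgumentTypeError(f"invalid logits processor spec: {item!r}")
    return specs


parser = argparse.ArgumentParser()
parser.add_argument("--logits-processors", type=parse_logits_processors, default=[])
cli_value = ('["logits_processor_zoo.vllm.GenLengthLogitsProcessor", '
             '{"qualname": "vllm.v1.sample.MinPLogitsProcessor", '
             '"args": [0, "argument_value"], "kwargs": {"arg_name": "arg_value"}}]')
ns = parser.parse_args(["--logits-processors", cli_value])
for spec in ns.logits_processors:
    print(spec)
```

Parsing happens entirely on strings, so malformed input is rejected before any module import is attempted.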
  • The LLM and AsyncLLM engines will support a logits_processors constructor argument. This argument likewise accepts a list of logits processor specifications; however, unlike in the CLI interface, each list element may be one of (1) a logits processor subclass, (2) the qualified name of a logits processor subclass, or (3) an instance of vllm.entrypoints.openai.protocol.LogitsProcessorConstructor (the underlying Python class for the constructor JSON objects in the CLI interface):
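from typing import Any, Dict, List, Optional
from pydantic import BaseModel

class LogitsProcessorConstructor(BaseModel):
    qualname: str  # fully qualified name of the logits processor class
    args: Optional[List[Any]] = None  # positional constructor arguments
    kwargs: Optional[Dict[str, Any]] = None  # keyword constructor arguments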
  • Users instantiating the engine directly in Python are responsible for logits processor security (i.e. ensuring that they do not import third-party logits processor libraries in an unsafe way).
  • Examples:
from vllm.entrypoints.openai.protocol import LogitsProcessorConstructor
from logits_processor_zoo.vllm import CiteFromPromptLogitsProcessor
...

logitprocs_list = [CiteFromPromptLogitsProcessor,
                   'logits_processor_zoo.vllm.GenLengthLogitsProcessor',
                   LogitsProcessorConstructor(qualname='vllm.v1.sample.MinPLogitsProcessor',
                                              args=[0, 'argument_value'],
                                              kwargs={'arg_name': 'arg_value'})]

# Sync engine example
llm = LLM(model="facebook/opt-125m",
          logits_processors=logitprocs_list)

# Async engine example
async_llm = AsyncLLM(model="facebook/opt-125m",
                     logits_processors=logitprocs_list)
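The three accepted element types could be normalized into (class, args, kwargs) triples roughly as follows. `ConstructorSpec` is a hypothetical dataclass mirroring LogitsProcessorConstructor so that the sketch runs without vLLM installed; the resolver assumes each qualified name has the form `package.module.ClassName` with a module-level class, and standard-library classes stand in for real logits processors. A real implementation would also check that each resolved class subclasses LogitsProcessor.

```python
import importlib
import inspect
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class ConstructorSpec:
    """Hypothetical mirror of LogitsProcessorConstructor."""
    qualname: str
    args: Optional[List[Any]] = None
    kwargs: Optional[Dict[str, Any]] = None


Resolved = Tuple[type, List[Any], Dict[str, Any]]


def resolve_qualname(qualname: str) -> type:
    # Assumes "package.module.ClassName" where the class is a module attribute.
    module_name, _, class_name = qualname.rpartition(".")
    if not module_name:
        raise ValueError(f"not a qualified name: {qualname!r}")
    obj = getattr(importlib.import_module(module_name), class_name)
    if not inspect.isclass(obj):
        raise TypeError(f"{qualname!r} does not name a class")
    return obj


def normalize(spec: Union[type, str, ConstructorSpec]) -> Resolved:
    if inspect.isclass(spec):  # (1) a class object
        return spec, [], {}
    if isinstance(spec, str):  # (2) a qualified name
        return resolve_qualname(spec), [], {}
    if isinstance(spec, ConstructorSpec):  # (3) qualname plus constructor arguments
        return resolve_qualname(spec.qualname), list(spec.args or []), dict(spec.kwargs or {})
    raise TypeError(f"unsupported logits processor spec: {spec!r}")


# Standard-library classes stand in for logits processor classes here.
specs = [
    Fraction,                                            # (1) class object
    "collections.OrderedDict",                           # (2) qualified name
    ConstructorSpec("fractions.Fraction", args=[1, 3]),  # (3) constructor spec
]
resolved = [normalize(spec) for spec in specs]
instances = [cls(*args, **kwargs) for cls, args, kwargs in resolved]
print(instances)
```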
  • In server mode, the logits processors specified by the --logits-processors CLI argument will be passed to the logits_processors argument of the engine constructor.

  • For backward compatibility, V0 continues to support the logits_processors request argument in the REST API until V0 is removed.

  • The vLLM V1 engine raises an error when the V0 logits processor interfaces (SamplingParams logits_processors, REST API logits_processors) are used. It probably makes sense to implement this check in protocol.py, even though protocol.py is not V1-specific, because we want to skip importing logits processor modules if the user mistakenly uses the V0 logits processor interface with the V1 engine.
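The early rejection described above could look like the following check, run on the raw request body before any module is imported. The function name, the error message, and the placeholder qualname `some_pkg.SomeProcessor` are hypothetical; the sketch only assumes that the V0 interface shows up as a non-empty `logits_processors` field on the request.

```python
from typing import Any, Dict


def reject_v0_logits_processors(request_body: Dict[str, Any]) -> None:
    """Hypothetical V1-side check: fail fast on the V0 per-request interface.

    Runs before any logits processor module is imported, so a request that
    names a module never triggers an import under V1. A missing, null, or
    empty field is accepted.
    """
    if request_body.get("logits_processors"):
        raise ValueError(
            "Per-request logits_processors are not supported by the V1 engine; "
            "configure logits processors at engine init (--logits-processors) "
            "and control them per request via SamplingParams.extra_args."
        )


reject_v0_logits_processors({"prompt": "Hello", "max_tokens": 8})  # accepted
try:
    reject_v0_logits_processors(
        {"prompt": "Hello", "logits_processors": ["some_pkg.SomeProcessor"]})
except ValueError as err:
    print(f"rejected: {err}")
```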

Implementation

  1. Logits processors are passed in through the server CLI interface or through the LLM/AsyncLLM constructors.
  2. Logits processors specified by qualified name are resolved during engine initialization. The end result is a list of logits processor classes.
  3. The list of logits processors is passed to the InputBatch constructor, which instantiates each logits processor.
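Step 3 can be sketched as a toy batch object that receives resolved (class, args, kwargs) triples, instantiates each processor exactly once, and applies them to every row of logits. `ToyInputBatch`, `ScaleProcessor`, `ClampProcessor`, and the per-row `apply` method are hypothetical; the real InputBatch in #16728 works on batched tensors and has a richer interface.

```python
from typing import Any, Dict, List, Sequence, Tuple

Resolved = Tuple[type, Sequence[Any], Dict[str, Any]]


class ScaleProcessor:
    """Hypothetical processor: multiplies every logit by a constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def apply(self, row: List[float]) -> List[float]:
        return [x * self.factor for x in row]


class ClampProcessor:
    """Hypothetical processor: clamps logits to an upper bound."""

    def __init__(self, max_value: float = 1.0) -> None:
        self.max_value = max_value

    def apply(self, row: List[float]) -> List[float]:
        return [min(x, self.max_value) for x in row]


class ToyInputBatch:
    """Hypothetical stand-in for InputBatch: owns one instance per processor class."""

    def __init__(self, logits_processors: List[Resolved]) -> None:
        # Instantiated once, when the batch is constructed (i.e. at engine init).
        self.logits_processors = [cls(*args, **kwargs) for cls, args, kwargs in logits_processors]

    def apply_all(self, logits: List[List[float]]) -> List[List[float]]:
        out = []
        for row in logits:
            for proc in self.logits_processors:  # applied in list order
                row = proc.apply(row)
            out.append(row)
        return out


batch = ToyInputBatch([(ScaleProcessor, [2.0], {}), (ClampProcessor, [], {"max_value": 3.0})])
print(batch.apply_all([[0.5, 1.0, 2.0], [1.5, -1.0, 4.0]]))  # [[1.0, 2.0, 3.0], [3.0, -2.0, 3.0]]
```

In this sketch the order of the list is significant: clamping before scaling would let a scaled logit exceed the bound.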

Feedback Period.

2 weeks

CC List.

@njhill @russellb @simon-mo @WoosukKwon

