Inconsistent Output between LightLLM and Transformers Inference Library #309

@Lvjinhong

Description

When I specify `max_new_tokens`, LightLLM's output always runs to exactly that maximum length. Transformers, by contrast, can stop earlier when the model itself decides to end generation (for example, by emitting its end-of-sequence token), so its outputs are often shorter than `max_new_tokens`. I believe the Transformers behavior is correct: it is implausible for every generation to need exactly `max_new_tokens` tokens, and padding out to the limit only produces repetitive output.
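For reference, here is a minimal sketch of the Transformers behavior I am describing (the `gpt2` model and the prompt are just illustrative stand-ins, not what I actually tested): `generate()` treats `max_new_tokens` as an upper bound and stops as soon as the EOS token is produced.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is only a stand-in model for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,                   # upper bound, not a target length
        eos_token_id=tokenizer.eos_token_id,  # generation halts early if EOS appears
        do_sample=False,
    )

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
# new_tokens can be anywhere in [1, 100]; it only equals 100 when the
# model never emits EOS within the budget.
print(new_tokens, tokenizer.decode(output[0], skip_special_tokens=True))
```

LightLLM, in my experience, behaves as if the EOS token were ignored and always fills the full budget.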

Labels: bug
