
[Question] How does lightllm implement nopad batching? #405

@Tomorrowdawn

Description


Thanks for your great work! Here are my concerns:

Say we get a batch of inputs with lengths L1, L2, .... How do you simultaneously compute the attention scores for these inputs with 'nopad'? That sounds amazing, but I failed to figure out how when reading the source code. A sketch of my current guess follows below.
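To make the question concrete, here is a minimal sketch of what I assume 'nopad' prefill means: all tokens of all requests concatenated into one flat tensor, plus cumulative start offsets, so no [B, max_len] padding is needed. The names and the per-sequence loop are my own illustration (a fused kernel would presumably do this in one launch), not lightllm's actual code.

```python
import torch

lens = torch.tensor([3, 5, 2])                  # L1, L2, L3
hidden = 8

# All tokens of all requests concatenated along one axis: [sum(L_i), hidden]
x = torch.randn(int(lens.sum()), hidden)

# Start offsets (cu_seqlens-style): [0, 3, 8, 10]
starts = torch.cat([torch.zeros(1, dtype=torch.long), lens.cumsum(0)])

# Naive per-sequence causal attention over the flat buffer; this loop only
# illustrates the indexing, not how a real kernel would be written.
outs = []
for i in range(len(lens)):
    s, e = starts[i].item(), starts[i + 1].item()
    q = k = v = x[s:e]                          # [L_i, hidden]
    scores = (q @ k.T) / hidden ** 0.5
    mask = torch.tril(torch.ones(e - s, e - s, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))
    outs.append(torch.softmax(scores, dim=-1) @ v)

out = torch.cat(outs)                           # same flat, unpadded layout as x
print(out.shape)                                # torch.Size([10, 8])
```

Is this roughly the idea, or does lightllm do something different?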

Additionally, in the decoding phase, how do you handle different KV lengths? (The code suggests the KV cache has a regular shape of [B, num_heads, ...], which is confusing, because different prefixes result in KV caches of different lengths.)
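Similarly, my current guess for decoding is a token-granular KV pool where each request keeps a list of the pool rows it owns, so requests with different prefix lengths can share one buffer without padding. Again, all names and shapes below are my assumptions, not lightllm's implementation:

```python
import torch

heads, head_dim, max_tokens = 2, 4, 64

# One shared pool; every cached token lives at some row of it.
k_pool = torch.randn(max_tokens, heads, head_dim)
v_pool = torch.randn(max_tokens, heads, head_dim)

# Request 0 owns 3 cached tokens, request 1 owns 5 (different KV lengths).
req_token_rows = [torch.tensor([0, 1, 2]), torch.tensor([3, 4, 5, 6, 7])]

# One decode step: each request contributes a single new query token.
q = torch.randn(len(req_token_rows), heads, head_dim)

outs = []
for b, rows in enumerate(req_token_rows):
    k = k_pool[rows]                            # [len_b, heads, head_dim]
    v = v_pool[rows]
    # Attention of the single new token against its own cached prefix.
    scores = torch.einsum("hd,lhd->hl", q[b], k) / head_dim ** 0.5
    probs = torch.softmax(scores, dim=-1)
    outs.append(torch.einsum("hl,lhd->hd", probs, v))

out = torch.stack(outs)                         # [B, heads, head_dim]
print(out.shape)                                # torch.Size([2, 2, 4])
```

If the real cache really is a dense [B, num_heads, ...] tensor, I don't see how the varying prefix lengths fit into it.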

I want to implement batched speculative decoding, so these details are important to me.

Thanks. Any details, code, or pseudocode are appreciated.
