
[Question] How does lightllm implement nopad batching? #405

@Tomorrowdawn

Description


Thanks for your great work! Here are my concerns:

Say we get a batch of inputs with lengths L1, L2, .... How do you simultaneously compute the attention scores for these inputs with 'nopad'? That sounds amazing, but I failed to figure out how when reading the source code. A sketch of my current guess follows below.
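To make the question concrete, here is a minimal sketch of what I assume 'nopad' prefill means: all tokens of all requests concatenated into one flat tensor, plus cumulative start offsets, so no [B, max_len] padding is needed. The names and the per-sequence loop are my own illustration (a fused kernel would presumably do this in one launch), not lightllm's actual code.

```python
import torch

lens = torch.tensor([3, 5, 2])                  # L1, L2, L3
hidden = 8

# All tokens of all requests concatenated along one axis: [sum(L_i), hidden]
x = torch.randn(int(lens.sum()), hidden)

# Start offsets (cu_seqlens-style): [0, 3, 8, 10]
starts = torch.cat([torch.zeros(1, dtype=torch.long), lens.cumsum(0)])

# Naive per-sequence causal attention over the flat buffer; this loop only
# illustrates the indexing, not how a real kernel would be written.
outs = []
for i in range(len(lens)):
    s, e = starts[i].item(), starts[i + 1].item()
    q = k = v = x[s:e]                          # [L_i, hidden]
    scores = (q @ k.T) / hidden ** 0.5
    mask = torch.tril(torch.ones(e - s, e - s, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))
    outs.append(torch.softmax(scores, dim=-1) @ v)

out = torch.cat(outs)                           # same flat, unpadded layout as x
print(out.shape)                                # torch.Size([10, 8])
```

Is this roughly the idea, or does lightllm do something different?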

Additionally, in the decoding phase, how do you handle different KV lengths? (The code suggests the KV cache has a regular shape of [B, num_heads, ...], which is confusing, because different prefixes result in KV caches of different lengths.)
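Similarly, my current guess for decoding is a token-granular KV pool where each request keeps a list of the pool rows it owns, so requests with different prefix lengths can share one buffer without padding. Again, all names and shapes below are my assumptions, not lightllm's implementation:

```python
import torch

heads, head_dim, max_tokens = 2, 4, 64

# One shared pool; every cached token lives at some row of it.
k_pool = torch.randn(max_tokens, heads, head_dim)
v_pool = torch.randn(max_tokens, heads, head_dim)

# Request 0 owns 3 cached tokens, request 1 owns 5 (different KV lengths).
req_token_rows = [torch.tensor([0, 1, 2]), torch.tensor([3, 4, 5, 6, 7])]

# One decode step: each request contributes a single new query token.
q = torch.randn(len(req_token_rows), heads, head_dim)

outs = []
for b, rows in enumerate(req_token_rows):
    k = k_pool[rows]                            # [len_b, heads, head_dim]
    v = v_pool[rows]
    # Attention of the single new token against its own cached prefix.
    scores = torch.einsum("hd,lhd->hl", q[b], k) / head_dim ** 0.5
    probs = torch.softmax(scores, dim=-1)
    outs.append(torch.einsum("hl,lhd->hd", probs, v))

out = torch.stack(outs)                         # [B, heads, head_dim]
print(out.shape)                                # torch.Size([2, 2, 4])
```

If the real cache really is a dense [B, num_heads, ...] tensor, I don't see how the varying prefix lengths fit into it.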

I want to implement batched speculative decoding, so these details are important to me.

Thanks. Any details, code, or pseudocode are appreciated.
