Description
Speculative sampling is explained here: https://arxiv.org/abs/2302.01318
In simpler terms here:
- Combine large LLM with small LLM for faster inference #630 (comment)
- Combine large LLM with small LLM for faster inference #630 (comment)
To start, the "draft" model can be generated using the train-text-from-scratch example with the same vocab as LLaMA. Later, we can try to utilize better models.
We also assume that batching multiple tokens with the "main" model is significantly faster than processing the tokens one by one. This may not yet be the case, but it will be when we close ggml-org/ggml#293
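For concreteness, below is a minimal, framework-agnostic sketch of the draft-then-verify loop from the paper (not the llama.cpp API). `draft_model` and `target_model` are hypothetical callables that map a token sequence to a next-token probability distribution (a numpy array over the vocab).

```python
# Sketch of speculative sampling (arXiv:2302.01318): draft k tokens with the
# small model, verify them with one batched pass of the large model, and keep
# the accepted prefix. The model callables are stand-ins for illustration.
import numpy as np

def speculative_step(target_model, draft_model, prefix, k=4, rng=None):
    rng = rng or np.random.default_rng()

    # 1) Draft k tokens autoregressively with the cheap model.
    drafted, draft_probs = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_model(ctx)                      # distribution over the vocab
        tok = int(rng.choice(len(q), p=q))
        drafted.append(tok)
        draft_probs.append(q)
        ctx.append(tok)

    # 2) Evaluate the large model at every drafted position. Written as a loop
    #    here, but in practice this is a single batch of k+1 positions -- which
    #    is why batched decoding (ggml-org/ggml#293) matters.
    target_probs = [target_model(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3) Modified rejection sampling: accept drafted token i with prob min(1, p/q);
    #    on rejection, resample from the renormalized residual max(0, p - q).
    out = []
    for i, tok in enumerate(drafted):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        if rng.random() < min(1.0, p / q):
            out.append(tok)
        else:
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            residual /= residual.sum()
            out.append(int(rng.choice(len(residual), p=residual)))
            return out                            # stop at the first rejection

    # 4) All k drafted tokens accepted: sample one bonus token from the target.
    out.append(int(rng.choice(len(target_probs[k]), p=target_probs[k])))
    return out
```

The accepted tokens are distributed exactly as if they had been sampled from the target model alone; the draft model only affects how many tokens each target pass yields.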
Activity
SlyEcho commented on Jun 29, 2023
Would it make sense to do something like a beam search with the fast model and then evaluate the result with the larger model?
ggerganov commented on Jul 1, 2023
Yes, this might be even more efficient, as it could increase the "success" rate of the drafted sequence.
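A rough sketch of that idea, under the simplifying assumption of greedy verification: the draft model proposes several candidate continuations (e.g. the top beams of a beam search) and the target model keeps the candidate whose longest prefix matches its own greedy choices. `draft_beams` and `target_model` are hypothetical callables, not llama.cpp functions.

```python
# Hypothetical sketch: pick the drafted beam that survives longest under the
# target model's greedy choices.
import numpy as np

def best_drafted_prefix(target_model, draft_beams, prefix, k=4, n_beams=3):
    candidates = draft_beams(prefix, k, n_beams)       # n_beams sequences of k tokens
    best = []
    for cand in candidates:
        accepted = []
        for i, tok in enumerate(cand):
            p = target_model(list(prefix) + cand[:i])  # one batch per beam in practice
            if int(np.argmax(p)) != tok:               # greedy agreement check
                break
            accepted.append(tok)
        if len(accepted) > len(best):
            best = accepted
    return best
```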
evanmiller commented on Jul 5, 2023
Note that speculative sampling increases overall compute. The algorithm in the linked paper executes the "main" model in parallel over the speculative sequence.
If local compute resources are already saturated, then speculative sampling won't decrease prediction latency; the algorithm requires pre-existing parallelism of some kind (either farming out the parallel evaluation or perhaps a multi-node pipeline architecture). Based on my understanding of llama.cpp's architecture, it doesn't seem like a great fit, but maybe there's a modification that could make it work?
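As a back-of-the-envelope illustration of that tradeoff (illustrative numbers, assuming each drafted token is accepted independently with rate `alpha` and `k` drafted tokens per loop):

```python
def expected_tokens_per_target_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per (batched) target-model evaluation, assuming
    each drafted token is accepted independently with probability alpha."""
    return (1 - alpha ** (k + 1)) / (1 - alpha)

# e.g. alpha = 0.8, k = 4 -> ~3.36 tokens per target pass, paid for with k extra
# draft-model evaluations and a (k+1)-token batch for the target model.
print(expected_tokens_per_target_pass(0.8, 4))
```

The latency win only materializes if evaluating that (k+1)-token batch costs close to a single-token pass, which is exactly the hardware-parallelism question raised above.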
DKormann commented on Jul 28, 2023
It increases overall computation, but it also increases the parallelisation of inference on the main model, so it can still be faster.
ggerganov commented on Aug 10, 2023
Staged speculative decoding
https://arxiv.org/abs/2308.04623
charliexchen commented on Aug 27, 2023
Hey hey -- I'm one of the authors of https://arxiv.org/abs/2302.01318. It's good to see the open source community pick up on this! I'm sadly not in a position to contribute directly, but since this is already on your radar:
ggerganov commented on Aug 27, 2023
@charliexchen Thanks for stopping by. We are getting close to having everything needed to implement this. Hopefully we will have a prototype soon.
Yes, this matches my understanding.
If we can replicate the speed improvement factor on Apple Silicon + Metal, it would be a game changer.
In your experience, if you are generating highly structured text (e.g. source code in some programming language), does it allow you to increase the size difference with the drafter significantly without losing the speed effect? I imagine this would be the case to some extent, since there would be many "easy-to-predict" tokens in such cases.
charliexchen commented on Aug 27, 2023
In our paper we got much higher speedups for the HumanEval code generation task compared to XSUM using the same model pairing, so acceptance rate is indeed rather task specific. If you have an "easier" task in some sense, then shrinking the drafter is absolutely on the table.
evanmiller commented on Aug 27, 2023
@charliexchen Did you consider using the same model as a draft model? I mean, after layer K < N, immediately sample the output to form a draft token.
charliexchen commented on Aug 27, 2023
This seems related to CALM (which is mentioned in one of the other threads). It should work, but you need to explicitly train/finetune the model to handle that.
The nice thing about spec sampling is that you don't have to touch the target model at all.
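A hypothetical sketch of that early-exit drafting idea, reusing the target model's first few layers as the drafter; as noted above, in practice this needs CALM-style training or fine-tuning so the intermediate hidden state is compatible with the LM head. `embed`, `layers`, and `lm_head` are illustrative callables, not part of any real API.

```python
import numpy as np

def early_exit_draft(token_ids, embed, layers, k_layers, lm_head):
    """Run only the first k_layers (< len(layers)) of the target model and
    greedily pick a draft token from the intermediate hidden state."""
    h = embed(token_ids)                  # (seq_len, d_model) hidden states
    for layer in layers[:k_layers]:       # early exit after k_layers blocks
        h = layer(h)
    logits = lm_head(h[-1])               # project the last position to the vocab
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax
    return int(np.argmax(probs)), probs   # greedy draft token + its distribution
```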
ggerganov commented on Aug 31, 2023
I'll try to do a PoC of speculative sampling today - will post a branch when I get something running
ggerganov commented on Sep 3, 2023
Closed via #2926