ggerganov commented Jan 18, 2024

This PR speeds up the HellaSwag computation in the `perplexity` tool by batching both the endings and the tasks into a single `llama_batch`.

For GPUs with plenty of FLOPS, adding `-c 1024` or even `-c 2048` might further improve performance.

By default we evaluate 1 task at a time, but for small tasks it is useful to batch several of them together. The number of tasks per batch can be controlled with the `--parallel` argument.
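To make the batching idea concrete, here is a minimal, self-contained sketch of how tasks could be grouped greedily under a token budget and a per-batch task cap. It is not the code from this PR and does not call the llama.cpp API. The names `task_sizes`, `task_tokens`, and `group_tasks` are hypothetical, and the sketch assumes each task's context is counted once and shared by its endings, that `n_ctx` acts as the token budget of one batch, and that `n_parallel` caps the number of tasks per batch.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical description of one HellaSwag task: the token count of the
// context plus the token count of each candidate ending.
struct task_sizes {
    std::size_t n_context;
    std::vector<std::size_t> n_ending;
};

// Tokens needed for one task, assuming (for this sketch) that the context
// is stored once and shared by all of the task's endings.
std::size_t task_tokens(const task_sizes & t) {
    return std::accumulate(t.n_ending.begin(), t.n_ending.end(), t.n_context);
}

// Greedily group task indices into batches, in order, so that each batch
// holds at most n_parallel tasks and at most n_ctx tokens. A task that is
// larger than n_ctx on its own still gets a batch of its own, so the caller
// has to decide how to handle it.
std::vector<std::vector<std::size_t>> group_tasks(
        const std::vector<task_sizes> & tasks,
        std::size_t n_ctx,
        std::size_t n_parallel) {
    assert(n_parallel >= 1);

    std::vector<std::vector<std::size_t>> batches;
    std::size_t used = 0; // tokens already assigned to the current batch

    for (std::size_t i = 0; i < tasks.size(); ++i) {
        const std::size_t need = task_tokens(tasks[i]);

        const bool start_new = batches.empty()
            || batches.back().size() >= n_parallel
            || used + need > n_ctx;

        if (start_new) {
            batches.emplace_back();
            used = 0;
        }

        batches.back().push_back(i);
        used += need;
    }

    return batches;
}
```

Under these assumptions, a larger `n_ctx` or a larger `n_parallel` can only reduce the number of batches, which matches the intuition behind raising `-c` and `--parallel` for small tasks.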

ggerganov merged commit ad19812 into master Jan 18, 2024
ggerganov deleted the gg/hellaswag-batched branch January 18, 2024 13:33
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
* perplexity : faster HellaSwag

ggml-ci

* perplexity : clean-up

ggml-ci

* perplexity : no need for decode_helper

ggml-ci

* perplexity : add comments

* perplexity : option to specify max batched tasks via `n_parallel`

* perplexity : remove HellaSwag restriction for n_batch
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024