server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` #6254
Conversation
- `-ts SPLIT, --tensor-split SPLIT`: When using multiple GPUs this option controls how large tensors should be split across all GPUs. `SPLIT` is a comma-separated list of non-negative values that assigns the proportion of data that each GPU should get in order. For example, "3,2" will assign 60% of the data to GPU 0 and 40% to GPU 1. By default the data is split in proportion to VRAM but this may not be optimal for performance. Requires cuBLAS.
- `-b N`, `--batch-size N`: Set the batch size for prompt processing. Default: `512`.
- `-b N`, `--batch-size N`: Set the batch size for prompt processing. Default: `2048`.
- `-ub N`, `--ubatch-size N`: physical maximum batch size. Default: `512`.
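For context, these two options correspond to the logical and physical batch-size fields of `llama_context_params`. A minimal sketch, assuming the `n_batch`/`n_ubatch` field names from `llama.h`:

```cpp
#include <cstdint>
#include "llama.h"

// Sketch: map the server's -b/--batch-size and -ub/--ubatch-size values onto
// the context parameters before the context is created.
static llama_context_params make_ctx_params(uint32_t n_batch, uint32_t n_ubatch) {
    llama_context_params cparams = llama_context_default_params();
    cparams.n_batch  = n_batch;   // logical maximum batch size (default 2048)
    cparams.n_ubatch = n_ubatch;  // physical maximum batch size (default 512)
    return cparams;
}
```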
Should we also make it clear that `ubatch` should be large enough when using the server for embeddings?
Yes, there is no advantage to increasing `n_batch` above `n_ubatch` with embedding models with pooling, because the entire batch must fit in a physical batch (i.e. `n_ubatch`). `n_batch` is always >= `n_ubatch`.
@ngxson I think it is better to add this check in `server.cpp`. I will create an issue to track it, and we will implement it later on.
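For reference, a hedged sketch of what such a check could look like; the helper name and error text below are hypothetical, not the actual `server.cpp` implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical check: with pooled embeddings the whole prompt must fit into a
// single physical batch, so oversized prompts are rejected before decoding.
static bool prompt_fits_ubatch(size_t n_prompt_tokens, uint32_t n_ubatch, std::string & err) {
    if (n_prompt_tokens > n_ubatch) {
        err = "input is too large to process: increase the physical batch size (--ubatch-size)";
        return false;
    }
    return true;
}
```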
The embeddings from multiple slots can go in a single batch. For example, with `n_batch = 2048` and `n_ubatch = 512` we can process 4 full slots in one go.
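Spelled out, that is just integer division of the logical by the physical batch size, assuming every slot's prompt fills a full physical batch:

```cpp
// Sketch: how many full slots fit in one logical batch, assuming each slot
// submits exactly n_ubatch tokens.
constexpr int n_batch         = 2048;
constexpr int n_ubatch        = 512;
constexpr int slots_per_batch = n_batch / n_ubatch;  // 4
```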
It could be implemented, but the batch splitting code does not take this into account. `llama_decode` will just fail if `n_tokens > n_ubatch`.
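In other words, the caller has to check the return value of `llama_decode`. A minimal sketch of that failure path (the log message is illustrative, not the server's actual output):

```cpp
#include <cstdio>
#include "llama.h"

// Sketch: with pooled embeddings a batch larger than n_ubatch cannot be split
// further by the library, so llama_decode reports an error that the caller
// has to surface.
static int decode_or_log(llama_context * ctx, llama_batch batch) {
    const int ret = llama_decode(ctx, batch);
    if (ret != 0) {
        fprintf(stderr, "llama_decode failed for a batch of %d tokens (may exceed n_ubatch)\n",
                batch.n_tokens);
    }
    return ret;
}
```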
We can move this discussion in ?
Ah yes sorry, ignore my comment
Update `server/README.md` with actual params supported.