Releases · l3utterfly/llama.cpp
b6029
b5891
llama : add jinja template for rwkv-world (#14665)

* llama : add jinja template for rwkv-world
* Update convert_hf_to_gguf.py

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
b5871
readme : add hot PRs (#14636)

* readme : add hot PRs
* cont
* readme : update title
* readme : hot PRs links
* cont
b5835
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)

Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260

Co-authored-by: Rémy Oudompheng <[email protected]>
b5581
opencl: add `backend_synchronize` (#13939)

* This is not needed for normal use, where the result is read with `tensor_get`, but it lets the perf mode of `test-backend-ops` measure performance properly.
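For context, a backend's synchronize callback blocks until all work queued on the device has finished, which is what lets a timing harness measure actual kernel execution rather than just enqueue time. Below is a minimal sketch of such a callback for an OpenCL backend, assuming a hypothetical context struct that holds the command queue; it is an illustration, not the code from this PR.

```cpp
#include <CL/cl.h>

// Hypothetical context struct holding the backend's OpenCL command queue.
struct opencl_backend_context {
    cl_command_queue queue;
};

// Sketch of a synchronize callback: block until every command enqueued on the
// device has completed, so a perf harness (e.g. test-backend-ops in perf mode)
// times real execution instead of returning right after submission.
static void opencl_backend_synchronize(opencl_backend_context * ctx) {
    clFinish(ctx->queue); // clFinish blocks until the queue has drained
}
```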
b5416
CANN: Support MOE Model MUL_MAT_ID (#13042)

Signed-off-by: noemotiovon <[email protected]>
b5158
Disable CI cross-compile builds (#13022)
b5061
musa: fix compilation warnings in mp_22/31 (#12780)

Signed-off-by: Xiaodong Ye <[email protected]>
b4959
convert: fix Mistral3/Gemma3 model hparams init (#12571)

* Fix Mistral3/Gemma3 model hparams init
* set positional args correctly
* use existing hparams if passed
b4913
SYCL: using graphs is configurable by environment variable and compil…
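As a general illustration of the environment-variable side of such a toggle (this entry does not spell out the actual variable name or compile option, so the name below is an assumption), the check usually reads the variable once and falls back to a default:

```cpp
#include <cstdlib>
#include <cstring>

// Illustrative run-time toggle; GGML_SYCL_DISABLE_GRAPH is an assumed name,
// not necessarily the variable introduced by this change.
static bool sycl_graphs_enabled() {
    const char * v = std::getenv("GGML_SYCL_DISABLE_GRAPH");
    // Graphs stay enabled unless the variable is set to something other than "0".
    return v == nullptr || std::strcmp(v, "0") == 0;
}
```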