
Davidqian123 (Collaborator)

No description provided.

bandoti and others added 30 commits January 26, 2025 12:07
* Add initial ggml cmake package

* Add build numbers to ggml find-package

* Expand variables with GGML_ prefix

* Guard against adding to cache variable twice

* Add git to msys2 workflow

* Handle ggml-cpu-* variants

* Link ggml/ggml-base libraries to their targets

* Replace main-cmake-pkg with simple-cmake-pkg

* Interface features require c_std_90

* Fix typo

* Removed unnecessary bracket from status message

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
* metal : use residency sets

ggml-ci

* metal : restore commandBufferWithUnretainedReferences calls [no ci]

* metal : release descriptors

ggml-ci

* metal : check env GGML_METAL_NO_RESIDENCY

ggml-ci

* metal : fix build + clean-up

ggml-ci
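
The `GGML_METAL_NO_RESIDENCY` check above is a plain environment-variable opt-out. A minimal sketch of that pattern, with a stand-in function rather than the actual ggml-metal backend code:

```cpp
// Minimal sketch of the opt-out: residency sets stay enabled unless the user
// exports GGML_METAL_NO_RESIDENCY. Stand-in code, not the real Metal backend.
#include <cstdio>
#include <cstdlib>

static bool metal_residency_enabled() {
    return std::getenv("GGML_METAL_NO_RESIDENCY") == nullptr;
}

int main() {
    if (metal_residency_enabled()) {
        std::puts("using Metal residency sets");
    } else {
        std::puts("GGML_METAL_NO_RESIDENCY set: falling back to per-buffer retention");
    }
    return 0;
}
```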
* ci : do not fail-fast for docker

* build arm64/amd64 separately

* fix pip

* no fast fail

* vulkan: try jammy
…-org#11441)

This fixes a segmentation fault when running tests and no Metal devices are
available (for example, when not linked with the Core Graphics framework).
* impl::load: change the bpe_ranks map to an unordered map, reducing impl::load time by about 30%

* llama_model_loader::init_mapping: replace `new llama_mmap` with `std::make_unique<llama_mmap>` for cleaner code, and roughly halve the time spent in init_mappings

* Update src/llama-vocab.cpp

---------

Co-authored-by: lexasub <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
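
A minimal sketch of the two changes above, using stand-in types rather than the real llama.cpp ones: an `unordered_map` keyed by a string pair (which needs a custom hash) and `std::make_unique` replacing a raw `new`.

```cpp
// Sketch only: pair-keyed unordered_map for BPE merge ranks plus make_unique
// in place of a raw `new`. The types are stand-ins, not llama.cpp's.
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct pair_hash {
    std::size_t operator()(const std::pair<std::string, std::string> & p) const {
        return std::hash<std::string>{}(p.first) ^ (std::hash<std::string>{}(p.second) << 1);
    }
};

using bpe_ranks_t = std::unordered_map<std::pair<std::string, std::string>, int, pair_hash>;

struct mmap_stub {}; // stand-in for llama_mmap

int main() {
    bpe_ranks_t ranks;
    ranks[{"h", "e"}] = 0;                        // O(1) average lookup vs O(log n) for std::map
    auto mapping = std::make_unique<mmap_stub>(); // instead of `new llama_mmap(...)`
    (void) mapping;
    return ranks.count({"h", "e"}) == 1 ? 0 : 1;
}
```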
The value provided by `minor` doesn't include the stepping for AMD; parse the value returned by `gcnArchName` instead to retrieve an accurate ID.
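
A hedged sketch of that idea: `gcnArchName` reports a string such as `gfx906:sramecc+:xnack-`, and the architecture ID (including the stepping) can be parsed out of it. The helper below is illustrative; the real backend reads the string from the HIP device properties.

```cpp
// Illustrative only: extract the architecture ID from a gcnArchName-style
// string. Digits are parsed as hex so variants like gfx90a also work.
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

static int parse_gcn_arch(const std::string & arch_name) {
    const std::size_t pos = arch_name.find("gfx");
    if (pos == std::string::npos) {
        return -1; // unrecognized format
    }
    int id = 0;
    for (std::size_t i = pos + 3; i < arch_name.size() && std::isxdigit((unsigned char) arch_name[i]); ++i) {
        const char c = (char) std::tolower((unsigned char) arch_name[i]);
        id = id * 16 + (std::isdigit((unsigned char) c) ? c - '0' : c - 'a' + 10);
    }
    return id;
}

int main() {
    std::printf("gfx906 -> %x\n", (unsigned) parse_gcn_arch("gfx906:sramecc+:xnack-")); // 906
    std::printf("gfx90a -> %x\n", (unsigned) parse_gcn_arch("gfx90a:sramecc+:xnack-")); // 90a
    return 0;
}
```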
The HTTP client in llama-run only prints an error when the download of
a resource fails. If the model name is missing from the CLI parameter list,
this causes the application to crash.
To prevent this, a check for the required model parameter has been added,
and errors from resource downloads are now propagated to the caller.

Signed-off-by: Michael Engel <[email protected]>
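
A sketch of that pattern with hypothetical function names (not the actual llama-run code): the model argument is validated up front, and download failures are returned to the caller instead of only being printed.

```cpp
// Hypothetical names, sketch only: check the required model argument first and
// propagate download errors as return codes instead of swallowing them.
#include <cstdio>
#include <string>
#include <vector>

static int download_resource(const std::string & url) {
    std::fprintf(stderr, "error: failed to download '%s'\n", url.c_str());
    return 1; // propagate the failure to the caller
}

static int run(const std::vector<std::string> & args) {
    if (args.empty()) {
        std::fprintf(stderr, "error: a model name is required\n");
        return 1; // fail cleanly instead of crashing on a missing argument
    }
    return download_resource(args[0]);
}

int main(int argc, char ** argv) {
    return run(std::vector<std::string>(argv + 1, argv + argc));
}
```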
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added in ggml-org#5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always assumed src1 to be of fp32 type (many OP functions depend on that).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases
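
A simplified host-side sketch of the underlying idea (plain C++, not SYCL, and not the real ggml kernel): templating the soft-max on the mask element type so the same routine accepts an F32 or F16 mask instead of assuming fp32.

```cpp
// Simplified sketch: soft-max over one row with an optional additive mask whose
// element type is a template parameter; an F16 type would plug in through its
// conversion to float. Stand-in code, not the actual SYCL implementation.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

template <typename mask_t>
static void soft_max_row(std::vector<float> & x, const mask_t * mask, float scale) {
    float max_v = -INFINITY;
    for (std::size_t i = 0; i < x.size(); ++i) {
        x[i] = x[i] * scale + (mask ? (float) mask[i] : 0.0f);
        max_v = std::max(max_v, x[i]);
    }
    float sum = 0.0f;
    for (float & v : x) { v = std::exp(v - max_v); sum += v; }
    for (float & v : x) { v /= sum; }
}

int main() {
    std::vector<float> row = {0.0f, 1.0f, 2.0f};
    const float mask[3] = {0.0f, 0.0f, -INFINITY}; // masked position gets ~0 probability
    soft_max_row(row, mask, 1.0f);
    for (float v : row) std::printf("%.3f ", v);
    std::printf("\n");
    return 0;
}
```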
These are added as pulling protocols to llama-run.

Signed-off-by: Eric Curtin <[email protected]>
…le instantiation bug (ggml-org#11080)

This disables the workaround on rocblas fixed versions (>=4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.
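
A sketch of the version gate described above. The macro name is assumed from rocBLAS's version header; the guard keeps the sketch compiling even without ROCm installed.

```cpp
// Sketch: only apply the rocblas_initialize() workaround on rocBLAS versions
// affected by the multiple-instantiation bug (< 4.0.0). ROCBLAS_VERSION_MAJOR
// is assumed from rocBLAS's version header; without it the workaround is skipped.
#include <cstdio>

#if defined(ROCBLAS_VERSION_MAJOR) && ROCBLAS_VERSION_MAJOR < 4
static constexpr bool needs_rocblas_workaround = true;
#else
static constexpr bool needs_rocblas_workaround = false;
#endif

int main() {
    if (needs_rocblas_workaround) {
        // rocblas_initialize(); // eagerly loads all Tensile objects (extra time + VRAM)
        std::puts("applying rocblas_initialize() workaround");
    } else {
        std::puts("skipping workaround: rocBLAS >= 4.0.0 (or not built with ROCm)");
    }
    return 0;
}
```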
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM can't unroll the loops here.
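
A small sketch of why this matters (plain C++ with the CUDA-style unroll hint): the hint is only meaningful when the trip count is a compile-time constant, so the runtime-bound case is kept on a separate, un-hinted path.

```cpp
// Sketch: with ncols_template > 0 the bound is a compile-time constant and the
// loop can be fully unrolled; with ncols_template == 0 the bound is a runtime
// value, so the (CUDA-style) unroll hint is not applied.
template <int ncols_template>
static float sum_row(const float * x, int ncols_runtime) {
    float acc = 0.0f;
    if constexpr (ncols_template > 0) {
#pragma unroll
        for (int i = 0; i < ncols_template; ++i) { // constexpr bound: unrollable
            acc += x[i];
        }
    } else {
        for (int i = 0; i < ncols_runtime; ++i) {  // bound known only at runtime
            acc += x[i];
        }
    }
    return acc;
}

int main() {
    const float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    return (int) (sum_row<8>(x, 8) + sum_row<0>(x, 8)) - 72; // 0 on success
}
```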
* ci : fix build CPU arm64

* failed, trying ubuntu 22

* vulkan: ubuntu 24

* vulkan : jammy --> noble
…-org#11473)

The test_completion_stream_with_openai_library() function was actually using stream=False by default, and test_completion_with_openai_library() was using stream=True.
This commit enables the `--no-warmup` option for llama-embeddings.

The motivation for this change is to allow the user to disable the
warmup when running the program.
…(ggml/1065)

Some threads kept looping and failed to terminate properly after an abort during CPU execution.

Co-authored-by: issi <[email protected]>
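
A minimal sketch of the fix pattern (illustrative, not the actual ggml threadpool code): worker loops observe a shared abort flag and exit cleanly instead of spinning forever once an abort has fired.

```cpp
// Illustrative sketch: workers poll an atomic abort flag each iteration and
// terminate their loop when it is set, so joins complete instead of hanging.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

static std::atomic<bool> g_abort{false};

static void worker(int id) {
    while (!g_abort.load(std::memory_order_relaxed)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1)); // pretend to do graph work
    }
    std::printf("worker %d: abort observed, terminating cleanly\n", id);
}

int main() {
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) {
        pool.emplace_back(worker, i);
    }
    g_abort.store(true); // e.g. an abort raised during CPU execution
    for (auto & t : pool) {
        t.join();
    }
    return 0;
}
```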
* Add option to not print stack on abort

Add option/envvar to disable stack printing on abort.
Also link some unittests with Threads to fix link errors on
ubuntu/g++11.

* Update ggml/src/ggml.c

---------

Co-authored-by: Diego Devesa <[email protected]>
People search for Ollama models using the web UI; this change
allows one to copy the URL from the browser and have it be
compatible with llama-run.

Signed-off-by: Eric Curtin <[email protected]>
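
A hedged sketch of the idea; the exact prefixes llama-run strips may differ from this illustration, and the model name is just an example.

```cpp
// Hedged sketch: map a browser URL such as
// "https://ollama.com/library/smollm:135m" onto the bare model reference that
// the ollama pulling path expects. The prefixes handled here are illustrative.
#include <cstdio>
#include <string>

static std::string strip_prefix(std::string s, const std::string & prefix) {
    if (s.rfind(prefix, 0) == 0) {
        s.erase(0, prefix.size());
    }
    return s;
}

static std::string normalize_ollama_url(std::string url) {
    url = strip_prefix(url, "https://ollama.com/");
    url = strip_prefix(url, "library/"); // official models live under /library/
    return url;
}

int main() {
    std::printf("%s\n", normalize_ollama_url("https://ollama.com/library/smollm:135m").c_str());
    return 0;
}
```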
…gml-org#11436)

* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging
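
A generic sketch of the pattern (stand-in types, not the real Vulkan API calls): the creation result is checked immediately and the failing pipeline is named in the error message, rather than continuing with an invalid handle and crashing later.

```cpp
// Generic sketch with stand-in types: report which pipeline failed to build
// and surface the failure right away instead of deferring the crash.
#include <cstdio>
#include <stdexcept>
#include <string>

enum class result { success, out_of_device_memory, invalid_shader };

static result create_compute_pipeline_stub(const std::string &) {
    return result::invalid_shader; // pretend creation failed
}

static void create_pipeline(const std::string & name) {
    const result res = create_compute_pipeline_stub(name);
    if (res != result::success) {
        std::fprintf(stderr, "ggml_vulkan: creation of pipeline '%s' failed (error %d)\n",
                     name.c_str(), (int) res);
        throw std::runtime_error("pipeline creation failed: " + name);
    }
}

int main() {
    try {
        create_pipeline("matmul_f16_f32");
    } catch (const std::exception & e) {
        std::printf("caught: %s\n", e.what());
    }
    return 0;
}
```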
* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
Commit 91c36c2 ("server : (web ui)
Various improvements, now use vite as bundler (ggml-org#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.
…-org#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <[email protected]>
…ja functionality (ggml-org#11489)

* add /apply-template endpoint to server

* remove unnecessary line

* add /apply-template documentation

* return only "prompt" field in /apply-template

* use suggested idea instead of my overly verbose way
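
A small sketch of the resulting response shape, assuming nlohmann::json (which the server already uses); the rendered string is purely illustrative. The endpoint returns only a "prompt" field containing the template-rendered conversation.

```cpp
// Sketch: build the /apply-template response with a single "prompt" field.
// Assumes nlohmann/json; the rendered template text below is illustrative.
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main() {
    const std::string rendered = "<|user|>\nHello\n<|assistant|>\n"; // output of the chat template
    nlohmann::json res;
    res["prompt"] = rendered;
    std::cout << res.dump(2) << std::endl;
    return 0;
}
```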
Davidqian123 merged commit 3cbe01d into release on Mar 5, 2025
2 checks passed