Command line switch to use F16 for memory_k and memory_v (refactor of #154) #294

Green-Sky · 2023-03-19T13:55:56Z

Made the changes requested by @ggerganov in #154. Fixes #146.

With this change, you can halve the memory reported by llama_model_load: memory_size = 512.00 MB -> memory_size = 256.00 MB
(ctx512, 7B, q4_0)
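The halving follows from the element size. memory_k and memory_v each hold one value per layer, per context position, per embedding dimension, so storing them as 2-byte F16 instead of 4-byte F32 cuts the total in half. Below is a minimal sketch of that arithmetic. It assumes the LLaMA-7B shape n_layer = 32 and n_embd = 4096 (not stated in this PR) and the 512-token context used above. The helper name kv_cache_mib is hypothetical and is not part of llama.cpp.

```python
# Hypothetical helper: estimate the combined size of memory_k + memory_v,
# assuming one entry of width n_embd per layer per context position.
BYTES_F32 = 4
BYTES_F16 = 2


def kv_cache_mib(n_layer: int, n_ctx: int, n_embd: int, use_f16: bool) -> float:
    """Return the size of the K and V caches together, in MiB (1024 * 1024 bytes)."""
    n_elements = n_layer * n_ctx * n_embd      # elements in one cache (K or V)
    elem_size = BYTES_F16 if use_f16 else BYTES_F32
    total_bytes = 2 * n_elements * elem_size   # K and V together
    return total_bytes / (1024 * 1024)


# Assumed LLaMA-7B shape with a 512-token context.
print(kv_cache_mib(32, 512, 4096, use_f16=False))  # 512.0
print(kv_cache_mib(32, 512, 4096, use_f16=True))   # 256.0
```

Under these assumptions the two results match the 512.00 MB and 256.00 MB values logged above. Those log values are likewise computed by dividing by 1024 twice.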

In a non-empirical comparison, F16 does not seem to degrade the quality of the predictions, but that might not mean much (waiting on #270).

Green-Sky mentioned this pull request Mar 19, 2023

RISC-V (TH1520&D1) benchmark and hack for <1GB DDR device #288

Closed

Green-Sky force-pushed the f16_memory_cli branch 2 times, most recently from 7b92973 to 82fcf96, March 19, 2023 16:51

ty-everett and others added 2 commits March 19, 2023 18:22

Use F16 for memory_k and memory_v

640b560

add command line switch to use f16 instead of f32 for memory k+v

31edd6f
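The second commit makes F16 opt-in through a command line switch, with F32 remaining the default. Below is a minimal sketch of that pattern using Python's argparse. The flag name --memory_f16 and the function parse_memory_type are hypothetical illustrations and are not taken from this PR's diff. The sketch shows only the shape of the switch: an off-by-default boolean that selects the storage type for memory_k and memory_v.

```python
import argparse


def parse_memory_type(argv):
    """Return 'f16' or 'f32' for the KV cache, based on a hypothetical flag."""
    parser = argparse.ArgumentParser(prog="main")
    # Hypothetical flag: store_true means it is False unless passed,
    # so F32 stays the default and F16 must be requested explicitly.
    parser.add_argument(
        "--memory_f16",
        action="store_true",
        help="store memory_k and memory_v as F16 instead of F32",
    )
    args = parser.parse_args(argv)
    return "f16" if args.memory_f16 else "f32"


print(parse_memory_type([]))                # f32
print(parse_memory_type(["--memory_f16"]))  # f16
```

Keeping the switch off by default preserves the existing F32 behavior for anyone who does not pass it. This matches the cautious note above that quality had only been compared non-empirically.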

Green-Sky force-pushed the f16_memory_cli branch from 82fcf96 to 31edd6f, March 19, 2023 17:22

ggerganov approved these changes Mar 19, 2023

ggerganov merged commit 0b366e7 into ggml-org:master Mar 19, 2023

Green-Sky deleted the f16_memory_cli branch March 22, 2023 12:15

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request Dec 19, 2023

Merge pull request ggml-org#294 from abetlen/dependabot/pip/uvicorn-0…

2135497

….22.0 Bump uvicorn from 0.21.1 to 0.22.0

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed
