* metal : fix minor string leaks (ggml/1004)
* cmake : make it possible linking ggml as external lib (ggml/1003)
* sync : ggml
* CANN: adjust backend registry refactor. (ggml-org#10158)
Remove buffer->iface.get_name, which was used in CANN, because it was removed in the backend registry refactor PR.
* metal : move dequantize templates to beginning of MSL source (#0)
* metal : simplify f16 and f32 dequant kernels (#0)
* cuda : clear error after changing peer access (ggml-org#10153)
* fix build break on arm64 linux (ggml-org#10166)
This fixes the build break from the recent changes
to move the CPU backend to separate files
ggml-org#10144
* server : clarify /slots endpoint, add is_processing (ggml-org#10162)
* server : clarify /slots endpoint, add is_processing
* fix tests
* ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (ggml-org#10167)
* ggml : fix gelu tables initialization (ggml-org#10172)
* Q6_K AVX improvements (ggml-org#10118)
* q6_k instruction reordering attempt
* better subtract method
* should be theoretically faster
small improvement with shuffle lut, likely because all loads are already done at that stage
* optimize bit fiddling
* handle -32 offset separately. bsums exists for a reason!
* use shift
* Update ggml-quants.c
* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86
* ggml : fix arch check in bf16_to_fp32 (ggml-org#10164)
* llama : add <|tool_call|> formatting to Granite template (ggml-org#10177)
Branch: GraniteToolCallTemplate
Signed-off-by: Gabe Goodhart <[email protected]>
* metal : add quantized FA support (ggml-org#10149)
* metal : add quantized FA (vec) support
ggml-ci
* metal : add quantized FA (non-vec) support
* metal : fix support check
ggml-ci
* metal : clean-up
* metal : clean-up (cont)
* metal : fix shared memory calc + reduce smem + comments
* metal : float-correctness
* metal : minor [no ci]
* ggml : adjust is_first_call init value (ggml-org#10193)
ggml-ci
* metal : fix from ptr buffer name (ggml-org#10189)
* server : remove hack for extra parallel slot (ggml-org#10187)
ggml-ci
* metal : add BF16 support (ggml-org#8439)
* ggml : add initial BF16 support
ggml-ci
* metal : add mul_mat_id BF16 support
ggml-ci
* metal : check for bfloat support on the Metal device
ggml-ci
* metal : better var names [no ci]
* metal : do not build bfloat kernels when not supported
ggml-ci
* metal : try to fix BF16 support check
ggml-ci
* metal : this should correctly check bfloat support
---------
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Eve <[email protected]>
Co-authored-by: Gabe Goodhart <[email protected]>
examples/server/README.md (5 additions & 6 deletions)
@@ -692,7 +692,10 @@ Given a ChatML-formatted json description in `messages`, it returns the predicte
 
 ### GET `/slots`: Returns the current slots processing state
 
-This endpoint can be disabled with `--no-slots`
+> [!WARNING]
+> This endpoint is intended for debugging and may be modified in future versions. For security reasons, we strongly advise against enabling it in production environments.
+
+This endpoint is disabled by default and can be enabled with `--slots`
 
 If query param `?fail_on_no_slot=1` is set, this endpoint will respond with status code 503 if there is no available slots.
 
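A client-side sketch of the behaviour this hunk documents, written in Python. Only the `/slots` path, the `?fail_on_no_slot=1` query parameter, and the 503 status come from the README text. The helper names `slots_url` and `has_free_slot` and the example base URL are assumptions made for illustration, and the sketch assumes the server was started with `--slots`.

```python
import urllib.error
import urllib.request


def slots_url(base_url: str, fail_on_no_slot: bool = False) -> str:
    """Build the /slots URL, optionally asking for a 503 when no slot is free."""
    url = base_url.rstrip("/") + "/slots"
    if fail_on_no_slot:
        url += "?fail_on_no_slot=1"
    return url


def has_free_slot(base_url: str, timeout: float = 5.0) -> bool:
    """Return False if the server answers 503, True on a successful response."""
    try:
        with urllib.request.urlopen(slots_url(base_url, True), timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 503:
            return False
        raise  # any other HTTP error is not handled by this sketch
```

A call such as `has_free_slot("http://localhost:8080")` (hypothetical address) would let a debugging script back off while every slot is busy. Other error statuses are re-raised rather than interpreted, since the README does not specify them here.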
@@ -709,6 +712,7 @@ Example:
 "grammar": "",
 "id": 0,
 "ignore_eos": false,
+"is_processing": false,
 "logit_bias": [],
 "min_p": 0.05000000074505806,
 "mirostat": 0,
@@ -741,7 +745,6 @@ Example:
 "temperature"
 ],
 "seed": 42,
-"state": 1,
 "stop": [
 "\n"
 ],
@@ -755,10 +758,6 @@ Example:
 ]
 ```
 
-Possible values for `slot[i].state` are:
-- `0`: SLOT_STATE_IDLE
-- `1`: SLOT_STATE_PROCESSING
-
 ### GET `/metrics`: Prometheus compatible metrics exporter
 
 This endpoint is only accessible if `--metrics` is set.
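Since this diff drops the numeric `state` field from the example in favour of the boolean `is_processing`, a script that used to test `state == 1` would need adjusting. The following Python sketch shows one way to read the new field. It assumes the `/slots` response is a JSON array of slot objects, as the README example suggests. The function name `busy_slot_ids` and the sample payload are hypothetical and trimmed to the two fields used.

```python
import json


def busy_slot_ids(slots_json: str) -> list[int]:
    """Return the ids of slots whose is_processing flag is true."""
    slots = json.loads(slots_json)
    # Slots lacking the field (e.g. output from an older server) count as idle.
    return [slot["id"] for slot in slots if slot.get("is_processing", False)]


# Hand-written sample payload, not real server output.
sample = '[{"id": 0, "is_processing": false}, {"id": 1, "is_processing": true}]'
print(busy_slot_ids(sample))
```

Treating a missing `is_processing` key as idle is a design choice of this sketch, not documented server behaviour. A stricter client might raise an error instead.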