Commit c9c88ac

Avoid unnecessarily disabling CUDA graphs
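As discussed in PR ggml-org#6766, CUDA graphs were being disabled in the presence of long prompts. Every prompt batch with batch size > 1 was counted as a graph update, even though CUDA graphs are not used for such batches, so a long prompt could push the consecutive-update counter past its limit. This fixes the issue by preventing the consecutive-update counter from incrementing for tokens in which CUDA graphs are already disabled due to batch size > 1.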
1 parent 583fd6b commit c9c88ac

1 file changed (+1, -1 lines)

ggml-cuda.cu

Lines changed: 1 addition & 1 deletion
@@ -2558,7 +2558,7 @@ GGML_CALL static enum ggml_status ggml_backend_cuda_graph_compute(ggml_backend_t
     }
 
     // Disable CUDA graphs (from the next token) if the use-case is demanding too many consecutive graph updates.
-    if (cuda_graph_update_required) {
+    if (use_cuda_graph && cuda_graph_update_required) {
         cuda_ctx->cuda_graph->number_consecutive_updates++;
     } else {
         cuda_ctx->cuda_graph->number_consecutive_updates = 0;