Commit 78933da

[resource manager] Fix free memory fraction calculation
Respect the fraction specified; otherwise we disregard the memory taken for storing the model and hit out-of-memory (OOM) errors by allocating too many blocks.

Signed-off-by: eopXD <[email protected]>
1 parent 7c686ba commit 78933da

1 file changed: +8 -1 lines changed

tensorrt_llm/_torch/pyexecutor/resource_manager.py

Lines changed: 8 additions & 1 deletion
@@ -814,7 +814,14 @@ def calculate_max_num_blocks_from_cpp(
         logger.debug(f"window_size_to_layers: {window_size_to_layers}")
 
         free_mem, total_mem = torch.cuda.mem_get_info()
-        primary_pool_memory_bytes = free_mem
+        free_mem_fraction = (kv_cache_config.free_gpu_memory_fraction
+                             if kv_cache_config.free_gpu_memory_fraction
+                             is not None else 0.9)
+        assert free_mem_fraction < 1.0, (
+            f"Invalid freeMemFraction: {free_mem_fraction} must be < 1.0")
+        logger.debug(f"free_mem_fraction: {free_mem_fraction}")
+
+        primary_pool_memory_bytes = int(free_mem * free_mem_fraction)
         secondary_pool_memory_bytes = 0
         logger.debug(
             f"primary_pool_memory_bytes is set to {primary_pool_memory_bytes/1024**3}GB, \n"
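
The sizing rule introduced by this change can be tried in isolation. The following is a minimal sketch, not the code in `resource_manager.py`: `primary_pool_bytes` and `DEFAULT_FREE_MEM_FRACTION` are hypothetical names, the free-memory value is passed in as a plain integer instead of being read from `torch.cuda.mem_get_info()`, and the lower-bound check on the fraction is an added assumption (the commit only asserts `< 1.0`).

```python
from typing import Optional

# Same fallback value the diff uses when no fraction is configured.
DEFAULT_FREE_MEM_FRACTION = 0.9


def primary_pool_bytes(free_mem: int, fraction: Optional[float] = None) -> int:
    """Hypothetical helper: bytes to give the primary KV cache pool.

    free_mem is the free GPU memory in bytes; in the real code it comes
    from torch.cuda.mem_get_info().
    """
    if fraction is None:
        fraction = DEFAULT_FREE_MEM_FRACTION
    # The commit asserts only fraction < 1.0; rejecting values <= 0 is an
    # assumption added for this sketch.
    if not 0.0 < fraction < 1.0:
        raise ValueError(f"Invalid free memory fraction: {fraction}")
    # Truncate to whole bytes, as the diff does with int(...).
    return int(free_mem * fraction)


if __name__ == "__main__":
    free_mem = 40 * 1024**3  # pretend 40 GiB are currently free
    for fraction in (None, 0.5):
        size = primary_pool_bytes(free_mem, fraction)
        print(f"fraction={fraction}: {size / 1024**3:.2f} GiB")
```

Before the change, the pool was sized as if the fraction were 1.0, leaving no headroom for any other allocation made after the measurement.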
