Closed
Labels: bug (Something isn't working)
Description
Your current environment
No response
Model Input Dumps
No response
🐛 Describe the bug
In the current implementation of `MambaCacheManager._assign_seq_id_to_cache_index`, if `cur_id` is not among the finished requests, it will try to pop a `free_cache_index`.
- However, there seems to be an edge case where `_assign_seq_id_to_cache_index` aggressively pops free indices before `_release_finished_requests` has a chance to return them (see the sketch below).
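For context, here is a simplified paraphrase of the relevant logic (the class and attribute names follow `mamba_cache.py`, but this is a sketch with parallel-sampling details elided, not the exact source):

```python
# Simplified paraphrase of the relevant mamba_cache.py logic; not the real code.
PAD_SLOT_ID = -1  # placeholder value; the actual constant lives in vLLM


class MambaCacheManagerSketch:

    def __init__(self, max_batch_size: int):
        # One cache slot per possible concurrent sequence.
        self.free_cache_indices = list(range(max_batch_size))
        # request_id -> {seq_id: cache index}
        self.mamba_cache_indices_mapping: dict[str, dict[int, int]] = {}

    def _assign_seq_id_to_cache_index(self, cur_id: str, seq_id: int,
                                      finished_requests_ids) -> int:
        if cur_id in finished_requests_ids:
            # Finished requests get a pad slot, not a real one.
            return PAD_SLOT_ID
        mapping = self.mamba_cache_indices_mapping.setdefault(cur_id, {})
        if seq_id in mapping:
            return mapping[seq_id]
        # New (request, seq) pair: grab a free slot. This pop is what raises
        # IndexError when assignments outpace releases.
        mapping[seq_id] = self.free_cache_indices.pop()
        return mapping[seq_id]

    def _release_finished_requests(self, finished_requests_ids) -> None:
        # Returns the slots of finished requests to the free list. If this
        # runs after (or without) the assignment above, the free list can be
        # empty even though some slots are logically reclaimable.
        for req_id in finished_requests_ids:
            for index in self.mamba_cache_indices_mapping.pop(req_id,
                                                              {}).values():
                self.free_cache_indices.append(index)
```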
We have some private experiments involving Mamba that reuse the above `MambaCacheManager` implementation, and we have observed errors like the one below:
```
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/jamba.py", line 441, in forward
    ) = self.mamba_cache.current_run_tensors(input_ids, attn_metadata,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 54, in current_run_tensors
    state_indices = self._prepare_current_run_mamba_cache(
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 144, in _prepare_current_run_mamba_cache
    return [
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 145, in <listcomp>
    self._assign_seq_id_to_cache_index(req_id, seq_id,
  File "/net/storage149/mnt/md0/nmg/miniconda3/envs/vllm-mamba/lib/python3.10/site-packages/vllm/model_executor/models/mamba_cache.py", line 119, in _assign_seq_id_to_cache_index
    destination_index = self.free_cache_indices.pop()
IndexError: pop from empty list
```
which is consistent with the diagnosis above.
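One possible mitigation (a sketch only, not a vetted patch; `_pop_free_index` is a hypothetical helper, not part of vLLM) would be to retry the release path before popping, and to fail with a clearer message if the cache is genuinely exhausted:

```python
# Hypothetical guard around the failing pop in _assign_seq_id_to_cache_index;
# it assumes the sketch above and has not been tested against vLLM itself.
def _pop_free_index(self, finished_requests_ids) -> int:
    if not self.free_cache_indices:
        # If the release path has not run yet for this step, try to reclaim
        # the slots of already-finished requests before giving up.
        self._release_finished_requests(finished_requests_ids)
    if not self.free_cache_indices:
        used = sum(len(m) for m in self.mamba_cache_indices_mapping.values())
        raise RuntimeError(
            f"Mamba cache exhausted: {used} slots in use across "
            f"{len(self.mamba_cache_indices_mapping)} requests")
    return self.free_cache_indices.pop()
```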
We have made sure to initialize `MambaCacheManager` with `max_batch_size` equal to `scheduler_config.max_num_seqs`, which we set to 10 times our batch size. We use around 8 scheduler steps.
Question: But how can we be sure that the cache occupancy will never exceed `max_batch_size`?
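One way to check this empirically would be a per-step occupancy probe along these lines (a hypothetical debugging hook, assuming the internals named in the sketch above):

```python
# Hypothetical debugging hook: call once per scheduler step to see whether
# occupancy ever approaches max_batch_size. Assumes access to the manager
# internals named in the sketch above.
def log_occupancy(manager, max_batch_size: int) -> None:
    used = sum(len(m) for m in manager.mamba_cache_indices_mapping.values())
    free = len(manager.free_cache_indices)
    # Every slot should be either mapped or free; anything else is a leak.
    assert used + free == max_batch_size, "cache slots leaked or double-booked"
    print(f"mamba cache occupancy: {used}/{max_batch_size} (free={free})")
```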
CC: @nelsonspbr
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.