[Usage]: Why does GPU memory usage keep growing after starting qwen2.5vl-7b with vLLM? #19828

### Your current environment

I started the qwen2.5vl-7b model with vLLM 0.7.3. The launch command is: `nohup env CUDA_VISIBLE_DEVICES=4, vllm serve /Qwen/Qwen2___5-VL-7B-Instruct/ --trust-remote-code --served-model-name qwen_model --gpu-memory-utilization 0.9 --tensor-parallel-size 4 --port 8000 &>qwen.log &`
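I'm using four NVIDIA 4090 GPUs. Right after startup, each card used about 8 GB of memory. The longer the server runs, the more memory it uses, and after one night it had grown to about 12 GB per card.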
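For context on these numbers: vLLM documents `--gpu-memory-utilization` as the fraction of each GPU's memory that the vLLM instance may use for the model executor (weights, activations and the preallocated KV cache). The sketch below is a back-of-the-envelope calculation of the per-card ceiling that this command sets. It assumes a nominal 24 GB per 4090, which is not read from the screenshots. The helper `vllm_budget_gib` is hypothetical and is not part of vLLM.

```python
# Hypothetical helper: per-card memory ceiling implied by --gpu-memory-utilization.
# The 24 GiB capacity below is an assumed nominal value for an RTX 4090.


def vllm_budget_gib(total_gib: float, utilization: float) -> float:
    """Upper bound (GiB) that one vLLM instance is allowed to use on one GPU."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return total_gib * utilization


RTX_4090_GIB = 24.0  # assumption: nominal capacity, not measured
budget = vllm_budget_gib(RTX_4090_GIB, 0.9)
print(f"per-card ceiling: {budget:.1f} GiB")  # per-card ceiling: 21.6 GiB
```

Under that assumption, both the ~8 GB at startup and the ~12 GB after one night are below the ~21.6 GB ceiling.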

![Image](https://github.com/user-attachments/assets/b4083856-9935-4362-9a64-58436ac11107) 

![Image](https://github.com/user-attachments/assets/23c7b6ba-4db7-4c16-b195-3973ea78771f)

Why does GPU memory usage keep increasing, and how can I fix this?
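To quantify the growth over time, here is a minimal monitoring sketch. It assumes `nvidia-smi` is on `PATH`, and all helper names are hypothetical. The script samples per-GPU used memory (in MiB) with `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits` and prints the increase relative to the first sample.

```python
import subprocess
import time
from typing import Dict, Optional

# Per-GPU index and used memory in MiB, one line per GPU, no header or units.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"]


def parse_memory(csv_text: str) -> Dict[int, int]:
    """Parse 'index, memory.used' lines into {gpu_index: used_mib}."""
    usage: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage[int(index)] = int(used)
    return usage


def growth(baseline: Dict[int, int], current: Dict[int, int]) -> Dict[int, int]:
    """Per-GPU increase in MiB relative to the baseline sample."""
    return {gpu: current[gpu] - baseline[gpu] for gpu in baseline if gpu in current}


def sample() -> Dict[int, int]:
    """Run nvidia-smi once and return the parsed per-GPU usage."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory(out)


def monitor(interval_s: float = 600.0, rounds: Optional[int] = None) -> None:
    """Print per-GPU growth versus the first sample every interval_s seconds."""
    base = sample()
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval_s)
        print(time.strftime("%H:%M:%S"), growth(base, sample()), flush=True)
        done += 1
```

If you call `monitor(600)` from a separate process while the server is handling requests, it prints the per-card increase every 10 minutes. You can then line those readings up against the timestamps in `qwen.log`.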

### How would you like to use vllm

How should I use vLLM so that GPU memory usage doesn't keep growing like this?

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
