[Feature]: NVIDIA Triton GenAI Perf Benchmark

### 🚀 The feature, motivation and pitch

The GenAI Perf toolkit from NVIDIA can be used as an alternative benchmarking tool for vLLM. While we already have benchmark scripts and a framework in the `benchmarks` directory, we should try out different load generators to compare the performance and accuracy of the benchmark clients.

In this issue, I describe some tasks we need help with to try out the new benchmark harness:
* Compare the output of GenAI Perf with that of `benchmark_serving`, in terms of both the coverage and the accuracy of the reported metrics.
* Vary the workloads: ShareGPT, Sonnet, and synthetic.
* Implement it as an alternative harness through the script.
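
To make the first task a bit more concrete, here is a rough sketch of how the coverage and accuracy comparison could be done. It assumes the results from both tools have already been parsed into flat `name -> value` dicts with matching names and units. The `compare_metrics` helper, the metric names, and the numbers are all hypothetical and only for illustration; they are not the actual output schema of either tool.

```python
from typing import Dict, Set, Tuple


def compare_metrics(
    a: Dict[str, float],
    b: Dict[str, float],
) -> Tuple[Set[str], Set[str], Dict[str, float]]:
    """Compare two flat metric dicts.

    Returns (only_in_a, only_in_b, rel_diff), where rel_diff maps each
    shared metric name to |a - b| / max(|a|, |b|).
    """
    only_a = set(a) - set(b)
    only_b = set(b) - set(a)
    rel_diff: Dict[str, float] = {}
    for name in sorted(set(a) & set(b)):
        denom = max(abs(a[name]), abs(b[name]))
        # Two zero values are treated as identical.
        rel_diff[name] = 0.0 if denom == 0 else abs(a[name] - b[name]) / denom
    return only_a, only_b, rel_diff


# Hypothetical, already-normalized results (names and values are made up).
serving = {"mean_ttft_ms": 50.0, "request_throughput": 10.0, "mean_tpot_ms": 20.0}
genai = {"mean_ttft_ms": 40.0, "request_throughput": 10.0, "p99_itl_ms": 35.0}

only_serving, only_genai, rel_diff = compare_metrics(serving, genai)
print("only in benchmark_serving:", sorted(only_serving))
print("only in GenAI Perf:", sorted(only_genai))
for name, diff in rel_diff.items():
    print(f"{name}: relative difference {diff:.1%}")
```

The two "only in" sets cover the coverage question, and the relative differences on shared metrics give a first signal on accuracy. Running the same comparison per workload would cover the second task as well.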

Happy to elaborate as well. 

https://pypi.org/project/genai-perf/ 

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
