This directory contains scripts to run disaggregated inference benchmarks using SLURM.
## Overview

The benchmarking process is orchestrated through a set of shell scripts and Python scripts that work together:

1. `submit.sh`: The main entry point for submitting benchmark jobs to SLURM. It runs a parameter sweep by calling `sbatch` with different configurations.
2. `disaggr_torch.slurm`: The SLURM script that sets up and runs a single benchmark experiment. It launches a container, generates configuration files, starts the server and workers, and runs the benchmark client.
3. `gen_worker_config.py`: A Python script that generates the worker configuration YAML file needed by `trtllm-serve`. It determines the worker configuration based on SLURM environment variables and script arguments.
4. `gen_server_config.py`: A Python script that generates the server configuration YAML file needed by `trtllm-serve`. It determines the server configuration based on the number of context and generation servers.
5. `start_worker.sh`: A shell script responsible for starting disaggregated workers using `trtllm-serve` on each allocated machine.
6. `start_server.sh`: A shell script responsible for starting the disaggregated server using `trtllm-serve`.
7. `run_benchmark.sh`: A shell script that waits for the server to be healthy and then runs the actual benchmark client (`run_benchmark.py`, not included in this directory).

## File Descriptions

It takes the following arguments in order:

24. `model_dir`: Model directory path.
25. `trtllm_repo`: TensorRT-LLM repository path.

### `gen_worker_config.py`

This Python script generates the worker configuration YAML file that configures the `trtllm-serve` workers. It creates separate configurations for context and generation workers with different tensor parallelism, batch sizes, and other parameters.

**Usage:**
The script is called from within `disaggr_torch.slurm`. It takes numerous arguments to define the model, parallelism, and worker configurations for both context and generation phases.
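
As a rough illustration, an invocation from the SLURM script could look like the sketch below. The flag names and values are hypothetical placeholders; the authoritative argument list is defined by `gen_worker_config.py` itself.

```bash
# Hypothetical sketch only: flag names and values below are illustrative
# placeholders, not the script's documented CLI.
python3 gen_worker_config.py \
    --work_dir "${WORK_DIR}" \
    --model_path "${MODEL_DIR}" \
    --ctx_tp_size 4 \
    --ctx_batch_size 4 \
    --gen_tp_size 8 \
    --gen_batch_size 256
```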
### `gen_server_config.py`
This Python script generates the server configuration YAML file that configures the `trtllm-serve` disaggregated server. It reads hostname information from the work directory and creates a configuration that specifies the URLs for context and generation servers.
**Usage:**
The script is called from within `start_server.sh`. It takes arguments for the number of context and generation servers and the work directory.
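
For illustration, an invocation along these lines would write the server YAML into the shared work directory; the exact flag spellings are an assumption and may differ in the actual script.

```bash
# Illustrative call; flag spellings are assumed, and the script is normally
# invoked from start_server.sh rather than by hand.
python3 gen_server_config.py \
    --num_ctx_servers 1 \
    --num_gen_servers 1 \
    --work_dir "${WORK_DIR}"
```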
### `start_worker.sh`
This script starts a disaggregated worker using `trtllm-serve`. It is launched by `srun` from the `disaggr_torch.slurm` script on all allocated nodes.
**Arguments:**

1. `worker_type`: Either "CTX" or "GEN" to specify the worker type.
2. `worker_index`: Index of the worker instance.
3. `model_dir`: Path to the model directory.
4. `worker_port`: Port for the worker to listen on.
5. `benchmark_mode`: Benchmark mode setting.
6. `concurrency`: Concurrency level.
7. `enable_pdl`: `true` or `false`.
8. `work_dir`: Work directory for logs and configuration.
9. `nsys_on`: Whether to enable nsys profiling.

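Putting the positional arguments together, a context worker could be launched roughly as follows. The paths, port, mode, and concurrency values are placeholders, and in practice the script is started by `srun` from `disaggr_torch.slurm` rather than by hand.

```bash
# Placeholder values in the documented argument order:
# worker_type worker_index model_dir worker_port benchmark_mode concurrency enable_pdl work_dir nsys_on
bash start_worker.sh CTX 0 /path/to/model 8001 default 64 true "${WORK_DIR}" false
```
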
### `start_server.sh`
This script starts the `trtllm-serve disaggregated` server. It first generates the server configuration using `gen_server_config.py`, then starts the server process.
**Arguments:**

1. `num_ctx_servers`: Number of context servers.
2. `num_gen_servers`: Number of generation servers.
3. `work_dir`: Work directory for logs and configuration.
4. `script_dir`: Directory containing the scripts.

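For example, a single context server and a single generation server could be brought up with placeholder values like this (normally `disaggr_torch.slurm` issues the call):

```bash
# Placeholder values in the documented argument order:
# num_ctx_servers num_gen_servers work_dir script_dir
bash start_server.sh 1 1 "${WORK_DIR}" "${SCRIPT_DIR}"
```
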
### `run_benchmark.sh`
This script orchestrates the execution of the benchmark client. It waits for the configuration files to be created and for the server's `/health` endpoint to respond, then it runs the benchmark.
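
Conceptually, the readiness wait boils down to polling the `/health` endpoint until it responds, as in this minimal sketch; the URL and sleep interval are placeholders rather than values taken from the script.

```bash
# Minimal readiness-wait sketch; run_benchmark.sh's actual logic may differ.
SERVER_URL="http://localhost:8000"   # placeholder host and port
until curl -sf "${SERVER_URL}/health" > /dev/null; do
    echo "Waiting for ${SERVER_URL}/health ..."
    sleep 10
done
echo "Server is healthy; starting the benchmark client."
```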
**Arguments:**
2. The user runs `./submit.sh`.
3. `submit.sh` submits one or more jobs to SLURM by calling `sbatch disaggr_torch.slurm` with different parameters (a sketch of such a sweep follows this list).
4. For each job, SLURM allocates resources and runs `disaggr_torch.slurm`.
5. `disaggr_torch.slurm` runs `gen_worker_config.py` to create worker configuration files.
6. `disaggr_torch.slurm` uses `srun` to launch `start_worker.sh` on all nodes, starting the MPI workers for both context and generation phases.
7. `disaggr_torch.slurm` starts the main `trtllm-serve` process using `start_server.sh`, which generates the server configuration using `gen_server_config.py`.
8. `disaggr_torch.slurm` runs `run_benchmark.sh`, which waits for the server to be ready.
9. `run_benchmark.sh` executes the benchmark for each concurrency level specified.
10. After the benchmark, `run_benchmark.sh` and `disaggr_torch.slurm` attempt to kill the server and worker processes.
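
As a hedged sketch of the sweep referenced in step 3, `submit.sh` can loop over configurations and submit one SLURM job per combination. The swept variables here are invented for illustration, and the full positional argument list expected by `disaggr_torch.slurm` is abbreviated.

```bash
# Illustrative sweep only: the loop variables are placeholders and the real
# argument list of disaggr_torch.slurm (ending in model_dir and trtllm_repo)
# is not reproduced here.
for concurrency in 1 8 64; do
    for gen_tp_size in 4 8; do
        sbatch disaggr_torch.slurm "${concurrency}" "${gen_tp_size}" # ...remaining arguments
    done
done
```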
f"Waiting for hostnames to be found in {hostnames_folder}, current length: {len(hostnames)}, expected length: {args.num_ctx_servers+args.num_gen_servers}"
48
+
)
49
+
print(f"All hostnames found in {hostnames_folder}")
50
+
51
+
# get the ctx and gen hostnames from the hostnames file
0 commit comments