From 8404fbeef1e5af4aef67a85da2189e6102803efe Mon Sep 17 00:00:00 2001
From: CharlesCNorton <135471798+CharlesCNorton@users.noreply.github.com>
Date: Fri, 10 Jan 2025 19:41:41 -0500
Subject: [PATCH] Update post_training readme.md

This commit fixes the redundancy in the sentence: "This is accomplished by
using utilizing Fully Sharded Data Parallel (FSDP) and Tensor Parallelism."
to simply read "by utilizing Fully Sharded Data Parallel (FSDP) and Tensor
Parallelism."
---
 cosmos1/models/diffusion/nemo/post_training/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cosmos1/models/diffusion/nemo/post_training/README.md b/cosmos1/models/diffusion/nemo/post_training/README.md
index ba8c3ded..a0d351a3 100644
--- a/cosmos1/models/diffusion/nemo/post_training/README.md
+++ b/cosmos1/models/diffusion/nemo/post_training/README.md
@@ -131,7 +131,7 @@ Executing the [data preprocessing script](./prepare_dataset.py) generates the fo
 
 ### 3. Post-train the Model
 
-The third step is to post-train the model. This step uses NeMo Framework's data and model parallelism capabilities to train the model on the post-training samples. This is accomplished by using utilizing Fully Sharded Data Parallel (FSDP) and Tensor Parallelism.
+The third step is to post-train the model. This step uses NeMo Framework's data and model parallelism capabilities to train the model on the post-training samples. This is accomplished by utilizing Fully Sharded Data Parallel (FSDP) and Tensor Parallelism.
 
 - **FSDP**: Distributes model parameters, optimizer states, and activations across all GPUs
 - **Tensor Parallelism**: Spreads the parameter tensor of individual layers across GPUs.