diff --git a/docs/source/index.md b/docs/source/index.md
index a6806900cb3c..fbd20a1f9269 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -100,6 +100,14 @@ features/compatibility_matrix
 
 % Details about running vLLM
 
+:::{toctree}
+:caption: Training
+:maxdepth: 1
+
+training/trl.md
+
+:::
+
 :::{toctree}
 :caption: Inference and Serving
 :maxdepth: 1
diff --git a/docs/source/training/trl.md b/docs/source/training/trl.md
new file mode 100644
index 000000000000..ebdf593dbde5
--- /dev/null
+++ b/docs/source/training/trl.md
@@ -0,0 +1,38 @@
+# Transformers Reinforcement Learning
+
+Transformers Reinforcement Learning (TRL) is a full-stack library that provides a set of tools to train transformer language models with methods such as Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 Transformers.
+
+Online methods such as GRPO or Online DPO require the model to generate completions during training, and vLLM can be used to generate these completions quickly.
+
+See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+
+:::{seealso}
+For more information on the `use_vllm` flag that you can set in the configs of these online methods, see:
+- [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
+- [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+:::
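+
+As a minimal sketch, the example below enables vLLM-backed generation for GRPO training by setting `use_vllm=True`. The model name, dataset, and reward function are illustrative placeholders; only `GRPOConfig`, `GRPOTrainer`, and `use_vllm` come from the TRL API referenced above.
+
+```python
+from datasets import load_dataset
+from trl import GRPOConfig, GRPOTrainer
+
+# Toy reward function for illustration: prefer completions close to 20 characters long.
+def reward_len(completions, **kwargs):
+    return [-abs(20 - len(completion)) for completion in completions]
+
+dataset = load_dataset("trl-lib/tldr", split="train")
+
+# use_vllm=True makes the trainer generate completions with vLLM
+# instead of the default transformers generation backend.
+training_args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO", use_vllm=True)
+
+trainer = GRPOTrainer(
+    model="Qwen/Qwen2-0.5B-Instruct",
+    reward_funcs=reward_len,
+    args=training_args,
+    train_dataset=dataset,
+)
+trainer.train()
+```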