
Conversation

liuyanyi
Contributor

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

We are training an LLM with a customized architecture using swift. With MegatronModelMeta we can plug in our own model_provider, but some of the data it needs cannot be passed to the provider through the existing arguments, and adding them would have required modifying the argument definitions in both megatron and swift. This PR therefore adds support for custom arguments.

The core changes are as follows:

  1. extra_megatron_kwargs: a new command-line argument that takes a JSON string and is passed through to megatron unchanged. This makes it possible to directly configure megatron arguments that swift does not yet cover, such as vpp_size.
  2. extra_args_provider: a new field on MegatronModelMeta (default None) that extends megatron's argument parser, so the extra arguments become visible inside the model_provider; a sketch follows this list.
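
For reference, extra_args_provider follows Megatron-LM's existing convention: a callable that receives the argparse parser, registers additional arguments, and returns the parser. A minimal sketch in Python (the --my-custom-dim argument is a made-up example, not something defined by swift or megatron):

def custom_extra_args_provider(parser):
    # Megatron-LM convention: add an argument group, register the extra
    # arguments, then return the parser.
    group = parser.add_argument_group(title='custom model args')
    group.add_argument('--my-custom-dim', type=int, default=None,
                       help='hypothetical value consumed by our model_provider')
    return parser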

Examples:

  1. Use extra_megatron_kwargs in convert_hf_config (a hypothetical sketch of the idea follows):

[screenshot: extra_megatron_kwargs used in convert_hf_config]
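
As a rough, hypothetical sketch of the idea (the function signature and key mapping below are assumptions for illustration, not swift's actual code), the kwargs can be merged into the converted config so that keys swift does not map explicitly still reach megatron:

def convert_hf_config(config, extra_megatron_kwargs=None):
    # Map the HF config fields that swift knows about (illustrative subset).
    megatron_config = {
        'num_layers': config['num_hidden_layers'],
        'hidden_size': config['hidden_size'],
    }
    # Merge the user-supplied --extra-megatron-kwargs on top, so unmapped
    # keys are still passed through to megatron.
    megatron_config.update(extra_megatron_kwargs or {})
    return megatron_config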

  2. Add arguments to megatron through extra_args_provider (see the sketch below):

[screenshot: custom arguments registered via extra_args_provider]
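
Once the provider has registered the argument, it becomes visible through megatron's global args object inside the custom model_provider. A sketch (the import path of get_args differs between Megatron-LM versions; a recent layout is assumed):

from megatron.training import get_args  # older releases: from megatron import get_args

def model_provider(pre_process=True, post_process=True):
    args = get_args()
    # --my-custom-dim registered by custom_extra_args_provider above is
    # now available on the parsed args.
    custom_dim = args.my_custom_dim
    ...  # build and return the model using custom_dim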

  3. Pass through an existing megatron argument (the JSON decoding is shown after the command):

megatron sft \
    <other arguments> \
    --extra-megatron-kwargs "{\"num_layers_per_virtual_pipeline_stage\":2}"
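
Note that the keys in the JSON string use the underscored python argument names (num_layers_per_virtual_pipeline_stage), not the dashed CLI form. Decoding such a value is plain json.loads; the general pattern, not swift's exact code:

import json

extra = json.loads('{"num_layers_per_virtual_pipeline_stage": 2}')
assert extra == {'num_layers_per_virtual_pipeline_stage': 2}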


@Jintao-Huang
Collaborator

Hi! That's a really necessary feature!
Please merge the main branch and run the following commands to format the code:

pip install pre-commit
pre-commit run --all-files

@liuyanyi
Contributor Author


done

@Jintao-Huang Jintao-Huang merged commit 5479249 into modelscope:main May 23, 2025
2 checks passed
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request May 23, 2025
…o_padding_ulysses

* commit 'e9475f1a306614b30fc6314cc08eb5b40a3f17aa':
  qwen2_5_vl support video use image_dir (modelscope#4326)
  [megatron] Add extra args and provider support for easily customize megatron (modelscope#4240)
  Update internvl.py, solve the exception when setting customized INPUT_SIZE. (modelscope#4320)
  [grpo] support liger loss (modelscope#3781)
  compat transformer_engine update (modelscope#4317)
  compat transformers==4.52 (modelscope#4308)
  [grpo] support dp in external mode (modelscope#4279)
  fix vllm engine return empty in stream generation (modelscope#4303)
  fix (modelscope#4316)
  update swift image (modelscope#4309)
  update load_args (modelscope#4296)
  fix n > 1 with vLLM V1 Engine (modelscope#4295)
  Reuse existing code
  [grpo] fix num of reward_model > 1  (modelscope#4287)
  modify grpo system
  fix grpo tab
  support grpo web_ui

# Conflicts:
#	swift/trainers/sequence_parallel/ulysses.py