𝑾𝒆𝒍𝒄𝒐𝒎𝒆 𝒕𝒐 𝑴𝒊𝒏𝒂𝒎𝒊-𝒔𝒖'𝒔 𝑮𝒊𝒕𝑯𝒖𝒃 𝒉𝒐𝒎𝒆.
𝑮𝒊𝒕𝑯𝒖𝒃: github.com/Minami-su
𝑯𝒖𝒈𝒈𝒊𝒏𝒈𝒇𝒂𝒄𝒆: huggingface.co/Minami-su
Generates multi-turn role-play conversation data based on Self-Instruct and Evol-Instruct.
This repository, deepspeed-grpo-qlora-vllm, provides a complete framework for fine-tuning LLMs using Group Relative Policy Optimization (GRPO) on 4-bit quantized models (QLoRA). It utilizes DeepSpe…
Python 12
Forked from tomaarsen/attention_sinks
attention_sinks with AutoGPTQ support: works with all models supported by AutoGPTQ, such as Qwen and Baichuan.
Python 1