
Great work! Suggesting a similar algorithm, CPGD, for citation #1

@kkkaiaiai

Description

Hi, MiniMax team,

Congratulations on your great work! We have been following your recently published results with great interest — it is an exciting and impactful contribution to the field of large reasoning models.

We would like to bring to your attention a related paper from our team, which shares several core ideas with the CISPO approach you proposed: CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models (https://arxiv.org/abs/2505.12504).

In our work, specifically in Section 6.1 “Importance Sampling” of the Discussion, we introduced a stop-gradient version of the importance-sampling ratio into the policy-gradient loss, together with a clipping mechanism, which is conceptually aligned with the core ideas of CISPO; a rough sketch of this formulation is given below. Our code is also open-sourced at: https://github.com/ModalMinds/MM-EUREKA.
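
For readers of this issue, here is a minimal PyTorch sketch of the kind of loss we are describing: the importance-sampling ratio is clipped and detached (stop-gradient), then used to reweight the REINFORCE term. This is for illustration only; the function name, argument names, symmetric clipping bounds, and mean reduction are assumptions on our part, not the exact CPGD or CISPO implementation (please refer to the papers and the MM-EUREKA repository for the actual code).

```python
import torch

def clipped_sg_policy_gradient_loss(logp_new, logp_old, advantages,
                                     eps_low=0.2, eps_high=0.2):
    """Policy-gradient loss with a clipped, stop-gradient importance weight.

    The ratio pi_theta / pi_old is clipped and detached, so it scales the
    update magnitude but is not itself differentiated; gradients flow only
    through log pi_theta(a|s).
    """
    # Importance-sampling ratio between the current and behaviour policies.
    ratio = torch.exp(logp_new - logp_old)
    # Clipping + stop-gradient on the weight.
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    # Reweighted REINFORCE term, negated for minimisation.
    return -(weight * advantages * logp_new).mean()
```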

Given the conceptual overlap and complementary insights, we believe the paper may be relevant to your work. If you find it appropriate, we would greatly appreciate it if you could consider citing it in a future revision or publication.

We look forward to seeing more insightful work from your team!
