Hi, MiniMax team,
Congratulations on your great work! We have been following your recently published results with great interest — it is an exciting and impactful contribution to the field of large reasoning models.
We would like to bring to your attention a related paper from our team, which shares similar concepts and ideas with the CISPO approach you proposed: CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models (https://arxiv.org/abs/2505.12504).
In our work, specifically in Section 6.1 "Importance Sampling" of the Discussion, we introduced a stop-gradient version of the importance sampling ratio into the policy gradient loss, together with a clipping mechanism on that ratio, which is conceptually aligned with the core ideas of CISPO. Our code is also open-sourced at: https://github.com/ModalMinds/MM-EUREKA.
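For readers unfamiliar with the idea, the shared ingredient can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the implementation from either paper: the function name, the REINFORCE-style objective, and the default clipping thresholds are our own assumptions here.

```python
import torch

def stopgrad_clipped_pg_loss(logp_new, logp_old, advantages,
                             eps_low=0.2, eps_high=0.2):
    """Policy-gradient loss weighted by a clipped, stop-gradient IS ratio.

    Illustrative sketch only; names and epsilons are placeholders.
    """
    # Importance sampling ratio r = pi_new(a|s) / pi_old(a|s).
    ratio = torch.exp(logp_new - logp_old)
    # Clip the ratio, then detach it (stop-gradient): the ratio only
    # reweights the update and is not itself differentiated.
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    # REINFORCE-style term: gradients flow only through logp_new.
    return -(weight * advantages * logp_new).mean()
```

Because the clipped ratio is detached, every sampled token still contributes a gradient signal (through `logp_new`), rather than being zeroed out as in PPO-style clipping.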
Given the conceptual overlap and complementary insights, we believe it may be of interest and relevance to your work. If you find it appropriate, we would greatly appreciate it if you could consider citing our paper in a future revision or publication.
We look forward to seeing more insightful work from your team!