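Also, the paper says: "these tokens were clipped out after the first on-policy update, preventing them from contributing to subsequent off-policy gradient updates." Why are these low-probability tokens, whose probabilities change a lot, clipped out during the first round of updates? After clipping, the ratio becomes $1-\epsilon$, but when it is then multiplied by the advantage, are no trainable parameters involved? Wouldn't it still affect the overall gradient?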
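The sketch below shows when a clipped token contributes no gradient. It assumes the standard per-token PPO clipped surrogate $\min(rA, \operatorname{clip}(r, 1-\epsilon, 1+\epsilon)A)$ with $r = \pi_\theta(a)/\pi_{\text{old}}(a)$. It also assumes the advantage $A$ is precomputed and treated as a constant, and that one batch of rollouts is reused for several gradient steps, so $r = 1$ at the first (on-policy) step and drifts afterwards. The function names are hypothetical, and a central finite difference stands in for autograd.

```python
import math


def ppo_token_objective(logp_new: float, logp_old: float, adv: float, eps: float = 0.2) -> float:
    """Per-token PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    Hypothetical helper; `adv` is treated as a constant (no gradient flows through it).
    """
    ratio = math.exp(logp_new - logp_old)  # r = pi_theta(a) / pi_old(a)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)


def grad_wrt_logp_new(logp_new: float, logp_old: float, adv: float,
                      eps: float = 0.2, h: float = 1e-6) -> float:
    """Central finite difference w.r.t. the trainable log-prob (stand-in for autograd)."""
    f_plus = ppo_token_objective(logp_new + h, logp_old, adv, eps)
    f_minus = ppo_token_objective(logp_new - h, logp_old, adv, eps)
    return (f_plus - f_minus) / (2 * h)


if __name__ == "__main__":
    eps = 0.2
    # Rare token, A > 0: after the first update p went 0.01 -> 0.02, so r = 2 > 1 + eps.
    # The min picks (1 + eps) * A, a constant in theta, so the gradient is zero.
    print(grad_wrt_logp_new(math.log(0.02), math.log(0.01), adv=1.0, eps=eps))  # 0.0

    # Token inside the trust region: r = 1.1, the r * A branch is active, gradient is about r * A.
    print(grad_wrt_logp_new(math.log(0.55), math.log(0.5), adv=1.0, eps=eps))

    # A < 0 and r = 0.6 < 1 - eps: the min picks (1 - eps) * A, again constant, gradient zero.
    print(grad_wrt_logp_new(math.log(0.3), math.log(0.5), adv=-1.0, eps=eps))  # 0.0
```

In the clipped branch the objective equals the constant $(1+\epsilon)A$ or $(1-\epsilon)A$, which does not depend on $\theta$. The token still appears in the loss value, but its gradient is zero. Only the unclipped branch carries a gradient, $\partial(rA)/\partial \log\pi_\theta = rA$.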