Xiaoying Zhang, Hao Sun, Yipeng Zhang, Kaituo Feng, Chaochao Lu, Chao Yang, Helen Meng: Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback. CoRR abs/2506.03106 (2025)