About
I am a PhD student at the Gaoling School of Artificial Intelligence, Renmin University of China (RUC), working in the RUCBM lab and advised by Prof. Yankai Lin. I am also conducting research at the Natural Language Processing Lab at Tsinghua University (THUNLP), supervised by Prof. Ning Ding. Currently, I am interning at ByteDance's Seed Foundation Model Team.
My research focuses on Large Language Models (LLMs) and Reinforcement Learning (RL). Specifically, I am interested in:
- LLM Alignment (e.g., RLHF, Multi-Objective Optimization)
- Reasoning & Generation (e.g., Causal Inference, Attention Mechanisms)
- Agentic Reinforcement Learning (e.g., Multi-Turn Interaction, Experience Learning)
Selected Publications
News
Our work "Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification" has been accepted to ACL 2026 🎉
Our work "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" has been accepted to ICLR 2026 🎉
Our work "Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning" has been accepted to NeurIPS 2025 (main) 🎉
Our work "Uncertainty and Influence-Aware Reward Model Refinement for Reinforcement Learning from Human Feedback" has been accepted to ICLR 2025 🎉
Our work "Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment" has been accepted to EMNLP 2024 (main) 🎉
Started my PhD at the Gaoling School of Artificial Intelligence, Renmin University of China (RUC)
