About
I am a PhD student at the Gaoling School of Artificial Intelligence, Renmin University of China (RUC), advised by Prof. Yankai Lin. I am also conducting research at the Natural Language Processing Lab at Tsinghua University (THUNLP), supervised by Prof. Ning Ding.
My research focuses on Large Language Models (LLMs) and Reinforcement Learning (RL). Specifically, I am interested in:
- LLM Alignment (e.g., RLHF, Multi-Objective Optimization)
- Reasoning & Generation (e.g., Causal Inference, Attention Mechanisms)
Selected Publications
Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification
Yiju Guo, Tianyi Hu, Zexu Sun, Yankai Lin
arXiv:2601.21244
An RLVR framework that boosts sampling success by pruning prompt interference tokens, achieving faster convergence and improved performance.
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Yiju Guo, Wenkai Yang, Zexu Sun, Ning Ding, Zhiyuan Liu, Yankai Lin
NeurIPS 2025 Conference
A framework to improve LLM reasoning by removing distracting tokens via causal attention distillation and gradient-guided pruning.
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Wenkai Yang, Weijie Liu, Ruobing Xie, Yiju Guo, Lulu Wu, Saiyong Yang, Yankai Lin
ICLR 2026 Conference
An efficient RL method that unifies reasoning and verification by utilizing the last-token probability as a self-rewarding signal.
Uncertainty and Influence-Aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Zexu Sun, Yiju Guo, Yankai Lin, Xu Chen, Qi Qi, Xing Tang, Ji-Rong Wen
ICLR 2025 Conference
An uncertainty-aware data augmentation method to refine reward models in RLHF without expensive human annotation.
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Zexu Sun, Bowen Sun, Huimin Chen, Ruobing Xie, Jie Zhou, Yankai Lin, et al.
EMNLP 2024 Main Conference
A multi-objective alignment method that explicitly controls preference scores to balance helpfulness, honesty, and harmlessness.
News
Our work "Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification" is available on arXiv
Our work "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" has been accepted to ICLR 2026
Our work "Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning" has been accepted to NeurIPS 2025 (main)
Our work "Uncertainty and Influence-Aware Reward Model Refinement for Reinforcement Learning from Human Feedback" has been accepted to ICLR 2025
Our work "Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment" has been accepted to EMNLP 2024 (main)
Started my PhD at the Gaoling School of Artificial Intelligence, Renmin University of China (RUC)
