Zihan "Zenus" Wang王子涵
Researcher · Northwestern University

I am a PhD researcher at Northwestern University working on Agentic RL with Manling Li. I will spend this summer at NVIDIA, and I previously interned at Microsoft, Yutori, and DeepSeek.

Northwestern-MLL-Lab is seeking research collaborators/interns! More details here. Feel free to drop me an email.

News

  • Mar 4, 2026 — Joined the David Ondrej Podcast! We talked about why multi-turn agent RL collapses and what we can do about it. We also discussed DeepSeek's infra and research culture, how AI labs compare, and agent bottlenecks. [Watch]
  • May 22, 2025 — Joined the Manifold Podcast with Steve Hsu! Dived into robotics, small models, RL, and lessons from DeepSeek. Also shared my work on RAGEN and Chain-of-Experts. [Listen]
  • Apr 25, 2025 — Gave a talk about RAGEN at UIUC NLP Reading Group! [Slides]
  • Jan 27, 2025 — Introducing RAGEN — the world's first reproduction of DeepSeek-R1(-Zero) methods for training agentic AI models!
  • Sep 20, 2024 — ESFT has been accepted to the EMNLP 2024 Main Conference!
  • Jul 4, 2024 — Introducing Expert-Specialized Fine-Tuning (ESFT) for efficient and effective LLM customization leveraging Mixture-of-Experts architecture.
  • Jun 2, 2024 — Grateful to be spotlighted by my alma mater RUC for my journey and achievements. [Read blog]
  • Feb 15, 2024 — Excited to join Northwestern as a researcher! Many thanks to my advisor Manling Li!
  • Oct 19, 2023 — Honored to receive the Baosteel Outstanding Student Award 2023 as the only undergraduate student from RUC's science and technology departments!
  • Jun 7, 2023 — Excited to join UIUC Blender Lab this summer as a student researcher!
  • Dec 12, 2022 — Posted an article introducing ChatGPT on Capital of Statistics. [Link]

Selected Publications

Full list on Google Scholar / Semantic Scholar

RAGEN-2: Reasoning Collapse in Agentic RL
Zihan Wang*, Chi Gui*, Xing Jin*, Qineng Wang*, Licheng Liu*, Kangrui Wang, Shiqi Chen, Linjie Li, Zhengyuan Yang, Pingyue Zhang, Yiping Lu, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
We discover template collapse in multi-turn agent RL — where models learn input-agnostic reasoning patterns that fool entropy metrics. We propose SNR-Aware Filtering to fix it.
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
NeurIPS 2025 · Featured by Stanford AI Blog
VAGEN trains vision-language agents with explicit world-model reasoning and bi-level reinforcement learning, stabilizing credit assignment in sparse multi-turn environments.
Spatial Mental Modeling from Limited Views (MindCube)
Baiqiao Yin*, Qineng Wang*, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Jiajun Wu†, Li Fei-Fei†, Manling Li†
Outstanding Paper @ NeurIPS 2025 LAW · Best Paper @ ICCV 2025 SP4V · Adopted by Gemini 3 Pro
MindCube curates 21K spatial reasoning questions over 3K scenes. Guiding VLMs to map-then-reason boosts accuracy from 37.8% to 70.7%.
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Licheng Liu*, Zihan Wang*, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li
Preprint 2025
Unary Feedback as Observation (UFO): minimal prompts like "try again" keep single-turn quality while improving multi-turn accuracy by up to 14%.
RAGEN: Training Agents by Reinforcing Reasoning
Zihan Wang*, Kangrui Wang*, Qineng Wang*, Pingyue Zhang*, Linjie Li*, Zhengyuan Yang, Xing Jin, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
Open Source Project · Best Poster @ MMLS 2025 · Invited talks @ DeepMind, UIUC NLP Group, GenAI Week 25
RAGEN introduces StarPO (State-Thinking-Actions-Reward Policy Optimization) to train LLM reasoning agents via RL in multi-turn, stochastic environments.
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Zihan Wang*, Rui Pan*, Jiarui Yao, Róbert Csordás, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li†, Shiwei Liu†
arXiv Preprint
CoE enables sequential communication between MoE experts, achieving 17.6–42% lower memory usage and reducing validation loss on math benchmarks from 1.20 to 1.12.
Re-thinking Temporal Search for Long-Form Video Understanding (T*)
Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
CVPR 2025 · Oral @ ICCV 2025 LongVid-Foundations · Featured by Stanford AI Blog
LongVideoHaystack: 480-hour video temporal search dataset with 15,092 instances. T* boosts GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.
Expert-Specialized Fine-Tuning for Sparse Architectural LLMs (ESFT)
Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Yu Wu
EMNLP 2024
Fine-tuning as few as 5% of experts per layer in MoE LLMs achieves near-full performance at a far lower compute cost.
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
ICLR 2024
MINT benchmarks LLMs in multi-turn interactions with tools and language feedback, revealing limitations in existing RLHF and SIFT methods.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek AI (157 authors including Zihan Wang)
Technical Report 2024
DeepSeek-V2 is a strong MoE model with 21B activated parameters, saving 42.5% in training costs and boosting generation throughput by up to 5.76x vs. DeepSeek 67B.

Invited Talks

Awards

Professional Service

Selected Societal Engagements

Misc