News

  • May 30, 2026 — New preprint: BAGEN: Towards Budget-Aware Agents — Do LLM agents know how much budget they will spend?
  • May 13, 2026 — Awarded Northwestern Presidential Fellowship Finalist and Golden Reviewer for ICML.
  • May 3, 2026 — Excited to organize FAGEN Workshop @ ICML 2026! We welcome submissions on failure modes in agentic AI. Check out the CFP for more details.
  • Apr 30, 2026 — RAGEN-2 was selected as an ICML 2026 Oral.
  • Apr 25, 2026 — Featured in ZPotentials, discussing my academic journey and my views on Agentic RL: towards better decision-making and world-modeling agents. [Read]
  • Mar 4, 2026 — Joined the David Ondrej Podcast! We talked about why multi-turn agent RL collapses—and what we can do about it. Also discussed DeepSeek's infra and research culture, AI labs comparison, and agent bottlenecks. [Watch]
  • Jan 26, 2026 — Four papers [ MindCube, Theory of Space, PersonaMask, Weak2Strong ] were accepted to ICLR 2026.
  • May 22, 2025 — Joined the Manifold Podcast with Steve Hsu! Dived into robotics, small models, RL, and lessons from DeepSeek. Also shared my work on RAGEN and Chain-of-Experts. [Listen]
  • Apr 25, 2025 — Gave a talk about RAGEN at UIUC NLP Reading Group. [Slides]
  • Jan 27, 2025 — Introducing RAGEN — the world's first reproduction of DeepSeek-R1(-Zero) methods for training agentic AI models.
  • Sep 20, 2024 — ESFT has been accepted to the EMNLP 2024 Main Conference.
  • Jul 4, 2024 — Introducing Expert-Specialized Fine-Tuning (ESFT) for efficient and effective LLM customization leveraging Mixture-of-Experts architecture.
  • Jun 2, 2024 — Grateful to be spotlighted by my alma mater RUC for my journey and achievements. [Read blog]
  • Feb 15, 2024 — Excited to join Northwestern as a researcher! Many thanks to my advisor Manling Li!
  • Oct 19, 2023 — Honored to be awarded the Baosteel Outstanding Student Award 2023 as the only undergrad student among science and technology departments in RUC!
  • Jun 7, 2023 — Excited to join UIUC Blender Lab this summer as a student researcher!
  • Dec 12, 2022 — Posted an article introducing ChatGPT on Capital of Statistics. [Link]

Selected Publications

Full list on Google Scholar / Semantic Scholar

BAGEN
New  BAGEN: Are LLM Agents Budget-Aware?
Yuxiang Lin*, Zihan "Zenus" Wang*, Mengyang Liu*, Yuxuan Shan*, Longju Bai*, Junyao Zhang, Xing Jin, Boshan Chen, Jinyan Su, Xingyao Wang, Jiaxin Pei, Manling Li
Preprint 2026  Midwest ML Symposium 2026 Spotlight
We define agent budget awareness with a progressive interval estimation protocol and evaluate 5 frontier models on 4 environments. We find that budget awareness decouples from task performance, and that agents universally fail through over-optimism and late failure recognition.
RAGEN-2
New  RAGEN-2: Reasoning Collapse in Agentic RL
Zihan Wang*, Chi Gui*, Xing Jin*, Qineng Wang*, Licheng Liu*, Kangrui Wang, Shiqi Chen, Linjie Li, Zhengyuan Yang, Pingyue Zhang, Yiping Lu, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
ICML 2026 Oral (top 0.7%) · Best Paper @ CVPR 2026 MMRAgI (top 1%)  Hugging Face #2 Paper of the Day · Invited Talk @ MIT Media Lab
We discover template collapse in multi-turn agent RL — where models learn input-agnostic reasoning patterns that fool entropy metrics. We propose SNR-Aware Filtering to fix it.
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
NeurIPS 2025  Featured by Stanford AI Blog
VAGEN trains vision-language agents with explicit world-model reasoning and bi-level reinforcement learning, stabilizing credit assignment in sparse multi-turn environments.
MindCube
Spatial Mental Modeling from Limited Views (MindCube)
Qineng Wang*, Baiqiao Yin*, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Jiajun Wu†, Li Fei-Fei†, Manling Li†
ICLR 2026 · Outstanding Paper @ NeurIPS 2025 LAW · Best Paper @ ICCV 2025 SP4V · Adopted by Gemini 3 Pro
MindCube curates 21K spatial reasoning questions over 3K scenes. Guiding VLMs to map-then-reason boosts accuracy from 37.8% to 70.7%.
Unary Feedback
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Licheng Liu*, Zihan Wang*, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li
Preprint 2025
Unary Feedback as Observation (UFO): minimal prompts like "try again" keep single-turn quality while improving multi-turn accuracy by up to 14%.
RAGEN
RAGEN: Training Agents by Reinforcing Reasoning
Zihan Wang*, Kangrui Wang*, Qineng Wang*, Pingyue Zhang*, Linjie Li*, Zhengyuan Yang, Xing Jin, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
Open Source Project  Best Poster @ MMLS 2025 · Invited talks @ DeepMind, UIUC NLP Group, GenAI Week 25
RAGEN introduces StarPO (State-Thinking-Actions-Reward Policy Optimization) to train LLM reasoning agents via RL in multi-turn, stochastic environments.
Chain-of-Experts
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Zihan Wang*, Rui Pan*, Jiarui Yao, Róbert Csordás, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li†, Shiwei Liu†
arXiv Preprint
CoE enables sequential communication between MoE experts, achieving 17.6–42% lower memory usage and reducing validation loss on math benchmarks from 1.20 to 1.12.
T*
Re-thinking Temporal Search for Long-Form Video Understanding (T*)
Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
CVPR 2025  Oral @ ICCV 2025 LongVid-Foundations · Featured by Stanford AI Blog
LongVideoHaystack: 480-hour video temporal search dataset with 15,092 instances. T* boosts GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.
ESFT
Expert-Specialized Fine-Tuning for Sparse Architectural LLMs (ESFT)
Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Yu Wu
EMNLP 2024
Fine-tuning as few as 5% of the experts per layer in MoE LLMs achieves near-full performance at far lower compute cost.
MINT
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
ICLR 2024
MINT benchmarks LLMs in multi-turn interactions with tools and language feedback, revealing limitations in existing RLHF and SIFT methods.
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek AI (157 authors including Zihan Wang)
Technical Report 2024
DeepSeek-V2 is a strong MoE model with 21B activated parameters, saving 42.5% of training costs and boosting generation throughput by up to 5.76x vs. DeepSeek 67B.

Invited Talks

Awards

Professional Service

Selected Societal Engagements

Misc