Zihan Wang

I am a CS PhD student at Northwestern. I am fortunate to be advised by the wonderful Manling Li.
My Chinese name is 王子涵. You can pronounce my name as "Zzz-han Wang".

Email  /  Github  /  Semantic Scholar  /  Zhihu  /  CV

News
  • 🗓️ Long-term - Northwestern-MLL-Lab is seeking research collaborators/interns! More details here. If you'd like to work with me, please drop me an email. Feel free to lead or join projects; we value strong coding skills, fast learning, and interdisciplinary expertise (STEM or other fields). Research experience is welcome but not required!
  • 🗓️ May 22, 2025 – Thrilled to join the Manifold Podcast with Steve Hsu! We dived into robotics, small models, RL, and lessons from DeepSeek. Also shared my recent work on RAGEN and Chain-of-Experts. [Listen here].
  • 🗓️ Apr 25, 2025 - Excited to give a talk about RAGEN at UIUC NLP Reading Group! [Slides]
  • 🗓️ Jan 27, 2025 - Introducing RAGEN, the world's first reproduction of DeepSeek-R1(-Zero) methods for training agentic AI models!
  • 🗓️ Sep 20, 2024 - Glad to announce that ESFT has been accepted to the EMNLP 2024 Main Conference! 🎉 Many thanks to all collaborators!
  • 🗓️ Jul 4, 2024 - Thrilled to introduce our latest project at DeepSeek, Expert-Specialized Fine-Tuning (ESFT) for efficient and effective LLM customization by leveraging the highly specialized Mixture-of-Experts (MoE) architecture! 🤖✨
  • 🗓️ Jun 2, 2024 - Grateful to be spotlighted by my alma mater RUC for my journey and achievements. (read blog)
  • 🗓️ Feb 15, 2024 - Excited to join Northwestern as a PhD student! 🎓 Many thanks to my advisor Manling Li!
  • 🗓️ Oct 19, 2023 - Honored to receive the Baosteel Outstanding Student Award 2023 🏅 as the ONLY undergraduate from the science and technology departments at RUC! Special thanks to the NLPIR lab! 🙏
  • 🗓️ Jun 7, 2023 - Excited to share that I'll be joining UIUC Blender Lab 🔬 this summer as a student researcher!
  • 🗓️ Dec 12, 2022 - I posted an article introducing ChatGPT on Capital of Statistics 💡. Do not miss it if you want to know more about ChatGPT! (link)
Research Interest

Foundation models are growing extremely rapidly, and their expanding scale brings challenges that increasingly need to be addressed. My research focuses on foundation models' autonomy (RAGEN, MINT benchmark), efficiency (DeepSeek-V2, Expert-Specialized Fine-Tuning, Chain-of-Experts), and long-context understanding (LongVideoHaystack & T*, RETA-LLM).

Selected Publications

See full list on Google Scholar or Semantic Scholar (Why I Love Semantic Scholar, and You Might Too)


Semantic Scholar uses AI-powered tools to summarize papers, highlight key phrases, and rank research by influence, which helps you find important studies faster. Its Semantic Reader helps you understand papers with skimming highlights and citation cards, and citation graphs show how papers connect. While Google Scholar is great for broad searches, Semantic Scholar is smarter for finding high-quality, impactful research!

[New] VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Hanyang Chen, Chi Wan, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
NeurIPS 2025
[Project Page] [Paper] [Docs] [Code] [Blog]

VAGEN trains vision-language agents with explicit world-model reasoning and bi-level reinforcement learning, stabilizing credit assignment in sparse multi-turn environments while improving success on control, navigation, and manipulation benchmarks.


[New] Spatial Mental Modeling from Limited Views
Baiqiao Yin*, Qineng Wang*, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Manling Li, Jiajun Wu, Li Fei-Fei
Oral@ICCV 2025 SP4V, featured in Voxel51's "The Best of ICCV"
[Project Page] [Paper] [Code] [Dataset]

MindCube curates 21K spatial reasoning questions over 3K scenes and shows that guiding VLMs to map-then-reason boosts accuracy from 37.8% to 70.7%, highlighting cognitive mapping and reinforcement learning as keys to spatial mental modeling.


[New] A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li
Preprint 2025
[Project Page] [Paper] [Code]

Unary Feedback as Observation (UFO) shows that minimal prompts like “try again” keep single-turn quality while improving multi-turn accuracy by up to 14%, delivering a plug-and-play RL recipe for reflective reasoning agents.


[New] RAGEN: Training Agents by Reinforcing Reasoning
Zihan Wang*, Kangrui Wang*, Qineng Wang*, Pingyue Zhang*, Linjie Li*, Zhengyuan Yang, Xing Jin, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
Open Source Project
[Homepage] [X Post] [Paper] [Code] [Poster] (Best Poster @ MMLS 2025)

We introduce RAGEN, built on State-Thinking-Actions-Reward Policy Optimization (StarPO), a general multi-turn RL framework for training LLM reasoning agents in multi-turn, stochastic environments. We analyze how and why models collapse in multi-turn RL and identify several limitations of agent reasoning under current RL paradigms.

[New] Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Zihan Wang*, Rui Pan*, Jiarui Yao, Róbert Csordás, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li†, Shiwei Liu†
arXiv Preprint
[Paper] [X Post] [Blog] [Code]

We propose Chain-of-Experts (CoE), which enables sequential communication between MoE experts by processing tokens through multiple iterations within a layer. CoE achieves 17.6–42% lower memory usage and reduces validation loss on math benchmarks from 1.20 to 1.12 under comparable compute.

[New] Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
CVPR 2025, Oral@ICCV 2025 LongVid-Foundations
[Project Page] [X Post] [Dataset] [Paper] [Code] [Demo] [Poster]

We introduce LongVideoHaystack, a 480-hour video temporal search dataset with 15,092 human-annotated instances, on which state-of-the-art methods score only 2.1% temporal F1. Our temporal search framework T* boosts GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.

[Highlight] Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Yu Wu
EMNLP 2024
[Paper] [Code]

ESFT harnesses the specialization of experts in MoE LLMs: by fine-tuning as few as 5% of the experts in a layer, it achieves performance close to that of full-parameter fine-tuning.


[Highlight] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang*, Zihan Wang*, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
ICLR 2024
[Paper] [Project Page] [Code]

We introduce MINT, a benchmark for evaluating LLMs in multi-turn interactions with tools and language feedback. MINT reveals several limitations of existing RLHF and SIFT methods in multi-turn interaction.


[Highlight] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek AI (157 authors including Zihan Wang)
[Paper] [Code]

DeepSeek-V2 is a strong MoE model with 236B total parameters, of which 21B are activated per token. Compared with DeepSeek 67B, it achieves stronger performance while saving 42.5% of training costs and boosting maximum generation throughput by up to 5.76x.


Invited Research Presentations
Professional Service
Selected Societal Engagements
Awards
  • McCormick School of Engineering Fellowship, Northwestern, 2024
  • Baosteel Outstanding Student Award, 7/30000+, Renmin Univ. of China, 2023
  • First Class Academic Excellence Award (top 3% GPA), Renmin Univ. of China, 2021
  • Provincial First Prize, Contemporary Undergraduate Mathematical Contest in Modeling, 2021
  • Honorable Mention, Mathematical Contest in Modeling and Interdisciplinary Contest in Modeling, 2021
Misc
  • I like to work and chat with people from diverse backgrounds (🌈), which I believe is the key to true innovation. Feel free to reach out for an online chat (or in person if you are in the Evanston/Chicago area).
  • I love sandbox games like Minecraft, danmaku games like Touhou Project, and music games like Love Live. In primary school, I also loved designing and making RPGs (with RMXP on Windows XP).
  • I once dreamed of becoming a vlogger, and I post videos on bilibili, including vlogs, gameplay recordings, and parody videos.
  • Beyond Chinese and English, I've picked up some Japanese thanks to my childhood love of anime. My favorite anime were One Piece (ワンピース) and Fate/stay night.
  • I grew up in Wuhan, China, and studied at No. 1 Middle School @ CCNU. I'm truly grateful for my time there.

Website design from Jon Barron