News
- 🗓️ Oct 29, 2024 - Releasing dump-to-GPT: let GPT quickly read your entire codebase with one line of code! A small but exciting start of AI-wrench, a growing toolkit for efficient AI engineering. (A sketch of the idea appears after this list.)
- 🗓️ Sep 20, 2024 - Glad to announce that ESFT has been accepted to the EMNLP 2024 Main Conference! 🎉 Many thanks to all collaborators!
- 🗓️ Jul 4, 2024 - Thrilled to introduce our latest project at DeepSeek, Expert-Specialized Fine-Tuning (ESFT), for efficient and effective LLM customization by leveraging the highly specialized Mixture-of-Experts (MoE) architecture! 🤖✨
- 🗓️ Jun 2, 2024 - Grateful to be spotlighted by my alma mater RUC for my journey and achievements. (read blog)
- 🗓️ Feb 15, 2024 - Excited to join Northwestern as a PhD student! 🎓 Many thanks to my advisor Manling Li!
- 🗓️ Oct 19, 2023 - Honored to be awarded the Baosteel Outstanding Student Award 2023 🏅 as the ONLY undergraduate among science and technology departments at RUC! Special thanks to NLPIR lab! 🙏
- 🗓️ Jun 7, 2023 - Excited to share that I'll be joining UIUC Blender Lab 🔬 this summer as a student researcher!
- 🗓️ Mar 15, 2023 - My talk on LARGE language models at Capital of Statistics 📊 will take place at 7:00 PM, Mar 17, 2023 (BJT)! Click here for more details. (Update: slides, video)
- 🗓️ Jan 12, 2023 - I will give a talk on pre-trained models and their applications 📚 at 2:00 PM, Jan 13, 2023 (BJT) at Mingli College! For more information, click here. (Update: slides)
- 🗓️ Dec 12, 2022 - I posted an article introducing ChatGPT on Capital of Statistics 💡. Do not miss it if you want to know more about ChatGPT! (link)
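For the dump-to-GPT release above, here is a minimal sketch of the underlying idea: flatten a repository into a single LLM-ready string. The helper name and options below are illustrative assumptions, not the actual AI-wrench API.

```python
# Sketch of the "dump your codebase into one prompt" idea.
# Hypothetical helper, NOT the actual dump-to-GPT / AI-wrench API.
from pathlib import Path

def dump_codebase(root: str, exts=(".py", ".md"), max_chars=200_000) -> str:
    """Concatenate source files under `root` into one LLM-ready string."""
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            chunks.append(f"# ===== {path} =====\n{path.read_text(errors='ignore')}")
    return "\n\n".join(chunks)[:max_chars]  # crude truncation to fit a context window

# The advertised "one line of code":
prompt = dump_codebase("./my_project")
```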
Research Interest
I work on various topics regarding Large Language Models, including interaction, alignment, and long-context understanding (retrieval).
My representative works include (1) general interaction, e.g., the MINT interaction benchmark; (2) the cross-application of LLMs and IR, e.g., retrieval-augmented models (RetaLLM) and LM-based IR; and (3) efficient alignment of LLMs, e.g., expert-specialized fine-tuning.
Selected Publications
See full list on Semantic Scholar (Why I Love Semantic Scholar, and You Might Too)
Semantic Scholar uses AI-powered tools to summarize papers, highlight key phrases, and rank research by influence, which helps you find important studies faster. Its Semantic Reader helps you understand papers with skimming highlights and citation cards, and citation graphs show how papers connect. While Google Scholar is great for broad searches, Semantic Scholar is smarter for finding high-quality and impactful research!
[New] Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Zihan Wang,
Deli Chen,
Damai Dai,
Runxin Xu,
Zhuoshu Li,
Yu Wu
EMNLP 2024
[paper]
[code]
We harness the specialized power of experts in MoE LLMs through ESFT: by fine-tuning as few as 5% of the experts in a layer, performance close to full-parameter fine-tuning can be achieved.
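A minimal sketch of that recipe as I read it: rank each layer's experts by a precomputed task-affinity score, then unfreeze only the top few. The model layout and score names below are assumptions, not the released ESFT code.

```python
import torch

def esft_mark_trainable(model, affinity, keep_ratio=0.05):
    """Freeze the full model, then unfreeze only the most task-relevant experts.

    Assumptions (illustrative, not the released ESFT code): `model.layers[i].experts`
    is an nn.ModuleList, and `affinity[i][j]` is a precomputed relevance score of
    expert j in layer i on downstream task data (e.g., its average gate weight).
    """
    for p in model.parameters():
        p.requires_grad = False                          # freeze everything
    for i, layer in enumerate(model.layers):
        k = max(1, int(len(layer.experts) * keep_ratio))  # e.g., top ~5% of experts
        top = torch.topk(torch.tensor(affinity[i]), k).indices
        for j in top:
            for p in layer.experts[int(j)].parameters():
                p.requires_grad = True                   # only these experts get gradients
```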
[Highlight] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang*,
Zihan Wang*,
Jiateng Liu,
Yangyi Chen,
Lifan Yuan,
Hao Peng,
Heng Ji
ICLR 2024
[paper]
[website]
[code]
We introduce MINT, a benchmark for evaluating LLMs in Multi-turn Interactions with tools and language feedback.
MINT reveals several limitations in existing RLHF and SIFT methods on multi-turn interaction.
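The evaluation protocol behind this is a budgeted interaction loop: the model acts, a tool executes the action, and an optional feedback model critiques the attempt. A schematic sketch with hypothetical helper names (not the MINT codebase):

```python
def evaluate_multiturn(llm, task, execute_tool, feedback_model=None, max_turns=5):
    """Schematic MINT-style loop: act -> observe -> (feedback) -> retry.
    All helper names here are hypothetical, not the MINT codebase."""
    history = [task.prompt]
    for _ in range(max_turns):
        action = llm.generate("\n".join(history))   # model proposes code / a tool call
        observation = execute_tool(action)          # e.g., run the code in a sandbox
        history += [action, f"Observation: {observation}"]
        if task.is_solved(observation):
            return True                             # solved within the turn budget
        if feedback_model is not None:              # optional natural-language feedback
            history.append(f"Feedback: {feedback_model.critique(history)}")
    return False                                    # exhausted max_turns
```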
[Highlight] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek AI (157 authors including Zihan Wang)
[paper]
[code]
DeepSeek-V2 is a strong MoE model with 21B activated parameters (236B in total). Compared with DeepSeek 67B, it achieves stronger performance while saving 42.5% of training costs and boosting maximum generation throughput by up to 5.76x.
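For readers unfamiliar with "activated parameters": in an MoE layer each token is routed to only a few experts, so only a fraction of the weights run per token. A toy top-k MoE layer illustrating this (not DeepSeek-V2's actual architecture, which also uses shared experts and MLA):

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Toy top-k MoE layer: each token activates only k of n experts."""
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                       # x: (tokens, dim)
        gates = self.router(x).softmax(-1)      # routing probabilities per token
        w, idx = gates.topk(self.k, dim=-1)     # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for t in range(x.size(0)):              # naive per-token loop, for clarity
            for weight, j in zip(w[t], idx[t]):
                out[t] += weight * self.experts[int(j)](x[t])
        return out                              # only ~k/n of expert params used per token
```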
NOVO: Learnable and Interpretable Document Identifiers for Model Based IR
Zihan Wang,
Yujia Zhou,
Yiteng Tu,
Zhicheng Dou
CIKM 2023, Oral Presentation
[paper]
[code]
We propose learnable NOVO document IDs for model-based IR. NOVO IDs consist of non-overlapping n-gram sets that identify documents, optimized through denoising queries and retrieval tasks.
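To make "n-gram set identifiers" concrete, here is a toy, non-learned illustration: each document's ID is a set of its n-grams, and retrieval scores documents by how many model-generated n-grams their set covers. NOVO learns and refines these sets; everything below is a simplification.

```python
def ngram_set(text, n=3):
    """Represent a document by its set of word n-grams (toy version)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

docs = {
    "d1": "mixture of experts models route tokens to specialists",
    "d2": "dense retrieval encodes queries and documents into vectors",
}
doc_ids = {d: ngram_set(t) for d, t in docs.items()}   # n-gram-set "identifiers"

def retrieve(generated_ngrams):
    """Score docs by how many generated n-grams their ID set covers."""
    return max(doc_ids, key=lambda d: len(doc_ids[d] & generated_ngrams))

# A generative retriever would produce these n-grams from the query; hard-coded here:
print(retrieve({"mixture of experts", "route tokens to"}))  # -> "d1"
```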
RetaLLM: A Retrieval-Augmented Large Language Model Toolkit
Jiongnan Liu,
Jiajie Jin,
Zihan Wang,
Jiehan Cheng,
Zhicheng Dou,
Ji-Rong Wen
[paper]
[code]
We develop a Retrieval-Augmented LLM toolkit for better interaction between LLMs and retrieval systems.
Feature modules include request rewriting, passage extraction, and fact-checking.
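A schematic of how those modules could chain around a retriever; all helper names below are placeholders rather than RetaLLM's actual API:

```python
def retrieval_augmented_answer(llm, retriever, query):
    """Schematic RetaLLM-style pipeline: rewrite -> retrieve -> extract ->
    generate -> fact-check. Helper names are placeholders, not the real API."""
    rewritten = llm.generate(f"Rewrite as a search request: {query}")   # request rewriting
    passages = retriever.search(rewritten, top_k=5)                     # retrieval
    evidence = llm.generate(                                            # passage extraction
        f"Extract sentences relevant to '{query}':\n" + "\n".join(passages))
    answer = llm.generate(f"Answer '{query}' using:\n{evidence}")       # generation
    verdict = llm.generate(                                             # fact-checking
        f"Does the evidence support this answer? Evidence:\n{evidence}\nAnswer:\n{answer}")
    return answer if "yes" in verdict.lower() else "[unverified] " + answer
```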
Awards
- McCormick School of Engineering Fellowship, Northwestern, 2024
- Baosteel Outstanding Student Award (7 of 30,000+ students), Renmin Univ. of China, 2023
- First Class Academic Excellence Award (top 3% GPA), Renmin Univ. of China, 2021
- Provincial First Prize, Contemporary Undergraduate Mathematical Contest in Modeling, 2021
- Honorable Mention, Mathematical Contest in Modeling and Interdisciplinary Contest in Modeling, 2021
Invited Talks and Presentations
Professional Service
Misc
- I like to work and chat with people from diverse backgrounds (🌈), which I believe is the key to true innovation. Feel free to contact me.
- I love Sandbox games like Minecraft and Danmaku games like Touhou Project. I also loved designing RPG games when I was in primary school (with RMXP on Windows XP), although they can no longer be launched on Windows 10.
- My dream was to be a vlogger, and I posted videos on bilibili, including vlogs, gameplay recordings, and some parody videos.
- Besides Chinese and English, I can speak a little Japanese due to my passion for Anime in my childhood. My favorite Anime were ワンピース (One Piece) and Fate/stay night.