Shanghaoran Quan

My research focuses on LLM alignment. I previously interned with the Qwen Team at Alibaba, advised by Bowen Yu and An Yang, and I enjoyed competing in informatics and mathematics contests. I am currently a senior in the School of Computer Science and Engineering at Beihang University. Starting in Fall 2025, I will pursue a master's degree in NLP at the Wangxuan Institute of Computer Technology, Peking University, advised by Dongyan Zhao and Huishuai Zhang.

Feel free to email me at quanshr2023@gmail.com for any form of academic cooperation!
More info: Google Scholar / GitHub

Selected Papers

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Shanghaoran Quan, Jiaxi Yang, Bowen Yu, ..., Binyuan Hui, Junyang Lin
Preprint
[Paper] [WebPage] [Dataset]
We introduce CodeElo, a new competition-level code generation benchmark that provides human-comparable Elo ratings of LLMs for the first time.

Qwen2.5 Technical Report
Qwen Team
Technical Report
[Collection] [Paper] [Code]
In this report, we introduce Qwen2.5, a comprehensive series of LLMs designed to meet diverse needs. Compared to previous iterations, Qwen2.5 has been significantly improved during both the pre-training and post-training stages.

Language Models can Self-Lengthen to Generate Long Texts
Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin
Preprint
[Paper] [Code]
Self-Lengthen is a novel and effective data-driven technique for extrapolating long output, designed to stimulate long-generation ability from scratch using only the LLM's intrinsic knowledge and skills.

Qwen2.5-Coder Technical Report
Qwen Team
Technical Report
[Collection] [Paper] [Code]
The Qwen2.5-Coder series includes six models (0.5B/1.5B/3B/7B/14B/32B) and achieves state-of-the-art performance across more than 10 benchmarks, demonstrating impressive code generation capabilities.

Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity
Shanghaoran Quan
AAAI, 2025 || AFM@NeurIPS, 2024
[Paper] [Code]
We propose a novel and effective method to iteratively split context and derive high-quality query-response pairs for domain-specific SFT data.

DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
Shanghaoran Quan
ACL Findings, 2024
[Paper] [Code]
We propose a novel and effective reward modeling method based on MoE, which decomposes the inputs into different capability points under different tasks.

Other Papers

Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao, Feifan Song, Yibo Miao, ..., Shanghaoran Quan, ...
Preprint
[Paper] [Repo]

Aligning CodeLLMs with Direct Preference Optimization
Yibo Miao, Bofei Gao, Shanghaoran Quan, Junyang Lin, Daoguang Zan, Jiaheng Liu, Jian Yang, Tianyu Liu, Zhijie Deng
Preprint
[Paper]

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
Bofei Gao, Feifan Song, Zhe Yang, ..., Shanghaoran Quan, ...
Preprint
[WebPage] [Paper]

Experience

Alibaba DAMO Academy, Qwen Team
Jul 2024 - Feb 2025
Research Intern
Advisors: Bowen Yu and An Yang

Baidu NLP Group, ERNIE-Bot Team
Nov 2023 - May 2024
Research Intern
Advisor: Jun Xu

Beihang University, School of Computer Science and Engineering
Sep 2021 - Present
B.Eng. Student

Honors

Three gold medals in ICPC Asia regional contests (highest rank: 5th) and two gold medals in CCPC regional contests
Undergraduate National Scholarship (rank: 2/1003)
First prize in China National Olympiad in Informatics in Provinces
First prize in China National Olympiad in Mathematics in Provinces

Academic Service

Conference Reviewer: KDD 2024, ICLR 2025

Updated on Mar 26, 2025.