Qianyu He (何千羽) is currently a third-year PhD candidate in the School of Computer Science at Fudan University, Shanghai, China. Her previous work focused on the Creative Generation of Language Models.
Currently, her research interests center on enhancing the Instruction Following and Reasoning abilities of Large Language Models (LLMs).
(Download my résumé, last updated on 2024-04-30.)
Ph.D. in CS, 2021-2026 (expected)
Fudan University
B.S. in CS, 2017-2021
Fudan University
May 2024 Gave a talk at Alibaba Tongyi Lab, titled: “Complex Instruction Following Ability of Large Language Models”. Thanks for the invitation!
Apr. 2024 How to improve LLMs’ ability to follow Complex Instructions? Check out our new preprint From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models.
Dec. 2023 Gave a talk at Tencent AI Lab, titled: “Beyond Simple Words: Make Machine Communicate like Humans”. Thanks for the invitation!
Dec. 2023 Congratulations on our paper Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception being accepted to ICDE 2024!
Dec. 2023 Three papers have been accepted by AAAI 2024! The first paper is CELLO, a benchmark for evaluating LLMs’ ability to follow complex instructions systematically. The second paper is Xiezhi, a comprehensive, multi-disciplinary, auto-updating benchmark for domain knowledge evaluation.
Apr. 2023 We released CuteGPT, an open-source conversational language model developed by the Knowledge Works Research Laboratory at Fudan University. Our work has been integrated into FastChat 👏.
Mar. 2023 Our paper HAUSER was accepted to ACL 2023!
It is imperative for large language models (LLMs) to follow instructions with elaborate requirements (i.e., complex instruction following). Yet, it remains under-explored how to enhance the ability of LLMs to follow complex instructions with multiple constraints. To bridge the gap, we first study what training data is effective in enhancing the ability to follow complex constraints. We find that training LLMs on instructions containing multiple constraints enhances their understanding of complex instructions, especially those with lower complexity levels, and the improvement can even generalize to compositions of out-of-domain constraints. We further propose methods for obtaining and utilizing such effective training data. Finally, we conduct extensive experiments demonstrating the effectiveness of our methods in terms of overall performance, training efficiency, and generalization ability under four settings.
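To make the idea of a multi-constraint instruction concrete, here is a minimal Python sketch of how such an instruction could be composed from a base task plus constraint templates. The templates and helper names are invented for illustration; this is not the paper's actual data-construction pipeline.

```python
# Hypothetical sketch: compose a training instruction from a base task plus
# several constraint templates. Templates are invented for illustration only.
import random

base_task = "Summarize the following news article."

constraint_templates = [
    "Respond in no more than {n} sentences.",
    "Write the answer in {language}.",
    "Format the answer as a bulleted list.",
    "Do not mention any person's name.",
]

def compose_instruction(num_constraints: int = 3) -> str:
    # Sample a few constraints and append them to the base task.
    chosen = random.sample(constraint_templates, k=num_constraints)
    filled = [c.format(n=3, language="English") for c in chosen]
    return base_task + " " + " ".join(filled)

print(compose_instruction())
# e.g. "Summarize the following news article. Respond in no more than 3 sentences. ..."
```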
Large language models (LLMs) can understand human instructions, showing their potential for pragmatic applications beyond traditional NLP tasks. However, they still struggle with complex instructions, which can be either complex task descriptions that require multiple tasks and constraints, or complex inputs that contain long context, noise, heterogeneous information, and multi-turn format. Because of these features, LLMs often ignore semantic constraints from task descriptions, generate incorrect formats, violate length or sample-count constraints, and are unfaithful to the input text. Existing benchmarks are insufficient for assessing LLMs' ability to understand complex instructions, as they are closed-ended and simple. To bridge this gap, we propose CELLO, a benchmark for systematically evaluating LLMs' ability to follow complex instructions. We design eight features of complex instructions and construct a comprehensive evaluation dataset from real-world scenarios. We also establish four evaluation criteria and develop corresponding metrics, since current ones are inadequate, biased, or too strict and coarse-grained. Through extensive experiments, we compare the performance of representative Chinese-oriented and English-oriented models in following complex instructions.
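As a rough illustration of two of the failure modes mentioned above (incorrect output format, violated sample-count constraints), the sketch below runs simple rule-based checks on a model output. The checks and the example output are hypothetical and are not CELLO's actual evaluation metrics.

```python
# Hypothetical rule-based checks for format and count constraints.
import json

instruction = "List exactly 3 key points from the passage, as a JSON array of strings."
model_output = '["Point one", "Point two", "Point three"]'

def check_json_format(text: str) -> bool:
    # The output must parse as a JSON array of strings.
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, list) and all(isinstance(x, str) for x in parsed)

def check_count(text: str, expected: int) -> bool:
    # The array must contain exactly the requested number of items.
    try:
        return len(json.loads(text)) == expected
    except json.JSONDecodeError:
        return False

print(check_json_format(model_output), check_count(model_output, 3))  # True True
```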
Similes play an important role in creative writing such as story and dialogue generation. Proper evaluation metrics are like a beacon guiding the research of simile generation (SG). However, it remains under-explored what criteria should be considered, how to quantify each criterion into metrics, and whether the metrics are effective for comprehensive, efficient, and reliable SG evaluation. To address these issues, we establish HAUSER, a holistic and automatic evaluation system for the SG task, which consists of five criteria from three perspectives and automatic metrics for each criterion. Through extensive experiments, we verify that our metrics correlate significantly better with human ratings from each perspective than prior automatic metrics.
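As a toy illustration of how agreement between an automatic metric and human ratings can be measured, the sketch below computes a Spearman rank correlation over made-up scores. It is not HAUSER's actual metric implementation.

```python
# Minimal sketch: correlate an automatic metric with human ratings.
from scipy.stats import spearmanr

# Hypothetical scores for five generated similes (values are made up).
metric_scores = [0.82, 0.41, 0.67, 0.90, 0.35]
human_ratings = [4.5, 2.0, 3.5, 5.0, 1.5]

rho, p_value = spearmanr(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```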
The ability to understand and generate similes is an imperative step toward realizing human-level AI. However, there is still a considerable gap between machine intelligence and human cognition in similes, since deep models based on statistical distributions tend to favor high-frequency similes. Hence, a large-scale symbolic knowledge base of similes is required, as it contributes to the modeling of diverse yet unpopular similes while facilitating additional evaluation and reasoning. To bridge the gap, we propose a novel framework for large-scale simile knowledge base construction, as well as two probabilistic metrics that enable an improved understanding of simile phenomena in natural language. Overall, we construct MAPS-KB, a million-scale probabilistic simile knowledge base covering 4.3 million triplets over 0.4 million terms from 70 GB of corpora. We conduct extensive experiments to justify the effectiveness and necessity of the methods in our framework. We also apply MAPS-KB to three downstream tasks and achieve state-of-the-art performance, further demonstrating its value.
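For illustration only, a probabilistic simile triplet could be represented roughly as below. The field names, example entries, and scores are hypothetical and do not reflect MAPS-KB's actual schema or metrics.

```python
# Hypothetical representation of probabilistic simile triplets.
from dataclasses import dataclass

@dataclass
class SimileTriplet:
    topic: str            # the thing being described, e.g. "runner"
    shared_property: str  # the property shared with the vehicle, e.g. "fast"
    vehicle: str          # what the topic is compared to, e.g. "lightning"
    score: float          # a probabilistic confidence score for this triplet

kb = [
    SimileTriplet("runner", "fast", "lightning", 0.93),
    SimileTriplet("room", "quiet", "library", 0.71),
]

# Look up vehicles associated with a property, ranked by score.
fast_vehicles = sorted(
    (t for t in kb if t.shared_property == "fast"),
    key=lambda t: t.score,
    reverse=True,
)
print([t.vehicle for t in fast_vehicles])  # ['lightning']
```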
Simile interpretation is a crucial task in natural language processing. Nowadays, pre-trained language models (PLMs) have achieved state-of-the-art performance on many tasks. However, it remains under-explored whether PLMs can interpret similes. In this paper, we investigate the ability of PLMs to interpret similes by designing a novel task named Simile Property Probing, i.e., letting PLMs infer the shared properties of similes. We construct simile property probing datasets from both general textual corpora and human-designed questions, containing 1,633 examples covering seven main categories. Our empirical study on the constructed datasets shows that PLMs can infer similes' shared properties while still underperforming humans. To bridge the gap with human performance, we further design a knowledge-enhanced training objective that incorporates simile knowledge into PLMs via knowledge embedding methods. Our method yields a gain of 8.58% on the probing task and 1.37% on the downstream task of sentiment classification.
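A minimal sketch of the probing idea, assuming a standard masked-language-model setup via the Hugging Face transformers fill-mask pipeline: the model is asked to fill in the shared property of a simile. The example sentence is invented and this is not the paper's actual probing dataset or evaluation code.

```python
# Illustrative sketch: probe a masked LM for the shared property of a simile.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The soldier was as brave as a lion, so the soldier was very [MASK]."
for prediction in fill_mask(sentence, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
# A model that captures the simile should rank property words such as "brave" highly.
```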