About

I am a CS Ph.D. student at the University of Illinois Urbana-Champaign in the PL/FM/SE group, advised by Prof. Lingming Zhang. You can find my CV here.

My research fields are Software Engineering and Machine Learning. Specifically, I focus on building large language models to solve software engineering tasks, and I am particularly interested in improving LLMs’ reasoning and planning capabilities for code generation and repair through pre-training and post-training.

I obtained my bachelor’s degrees from Tsinghua University: one in Software Engineering from the School of Software and one in Business Administration from the School of Economics and Management. During my undergraduate years, I was a research assistant in the Software System Security Assurance Group, advised by Prof. Yu Jiang.

Selected Publications (See full list on Google Scholar)

  • Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning
    Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, and Zijian Wang
    arXiv; NeurIPS 2024 System 2 Reasoning Workshop. [preprint] [summary]
  • XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
    Yifeng Ding, Jiawei Liu, Yuxiang Wei, and Lingming Zhang
    62nd Annual Meeting of the Association for Computational Linguistics
    (ACL 2024), Pages 12941–12955, August 2024. [paper] [summary] [code]
  • SelfCodeAlign: Self-Alignment for Code Generation
    Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro Von Werra, Arjun Guha, and Lingming Zhang
    Thirty-eighth Conference on Neural Information Processing Systems
    (NeurIPS 2024), To appear, December 2024. [preprint] [code] [model] [dataset]
  • Magicoder: Empowering Code Generation with OSS-Instruct
    Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang
    Forty-first International Conference on Machine Learning
    (ICML 2024), Pages 52632–52657, July 2024. [paper] [code] [model] [dataset]
  • Evaluating Language Models for Efficient Code Generation
    Jiawei Liu, Songrun Xie, Junhao Wang, Yuxiang Wei, Yifeng Ding, and Lingming Zhang
    First Conference on Language Modeling
    (COLM 2024), October 2024. [paper] [code]
  • RepoQA: Evaluating Long Context Code Understanding
    Jiawei Liu, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, and Lingming Zhang
    First Workshop on Long-Context Foundation Models @ ICML 2024
    (LCFM @ ICML 2024), July 2024. [paper] [code]
  • The Plastic Surgery Hypothesis in the Era of Large Language Models
    Chunqiu Steven Xia, Yifeng Ding, and Lingming Zhang
    38th IEEE/ACM International Conference on Automated Software Engineering
    (ASE 2023), Pages 522–534, September 2023. [paper]

Academic Services

  • Reviewer: AISTATS’25, ICLR’25, NeurIPS’24, ACL’24, EMNLP’24, NAACL’25
  • Organizing Committee: LLM4Code@ICSE’25 (International Workshop on Large Language Models for Code, co-organized with ICSE’25), LLM4Code@ICSE’24

Invited Talks

  • CAMEL-AI.org (11/2024): “Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning”
  • UCLA AGI Lab (11/2024): “Improving Code Language Modeling via Horizon-Length Prediction”
  • UIUC FM/SE Seminar (10/2024): “Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning”
  • Amazon Comprehend Team (07/2024): “XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts”
  • Uber Programming Systems Team (04/2023): “Equipping Large Language Models with Domain-Specific Knowledge for Automated Program Repair”

Contacts

  • Email: yifeng6@illinois.edu
  • Address: 2107 Thomas M. Siebel Center for Computer Science