RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

68 Citations (Scopus)
90 Downloads (CityUHK Scholars)

Abstract

The task of repository-level code completion is to continue writing unfinished code based on the broader context of the repository. For automated code completion tools, however, it is difficult to utilize the useful information scattered across different files. We propose RepoCoder, a simple, generic, and effective framework to address this challenge. It streamlines the repository-level code completion process by incorporating a similarity-based retriever and a pre-trained code language model in an iterative retrieval-generation pipeline. RepoCoder makes effective use of repository-level information for code completion and can generate code at various levels of granularity. Moreover, we propose a new benchmark, RepoEval, which consists of the latest high-quality real-world repositories and covers line, API invocation, and function body completion scenarios. Experimental results indicate that RepoCoder significantly improves the In-File completion baseline by over 10% in all settings and consistently outperforms the vanilla retrieval-augmented code completion approach. Furthermore, we validate the effectiveness of RepoCoder through comprehensive analysis, providing valuable insights for future research. ©2023 Association for Computational Linguistics.
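
The abstract describes the iterative retrieval-generation pipeline only at a high level. As a rough illustration, and not the authors' implementation, the loop might be sketched as follows. The Jaccard token-overlap retriever, the line-window sizes, the prompt layout, and the `generate` callable standing in for a pre-trained code language model are all assumptions made for this sketch, and every function name here is hypothetical.

```python
import re
from typing import Callable, Dict, List, Set


def tokenize(code: str) -> Set[str]:
    """Bag-of-words view of a snippet: the set of identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_windows(repo: Dict[str, str], size: int = 4, stride: int = 2) -> List[str]:
    """Cut every file into overlapping windows of at most `size` lines."""
    windows = []
    for text in repo.values():
        lines = text.splitlines()
        for start in range(0, len(lines), stride):
            windows.append("\n".join(lines[start:start + size]))
    return windows


def retrieve(query: str, windows: List[str], k: int = 2) -> List[str]:
    """Return the k windows most similar to the query (assumed retriever)."""
    q = tokenize(query)
    ranked = sorted(windows, key=lambda w: jaccard(q, tokenize(w)), reverse=True)
    return ranked[:k]


def repo_complete(
    unfinished: str,
    repo: Dict[str, str],
    generate: Callable[[str], str],  # hypothetical stand-in for a code LM
    iterations: int = 2,
    query_lines: int = 4,
) -> str:
    """Alternate retrieval and generation for a fixed number of rounds."""
    windows = build_windows(repo)
    tail = "\n".join(unfinished.splitlines()[-query_lines:])
    query = tail  # first round: query with the unfinished code only
    completion = ""
    for _ in range(iterations):
        context = retrieve(query, windows)
        prompt = "\n\n".join(context) + "\n\n" + unfinished
        completion = generate(prompt)
        # Later rounds: add the model's own draft to the query, so retrieval
        # can pick up code that resembles the intended completion.
        query = tail + "\n" + completion
    return completion
```

To try the sketch, pass a dictionary mapping file paths to file contents as `repo` and any function that maps a prompt string to a completion string as `generate`. With `iterations=1` the loop reduces to a single retrieve-then-generate pass, which corresponds to what the abstract calls the vanilla retrieval-augmented approach.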
Original language: English
Title of host publication: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics
Pages: 2471–2484
ISBN (Print): 9798891760608
Publication status: Published - Dec 2023
Event: 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) - Resorts World Convention Centre (Hybrid), Singapore
Duration: 6 Dec 2023 - 10 Dec 2023
https://aclanthology.org/2023.emnlp-main
https://2023.emnlp.org/

Publication series

Name: EMNLP - Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference: 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
Abbreviated title: EMNLP
Place: Singapore
Period: 6/12/23 - 10/12/23

Bibliographical note

Research Unit(s) information for this publication is provided by the author(s) concerned.

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
