Abstract
While large language models (LLMs) excel at code generation, translating abstract descriptions into robust and functional code remains a significant challenge. Existing approaches to refining LLM code generation are constrained either by static rules or by the computational overhead of additional training, and fall short of the demands of real-world code quality. This paper proposes cRLHF, a method that improves the code generation ability of LLMs by combining reinforcement learning from human feedback (RLHF) with crowd-sourced computation. Our goal is to enhance code quality through diverse end-user feedback. Traditional RLHF, which relies on a single evaluator, risks bias and overlooks insights that other evaluators could provide, which hampers the improvement of LLMs. The cRLHF framework uses Bayesian inference to combine the judgments of multiple evaluators into an objective code evaluation. Our experiments show significant improvements in code correctness, demonstrating the efficacy of combining crowd-sourcing with reinforcement learning. © 2024 IEEE.
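The abstract does not spell out the inference procedure, so the following is only an illustrative sketch of how Bayesian inference can combine several evaluators' verdicts on one code sample, not the paper's actual method. It assumes a naive Bayes model in which evaluators vote independently given the truth, each with a known, symmetric accuracy; the function name `posterior_correct`, the `accuracies` values, and the `prior` are all hypothetical.

```python
from math import exp, log


def posterior_correct(votes, accuracies, prior=0.5):
    """Posterior probability that a code sample is correct.

    Illustrative assumptions (not taken from the paper):
    - votes[i] is True if evaluator i judged the code correct.
    - accuracies[i] in (0, 1) is the probability that evaluator i is right,
      whether the code is actually correct or not (symmetric accuracy).
    - evaluators are conditionally independent given the true label.
    - prior in (0, 1) is the prior probability that the code is correct.
    """
    if len(votes) != len(accuracies):
        raise ValueError("votes and accuracies must have the same length")

    # Work in log-odds so each vote contributes an additive term.
    log_odds = log(prior / (1.0 - prior))
    for vote, acc in zip(votes, accuracies):
        weight = log(acc / (1.0 - acc))  # log-likelihood ratio of one vote
        log_odds += weight if vote else -weight

    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + exp(-log_odds))


# One reliable evaluator approves; two weaker ones disagree with each other.
score = posterior_correct([True, True, False], [0.9, 0.6, 0.6])
```

Under these assumptions the two weaker, conflicting votes cancel out and the posterior is driven by the more reliable evaluator. A score of this kind could in principle serve as a reward signal in place of a single evaluator's judgment.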
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2024 IEEE Conference on Artificial Intelligence |
| Subtitle of host publication | CAI 2024 |
| Publisher | IEEE |
| Pages | 158-163 |
| ISBN (Electronic) | 979-8-3503-5409-6 |
| ISBN (Print) | 979-8-3503-5410-2 |
| Publication status | Published - 2024 |
| Event | 2nd IEEE Conference on Artificial Intelligence (IEEE CAI 2024), Marina Bay Sands, Singapore. Duration: 25 Jun 2024 → 27 Jun 2024. https://ieeecai.org/2024/ |
Publication series
| Name | Proceedings - IEEE Conference on Artificial Intelligence, CAI |
|---|---|
Conference
| Conference | 2nd IEEE Conference on Artificial Intelligence (IEEE CAI 2024) |
|---|---|
| Abbreviated title | CAI 2024 |
| Place | Singapore |
| Period | 25/06/24 → 27/06/24 |
Bibliographical note
Research Unit(s) information for this publication is provided by the author(s) concerned.
Research Keywords
- AI alignment
- Bayesian analysis
- Code generation
- Inductive bias
- Reinforcement learning