Abstract
Low-rank adaptation (LoRA) has emerged as a leading parameter-efficient fine-tuning technique for adapting large foundation models, yet it often locks adapters into suboptimal minima near their initialization. This hampers model generalization and limits downstream operations such as adapter merging and pruning. Here, we propose CoTo, a progressive training strategy that gradually increases adapters' activation probability over the course of fine-tuning. By stochastically deactivating adapters, CoTo encourages more balanced optimization and broader exploration of the loss landscape. We provide a theoretical analysis showing that CoTo promotes layer-wise dropout stability and linear mode connectivity, and we adopt a cooperative-game approach to quantify each adapter's marginal contribution. Extensive experiments demonstrate that CoTo consistently boosts single-task performance, enhances multi-task merging accuracy, improves pruning robustness, and reduces training overhead, all while remaining compatible with diverse LoRA variants. Code is available at https://github.com/zwebzone/coto.
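The core idea of the abstract, an activation probability that grows during fine-tuning while adapters are randomly switched off, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The linear ramp, the `ramp_fraction` parameter, and the function names `activation_probability` and `sample_active_adapters` are all assumptions made for illustration; the abstract only states that the probability increases gradually.

```python
import random


def activation_probability(step, total_steps, ramp_fraction=0.75):
    """Hypothetical schedule: p rises linearly from 0 to 1 over the first
    `ramp_fraction` of training, then stays at 1 (an assumed shape)."""
    ramp_steps = max(1, int(ramp_fraction * total_steps))
    return min(1.0, step / ramp_steps)


def sample_active_adapters(num_adapters, p, rng):
    """Keep each adapter independently with probability p.

    Returns a list of booleans; False means that adapter is skipped
    (only the frozen base weights are used) for this training step.
    """
    return [rng.random() < p for _ in range(num_adapters)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    total_steps = 8
    for step in range(total_steps + 1):
        p = activation_probability(step, total_steps)
        mask = sample_active_adapters(4, p, rng)
        print(step, round(p, 2), mask)
```

Early in training most adapters are inactive, so each one has to be useful without relying on the others; by the end of the ramp every adapter is active and training proceeds as in ordinary LoRA.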
Original language | English |
---|---|
Title of host publication | Proceedings of the 42nd International Conference on Machine Learning |
Publication status | Accepted/In press/Filed - 1 May 2025 |
Event | 42nd International Conference on Machine Learning, ICML 2025 - Vancouver Convention Center, Vancouver, Canada. Duration: 13 Jul 2025 → 19 Jul 2025. https://icml.cc/Conferences/2025 |
Conference
Conference | 42nd International Conference on Machine Learning, ICML 2025 |
---|---|
Abbreviated title | ICML 2025 |
Country/Territory | Canada |
City | Vancouver |
Period | 13/07/25 → 19/07/25 |
Internet address | https://icml.cc/Conferences/2025 |
Bibliographical note
Research Unit(s) information for this publication is provided by the author(s) concerned. Since this conference is yet to commence, the information for this record is subject to revision.
Research Keywords
- Parameter-efficient fine-tuning
- linear mode connectivity
- low-rank adaptation
- model merging