Abstract
Continual Learning (CL) seeks to endow machine learning systems with the ability to learn sequentially from data streams while avoiding catastrophic forgetting. Despite significant progress in developing a wide range of CL techniques, including regularization-based, replay-based, and model-based methods, fundamental challenges remain unresolved. These include the need for better optimization strategies to preserve past knowledge, deeper understanding of the conceptual relationships across CL methods and related domains such as Multi-Task Learning (MTL), and the integration of CL with large-scale pre-trained models for real-world adaptability.
This dissertation addresses these challenges through a series of theoretical and algorithmic contributions. First, we establish a theoretical link between regularization-based and meta-continual learning (Meta-CL) by analyzing their implicit Hessian approximations. We identify critical limitations in both approaches and propose VR-MCL, a variance-reduced Meta-CL framework that improves optimization stability by mitigating noisy curvature estimates. Supported by theoretical regret bounds and empirical validation, VR-MCL demonstrates strong performance in online CL settings.
Second, we investigate the connection between CL and MTL through the lens of gradient alignment. We reveal that gradient-aligned CL methods are instances of hierarchical multi-objective optimization, where the current task is prioritized. Leveraging this insight, we introduce the Pareto-Optimized Continual Learning (POCL) algorithm, which incorporates inter-task relationships via Pareto optimization. POCL achieves superior knowledge integration and significantly reduces forgetting, consistently outperforming state-of-the-art baselines.
Third, we propose the Continual Bias Adaptor (CBA), a lightweight plug-in module designed to handle posterior distribution shifts in dynamic data streams. CBA enhances classifier adaptability without adding inference overhead, and theoretical analysis confirms its ability to mitigate forgetting. Integrated into diverse rehearsal-based CL methods, CBA reliably boosts performance across tasks and datasets.
Beyond these algorithmic advancements, this thesis explores how continual learning can be effectively adapted to foundation models. For foundation vision models, we introduce SD-LoRA, a scalable and rehearsal-free method that decouples magnitude and direction in parameter updates to achieve efficient class-incremental learning. For large language models, we propose PS-LoRA, which stabilizes learning by penalizing harmful sign-flipping updates and merging impactful parameter changes after training.
Collectively, these contributions provide a unified, theoretically grounded, and practically effective framework for continual learning, with implications for deploying adaptive, intelligent systems in ever-evolving environments.
| Date of Award | 29 Jul 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Kede MA (Supervisor) & Ying Wei (External Co-Supervisor) |