Abstract
Humans possess a natural ability for continual learning, acquiring new skills without forgetting previously learned ones. Deep learning models, in contrast, struggle to achieve this capability: expanding a trained model typically requires combining the original and new data and retraining from scratch, because fine-tuning on new data alone often leads to catastrophic forgetting of prior knowledge. Frequent retraining, however, is time-consuming and resource-intensive. To address these limitations, this thesis explores continual learning techniques for deep learning models across various visual understanding scenarios. These approaches enable models to progressively learn diverse visual tasks like humans, allowing them to evolve in response to changing environments or requirements.

First, we explore continual learning techniques for semantic segmentation models. A major challenge in this domain is the differing class distributions across the training samples of successive tasks, which introduces bias and degrades model performance. To address this, we propose two novel techniques: prototype replay and background pixel repetition. The prototype replay method constructs prototypes for old classes from the statistics of their feature distributions and replays these prototypes during subsequent training, ensuring the stability of foreground class distributions across tasks. The background pixel repetition technique repeatedly incorporates background features during training on each task, thereby correcting background proportions. Additionally, we introduce an old-class feature-maintaining loss to stabilize the feature space of past classes and a similarity-aware discriminative loss to aid in distinguishing similar old and new classes.
Experimental results demonstrate that our proposed method, STAR, achieves state-of-the-art performance while significantly reducing or even eliminating storage requirements compared to existing approaches.
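The core idea of prototype replay can be illustrated with a minimal sketch. The abstract says prototypes are built from the statistical information of old-class feature distributions; here we assume a simple Gaussian summary (per-dimension mean and standard deviation) and sample pseudo-features from it, which is one common instantiation of the idea — the thesis's exact statistics may differ. All function names are illustrative, not the thesis's API.

```python
import numpy as np

def build_prototype(features):
    """Summarize a class's feature distribution by per-dimension mean and std."""
    return features.mean(axis=0), features.std(axis=0)

def replay_prototype(mean, std, n_samples, rng):
    """Sample pseudo-features from the stored Gaussian prototype, so old-class
    statistics stay represented while training on a new task."""
    return rng.normal(mean, std, size=(n_samples, mean.shape[0]))

rng = np.random.default_rng(0)
# Toy stand-in for an old class's deep features (100 vectors, 16-dim).
old_feats = rng.normal(loc=2.0, scale=0.5, size=(100, 16))
mean, std = build_prototype(old_feats)
pseudo = replay_prototype(mean, std, n_samples=32, rng=rng)
```

During later tasks, `pseudo` would be mixed into each batch and passed through the classifier head alongside real new-class features, keeping the foreground class distribution stable.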
Next, we address the more challenging task of continual learning for panoptic segmentation, emphasizing three key balances. First, past-class backtrace distillation balances retaining prior knowledge with acquiring new knowledge by backtracking features associated with output segments of past classes and applying targeted constraints. Second, the class-proportional memory strategy ensures balanced class representation of replay samples, prioritizing the recall of challenging and crucial old classes. Third, balanced anti-misguidance losses mitigate the negative impact of incomplete annotations in replay samples while avoiding foreground-background imbalance. Together, these strategies form our BalConpas method. Experimental results confirm that BalConpas not only achieves state-of-the-art performance in continual learning for panoptic segmentation but also performs competitively across other continual image segmentation tasks.
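A class-proportional memory can be sketched as a greedy selection that keeps the replay buffer's class counts balanced, so rarer or harder old classes are not crowded out. This is an assumed minimal version of the strategy named above; the selection criterion and priorities in the thesis may be more elaborate, and the helper name is hypothetical.

```python
from collections import Counter

def class_proportional_select(samples, memory_size):
    """Greedily fill a replay memory so class counts stay as balanced as possible.

    `samples` is a list of (sample_id, class_label) pairs; each slot is filled
    with a sample of the class that is currently least represented in memory.
    """
    counts = Counter()
    memory, pool = [], list(samples)
    for _ in range(min(memory_size, len(pool))):
        best = min(pool, key=lambda s: counts[s[1]])  # least-represented class
        pool.remove(best)
        counts[best[1]] += 1
        memory.append(best)
    return memory

# Class 0 dominates the pool, yet a 3-slot memory covers all three classes.
samples = [("a", 0), ("b", 0), ("c", 0), ("d", 1), ("e", 1), ("f", 2)]
memory = class_proportional_select(samples, memory_size=3)
```

With a frequency-skewed pool, the resulting memory contains one sample per class rather than mirroring the skew.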
Finally, we extend our exploration of continual learning to multimodal large language models (MLLMs). In this study, we find that catastrophic forgetting in MLLMs cannot be simply classified as the forgetting of old knowledge, as is common in segmentation models. Instead, it should be divided into two categories: "superficial forgetting" and "essential forgetting". Superficial forgetting refers to the loss of response style, where the model fails to generate answers in the required format, rendering the response unusable. Essential forgetting, on the other hand, represents the genuine loss of knowledge. To tackle these challenges, we first propose the answer style diversification paradigm, which converts single-style Q&A training samples from each task into multiple formats, preventing superficial forgetting caused by style shifts. Subsequently, we introduce RegLoRA, which selectively stabilizes key parameters related to old knowledge, thereby preventing essential forgetting. Experimental results demonstrate the effectiveness of these techniques and highlight the superior performance of the SEFE method that combines them.
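The RegLoRA idea of selectively stabilizing key parameters can be sketched as follows. We assume the key entries are identified by the magnitude of the previous task's LoRA weight update and that changes at those positions are discouraged with an L2 penalty; the thesis's actual selection rule and loss may differ, and both function names are illustrative.

```python
import numpy as np

def key_param_mask(old_update, top_frac=0.25):
    """Mark the largest-magnitude entries of the old task's weight update as key
    positions, i.e. the ones presumed to carry old knowledge."""
    k = max(1, int(top_frac * old_update.size))
    thresh = np.sort(np.abs(old_update).ravel())[-k]
    return np.abs(old_update) >= thresh

def reg_loss(new_update, mask, lam=0.5):
    """L2 penalty on the new task's update at key positions, discouraging
    changes there and thus curbing essential forgetting."""
    return lam * float(np.sum((new_update * mask) ** 2))

# Toy 2x2 "LoRA update" from the old task: one dominant entry.
old_update = np.array([[0.9, 0.01], [0.02, 0.03]])
mask = key_param_mask(old_update)          # marks only the 0.9 entry
penalty = reg_loss(np.ones((2, 2)), mask)  # penalizes change at that entry
```

This penalty would be added to the new task's training loss, leaving the remaining (non-key) positions free to absorb new knowledge.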
| Date of Award | 24 Apr 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Ho Shing Horace IP (Supervisor) & Tak Wu Sam KWONG (External Co-Supervisor) |