Dynamic Classes Adaptation in Constrained Vision Models

Student thesis: Doctoral Thesis

Abstract

The rapid progress of deep learning has driven extensive research on enhancing vision models to generalize to novel tasks and unseen classes with limited data. While deep learning typically relies on large-scale labeled datasets, this dependence restricts model applicability in real-world scenarios where data is scarce. Even when models are trained on abundant samples, they often excel at recognizing known classes but struggle to adapt to incremental tasks (e.g., continual classification, continual semantic segmentation) or generalize to unseen classes. Furthermore, when adapting to new tasks or classes, models trained solely on new data often experience catastrophic forgetting, leading to significant performance drops in previously learned classes, or they exhibit poor generalization without additional training. To address these challenges, this thesis explores continual learning for both classification and segmentation tasks, aiming to mitigate catastrophic forgetting, and investigates few-shot generalization to novel classes using limited labeled data. By proposing multi-faceted adaptation strategies, this work enables models to retain performance across all previously learned classes while effectively adapting to new tasks.

First, we investigate techniques for continual image classification. A major problem with feature decomposition methods is that they focus only on decomposing features within individual tasks while neglecting the crucial information provided by relationships between different tasks, which limits performance improvement. To address this issue, we propose an Adversarial Contrastive Continuous Learning (ACCL) method that decouples task-invariant and task-variant features by constructing all-round, multi-level contrasts on sample pairs drawn from within individual tasks or from different tasks. Specifically, three constraints are imposed on the distributions of task-invariant and task-variant features: task-invariant features across different tasks should remain consistent, task-variant features should exhibit differences, and task-invariant and task-variant features should differ from each other. We also design an effective contrastive replay strategy that makes full use of replay samples when constructing sample pairs, further alleviating forgetting and modeling cross-task relationships. Extensive experimental evaluations demonstrate that ACCL achieves state-of-the-art performance, enhances task adaptability, and substantially reduces catastrophic forgetting compared with existing methods.

Next, we extend our research to a more challenging setting, continual semantic segmentation. A key issue is background shift in replay samples: partial annotations can lead to forgetting of previous classes and hinder the learning of new ones, restricting model flexibility. To resolve this, we introduce a new method named Trace Back and Go Ahead (TAGA), which employs a backward annotator model and a forward annotator model to generate pseudo-labels for both regular training samples and exemplars, reducing the adverse effects of incomplete annotations. This approach effectively mitigates the risk of incorrect guidance from both sample types, offering a comprehensive solution to background shift. Additionally, because the number of replay samples is significantly smaller than that of regular training samples, the class distribution in the sample pool of each incremental task exhibits a long-tailed pattern, potentially biasing classification towards incremental classes. We therefore incorporate a class-equilibrium sampling strategy that adaptively adjusts sampling frequencies based on the ratios of replay samples to regular samples and of past classes to new classes, counteracting the skewed distribution. Experimental results on continual semantic segmentation benchmarks confirm that TAGA achieves state-of-the-art performance, effectively mitigates background shift, and demonstrates superior task adaptability.
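The class-equilibrium sampling idea can be sketched as follows: weight classes by inverse frequency and boost the under-represented replayed old classes by the regular-to-replay imbalance ratio. The `class_equilibrium_weights` helper and its exact formula are hypothetical simplifications, not the thesis's actual strategy.

```python
import numpy as np

def class_equilibrium_weights(counts_old, counts_new, n_replay, n_regular):
    """Hypothetical sketch of a class-equilibrium sampling strategy.

    counts_old / counts_new: per-class sample counts for previously learned
    and newly introduced classes in the current task's sample pool.
    n_replay / n_regular: total numbers of replay and regular samples.
    Returns normalized per-class sampling frequencies.
    """
    counts = np.concatenate([counts_old, counts_new]).astype(float)
    inv = 1.0 / np.maximum(counts, 1.0)            # rare classes get larger weight
    boost = np.ones_like(inv)
    # Up-weight old classes by the regular-to-replay imbalance ratio (assumed form).
    boost[: len(counts_old)] *= n_regular / max(n_replay, 1)
    w = inv * boost
    return w / w.sum()                             # normalized sampling frequencies
```

With a long-tailed pool, this assigns the scarce replayed old classes a higher sampling frequency than the abundant new classes, counteracting the skew the paragraph describes.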

Finally, we explore model generalization to unseen classes in the context of few-shot segmentation (FSS). Current approaches either lose crucial spatial contextual information by relying on generalized class representations or overemphasize spatial affinity without effectively summarizing the core target-class information, resulting in poor fine-detail accuracy or errors in overall localization. To address these issues, we propose CCFormer, a novel FSS framework that balances the transmission of core semantic concepts with the modeling of spatial context, improving segmentation accuracy at both the macro and micro levels. Our approach introduces three key modules: the Concept Perception Generation module leverages pre-trained category perception capabilities to capture high-quality core representations of the target class; the Concept-Feature Integration module injects the core class information into both support and query features during feature extraction; and the Contextual Distribution Mining module utilizes a Brownian Distance Covariance matrix to model the spatial-channel distribution between support and query samples, preserving the fine-grained integrity of the target. Extensive experiments on few-shot semantic segmentation benchmarks validate that CCFormer achieves state-of-the-art performance, accurately captures both core-semantic and spatial information, and demonstrates superior generalization to unseen classes compared with existing methods.
Date of Award15 Aug 2025
Original languageEnglish
Awarding Institution
  • City University of Hong Kong
SupervisorHo Shing Horace IP (Supervisor) & Tak Wu Sam KWONG (External Co-Supervisor)

Cite this

'