Optimizing Deep Learning Models through Diversity and Differentiation
Student thesis: Doctoral Thesis
Detail(s)
Award date: 10 Aug 2021
Link(s)
Permanent link: https://scholars.cityu.edu.hk/en/theses/theses(f36c24a7-c5d9-4a79-8743-1658ec67da13).html
Abstract
Deep learning (DL) models have been deployed in many application domains. Developing a DL model involves training it, testing it on unseen datasets, and deploying it to production. However, training a model to convergence is slow, and even a trained model leaves room for further optimization.
Existing training algorithms usually optimize a DL model by feeding it training batches iteratively. However, because different training batches incur different losses, the model can swing among those batches, making convergence difficult. Trained DL models may also suffer from overfitting, their performance can be underestimated, and a quantized model can be less robust than its full-precision counterpart.
To address these problems, this thesis presents a novel framework that optimizes DL models through diversity and differentiation. It makes three major contributions.
The first contribution is DeepEutaxy, which addresses the inefficient-convergence problem. DeepEutaxy is the first work to prioritize training batches from the perspective of gradient diversity. It periodically measures the gradients of batches with respect to the current intermediate model and reorders the batches so that the model is subsequently optimized against the batches with greater gradient diversity first. DeepEutaxy innovatively explores the behaviors that a DL model exhibits over individual batches; a sketch of the prioritization step follows.
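As a minimal illustration (not the authors' implementation), the following PyTorch sketch scores each batch by how far its gradient deviates from the mean batch gradient and reorders the batches accordingly; the cosine-distance score is an assumed stand-in for DeepEutaxy's actual diversity measure.

```python
# Illustrative sketch of gradient-diversity batch prioritization.
# Assumes a PyTorch model, a loss function, and a list of (inputs, targets)
# batches; the diversity metric below is an assumption, not DeepEutaxy's.
import torch
import torch.nn.functional as F

def batch_gradient(model, loss_fn, batch):
    """Flattened gradient of the loss on one batch w.r.t. the model parameters."""
    inputs, targets = batch
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.detach().flatten() for g in grads])

def prioritize_batches(model, loss_fn, batches):
    """Reorder batches so those with the most diverse gradients come first."""
    grads = torch.stack([batch_gradient(model, loss_fn, b) for b in batches])
    mean_grad = grads.mean(dim=0, keepdim=True)
    # Diversity proxy: cosine distance of each batch gradient from the mean.
    diversity = 1.0 - F.cosine_similarity(grads, mean_grad, dim=1)
    order = torch.argsort(diversity, descending=True)
    return [batches[i] for i in order]
```

In a full training loop, this reordering would be re-run periodically against the current intermediate model, matching the periodic measurement described above.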
The second contribution is Apricot and Flint, which address the overfitting problem. Apricot is the first work to optimize a trained model through direct weight adjustment at the model level, and Flint is the first to fix DL models through selective patch generation and retraining at the sample level.
Apricot generates a set of models that resemble the trained DL model. It adjusts the model's weights directly, moving them towards those resembling models that classify each failed test sample correctly and away from the remaining resembling models for that sample. Apricot innovatively shows that these resembling models can provide useful insights for optimizing the original model.
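The core adjustment rule can be sketched as follows; this is a simplified reading of the description above, with hypothetical helper names and an assumed update rule, for PyTorch models sharing one architecture.

```python
# Hypothetical sketch of Apricot-style direct weight adjustment; the update
# rule and eta are illustrative assumptions, not the thesis's exact method.
import torch

def adjust_weights(model, resembling_models, sample, label, eta=0.1):
    """Move weights towards resembling models that classify the failed
    sample correctly, and away from those that misclassify it."""
    correct, wrong = [], []
    for rm in resembling_models:
        pred = rm(sample.unsqueeze(0)).argmax(dim=1).item()
        (correct if pred == label else wrong).append(rm)
    if not correct or not wrong:
        return  # need both groups to define an adjustment direction
    with torch.no_grad():
        for name, p in model.named_parameters():
            towards = torch.stack(
                [dict(rm.named_parameters())[name] for rm in correct]).mean(dim=0)
            away = torch.stack(
                [dict(rm.named_parameters())[name] for rm in wrong]).mean(dim=0)
            p.add_(eta * (towards - p) - eta * (away - p))
```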
Flint proposes a novel, non-intrusive approach to fixing DL models by exploring consistency and correctness across the models in a population. Flint first locates the training samples with the highest historical correctness, which improves test accuracy with confidence; it then generates new samples around these samples and briefly retrains the model, as sketched below.
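The following sketch is hypothetical: it approximates "historical correctness" as the fraction of models in the population that classify a training sample correctly, selects the top-scoring samples, and perturbs them with Gaussian noise before one brief retraining step; Flint's actual measures and generation strategy may differ.

```python
# Hypothetical sketch of Flint-style sample selection and brief retraining;
# the correctness score and noise perturbation are illustrative assumptions.
import torch

def historical_correctness(population, inputs, labels):
    """Fraction of models in the population classifying each sample correctly."""
    votes = torch.stack(
        [(m(inputs).argmax(dim=1) == labels).float() for m in population])
    return votes.mean(dim=0)  # one score per training sample

def flint_like_fix(model, population, inputs, labels, top_k=64, noise=0.01, lr=1e-4):
    scores = historical_correctness(population, inputs, labels)
    idx = scores.topk(top_k).indices                  # most reliably correct samples
    new_x = inputs[idx] + noise * torch.randn_like(inputs[idx])  # samples around them
    new_y = labels[idx]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = torch.nn.functional.cross_entropy(model(new_x), new_y)
    opt.zero_grad()
    loss.backward()
    opt.step()                                        # brief retraining step
```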
The third contribution is Gamar and Croissant, which improve the robustness of trained models. Gamar formulates a novel approach that localizes the model layers that are more sensitive to perturbation and generates mutated gradients, after precisely aligning the features of each training sample with those of its perturbed counterpart, to provide search directions for model evolution. Croissant is the first work to measure the cause-and-effect relationship between quantization error and the quantization applied in specific layers. It minimizes the difference between each quantized block and its full-precision counterpart to reduce quantization errors.
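On the quantization side, minimizing the block-wise difference can be sketched as follows; the setup (paired full-precision and quantized blocks, a calibration loader, an MSE objective) is an assumption for illustration, not Croissant's actual procedure.

```python
# Illustrative sketch of block-wise quantization-error reduction: tune the
# quantized block so its outputs match the full-precision block's outputs
# on calibration data. Names and hyperparameters are assumptions.
import torch

def tune_quantized_block(fp_block, q_block, calib_loader, steps=100, lr=1e-4):
    opt = torch.optim.Adam(q_block.parameters(), lr=lr)
    fp_block.eval()
    for _, (x, _) in zip(range(steps), calib_loader):
        with torch.no_grad():
            target = fp_block(x)  # full-precision reference output
        loss = torch.nn.functional.mse_loss(q_block(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```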
In summary, this thesis makes three major contributions to optimizing DL models, spanning model convergence, accuracy, and robustness, by presenting the first works that explore and formulate novel notions of diversity and differentiation from both the model and the sample perspectives.