Towards Advanced Evolutionary Neural Architecture Search

Student thesis: Doctoral Thesis

Detail(s)

Supervisors/Advisors
  • Ka Chun WONG (Supervisor)
  • Kay Chen Tan (External person) (External Co-Supervisor)
Award date: 3 May 2023

Abstract

Neural architecture search (NAS) aims to automatically construct deep neural network (DNN) architectures suited to specific tasks. Evolutionary NAS (ENAS) is a widely applied, mainstream NAS method that has proven effective at designing DNN models that outperform hand-crafted ones across a wide range of tasks. In this thesis, we take an in-depth look at ENAS methods and propose advanced techniques from three different perspectives. In summary, our contributions are as follows:

Firstly, the automated construction of DNNs has emerged as a research hotspot, attracting considerable attention from both academia and industry. In this study, we formulate the automated DNN construction process as a multi-level, multi-objective, large-scale optimization problem with various constraints. Its non-convex, non-differentiable, black-box nature highlights the potential of evolutionary algorithms (EAs) for effective and efficient solutions. Drawing on the extant literature, we systematically review existing EA-based approaches to DNN construction and analyze the merits and limitations of applying EA-based methods at different stages of the construction process. This work aims to help DNN researchers better understand why, where, and how to utilize EAs for automated DNN construction and, at the same time, to help EA researchers better understand the task itself so that they can focus on EA-favored optimization scenarios and devise more effective techniques.
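For reference, one standard way to write such a formulation (a common bilevel, multi-objective statement from the NAS literature; the thesis's exact notation may differ) is:

\[
\min_{\alpha \in \mathcal{A}} \; \bigl( \mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\alpha), \alpha\bigr),\; \mathrm{Cost}(\alpha) \bigr)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \operatorname*{arg\,min}_{w} \mathcal{L}_{\mathrm{train}}(w, \alpha),
\]

where \(\alpha\) is an architecture in the search space \(\mathcal{A}\), \(w\) denotes its weights, and \(\mathrm{Cost}(\alpha)\) collects secondary objectives such as latency or parameter count. The inner training problem nested inside the outer search is what makes the overall objective non-convex, non-differentiable, and effectively black-box.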

Secondly, traditional ENAS methods require a massive number of candidate architectures to be trained in order to find the target architecture within a vast search space, which is a time-consuming process. To address this issue, we focus the search on the most promising regions of the search space rather than the entire space. To this end, we propose HierNAS, a hierarchical NAS method comprising two stages: low-fidelity global exploration and high-fidelity local exploitation. In the first stage, an ENAS method evolves multiple populations, with the performance of each candidate architecture predicted by a regression model, to identify promising architectures quickly. In the second stage, we conduct a high-fidelity local search around the most promising architecture identified in the first stage to locate better-performing architectures; the local search results are then fed back to promote the global search. By iterating these two stages, HierNAS locates the target architectures. Experimental results demonstrate that HierNAS finds target model architectures more effectively than traditional NAS methods.
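As a concrete illustration of this two-stage loop, here is a minimal, runnable Python sketch on a toy search space. The integer-vector architectures, the synthetic objective, and the nearest-neighbour surrogate are all illustrative assumptions, not the thesis's actual implementation:

```python
import random

def true_score(arch):
    # Stand-in for expensive, high-fidelity training and evaluation.
    return -sum((x - 3) ** 2 for x in arch)

class Surrogate:
    """Cheap low-fidelity predictor: returns the score of the most
    similar architecture seen so far (a nearest-neighbour regressor)."""
    def __init__(self):
        self.seen = []  # (arch, score) pairs from high-fidelity evaluations

    def predict(self, arch):
        if not self.seen:
            return 0.0
        nearest = min(self.seen,
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], arch)))
        return nearest[1]

    def update(self, arch, score):
        self.seen.append((arch, score))

def sample():
    return tuple(random.randint(0, 6) for _ in range(4))

def neighbours(arch, k):
    # Local moves: mutate a single position to get k nearby variants.
    out = []
    for _ in range(k):
        a = list(arch)
        a[random.randrange(len(a))] = random.randint(0, 6)
        out.append(tuple(a))
    return out

def hiernas(iterations=10, pop_size=30, k=5):
    surrogate, best, best_score = Surrogate(), None, float("-inf")
    for _ in range(iterations):
        # Stage 1: low-fidelity global exploration. Rank a sampled
        # population with the cheap surrogate instead of training it.
        population = [sample() for _ in range(pop_size)]
        candidate = max(population, key=surrogate.predict)
        # Stage 2: high-fidelity local exploitation around the candidate.
        # Each expensive evaluation also refines the surrogate, which
        # in turn promotes the next round of global exploration.
        for n in neighbours(candidate, k):
            score = true_score(n)
            surrogate.update(n, score)
            if score > best_score:
                best, best_score = n, score
    return best, best_score

print(hiernas())
```

The key design point the sketch tries to convey is the feedback loop: high-fidelity local evaluations continuously improve the surrogate that drives the cheap global stage.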

Thirdly, existing ENAS methods design the architecture for each task from scratch, which is inefficient when many tasks must be handled. To address this issue, we propose a novel multi-task NAS framework called MTNAS, which searches for architectures for multiple tasks simultaneously and shares valuable architectural knowledge among them to improve overall search efficiency. MTNAS conducts a separate NAS for each task in an iterative manner, alternating between two types of operations: task-specific evolution and cross-task transfer. Task-specific evolution evolves a population with an ENAS method to find promising architectures, which are then stored in a shared archive. Cross-task transfer exploits the most promising architecture in the archive to promote the architecture search for the target task. To alleviate the negative effects of unhelpful knowledge transfer, we propose a set of knowledge transfer strategies, including transfer probability adaptation, knowledge extraction, and knowledge reuse. Experimental results on NAS-Bench-201, TransNAS-Bench-101, and the search space proposed by DARTS demonstrate that MTNAS is at least 2x faster than existing NAS methods.
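The iteration between task-specific evolution and cross-task transfer can be sketched as follows. The toy tasks, the shared-archive layout, and the simple additive rule for transfer probability adaptation are illustrative assumptions rather than MTNAS's actual operators:

```python
import random

class ToyTask:
    """Toy stand-in for a NAS task: fitness peaks at a task-specific target."""
    def __init__(self, target):
        self.target = target

    def sample(self):
        return tuple(random.randint(0, 6) for _ in range(4))

    def mutate(self, arch):
        a = list(arch)
        a[random.randrange(len(a))] = random.randint(0, 6)
        return tuple(a)

    def fitness(self, arch):
        return -sum((x - y) ** 2 for x, y in zip(arch, self.target))

def mtnas(tasks, generations=20, pop_size=20, p0=0.5):
    pops = {t: [t.sample() for _ in range(pop_size)] for t in tasks}
    archive = {t: None for t in tasks}      # best architecture per task
    transfer_p = {t: p0 for t in tasks}     # adaptive transfer probability
    for _ in range(generations):
        for t in tasks:
            # Task-specific evolution: mutate and select on this task only.
            kids = [t.mutate(random.choice(pops[t])) for _ in range(pop_size)]
            pops[t] = sorted(pops[t] + kids, key=t.fitness, reverse=True)[:pop_size]
            archive[t] = pops[t][0]
            # Cross-task transfer: with some probability, reuse the most
            # promising architecture from the other tasks' shared archive.
            donors = [a for s, a in archive.items() if s is not t and a]
            if donors and random.random() < transfer_p[t]:
                donor = max(donors, key=t.fitness)
                if t.fitness(donor) > t.fitness(pops[t][-1]):
                    pops[t][-1] = donor     # helpful knowledge: keep it
                    transfer_p[t] = min(0.9, transfer_p[t] + 0.1)
                else:
                    # Unhelpful transfer: lower the transfer probability.
                    transfer_p[t] = max(0.1, transfer_p[t] - 0.1)
    return [archive[t] for t in tasks]

tasks = [ToyTask((1, 1, 1, 1)), ToyTask((2, 2, 2, 2))]
print(mtnas(tasks))
```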

Lastly, NAS requires not only significant computational resources but also a considerable amount of data for the specific task, so the high deployment cost can be an issue when applying NAS to design DNNs in practical scenarios. To solve this problem, we propose a category-specific but task-agnostic ENAS method called CSTA-ENAS. This approach utilizes datasets from multiple other tasks in the same category to design a transferable DNN model. The resulting model needs only a lightweight fine-tuning process, involving a few epochs and a small number of data samples, to solve a new task satisfactorily. In this way, the transferable model can be constructed on a high-performance server with publicly available data and subsequently deployed in resource-limited scenarios at low cost. To demonstrate the effectiveness of CSTA-ENAS, we build transferable models on a set of image classification datasets and evaluate them on other, unseen image classification tasks. Our models achieve performance comparable to most existing task-specific and transferable architectures, but at lower deployment cost.
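The core selection criterion, searching for one architecture whose aggregate score over several same-category datasets is high, can be sketched as below. The toy datasets, the mean aggregation, and all function names are illustrative assumptions, not the method's actual design:

```python
import random

def sample():
    return tuple(random.randint(0, 6) for _ in range(4))

def mutate(arch):
    a = list(arch)
    a[random.randrange(len(a))] = random.randint(0, 6)
    return tuple(a)

# Toy "datasets" from one category: each favours architectures close to
# its own target vector (a stand-in for quick per-dataset evaluation).
datasets = [(1, 2, 3, 4), (2, 2, 3, 3), (1, 3, 3, 4)]

def quick_eval(arch, target):
    return -sum((x - y) ** 2 for x, y in zip(arch, target))

def multi_dataset_fitness(arch):
    # Aggregate (here: average) the candidate's score over every source
    # dataset, so selection favours architectures that transfer well.
    return sum(quick_eval(arch, d) for d in datasets) / len(datasets)

def csta_enas(generations=30, pop_size=20):
    pop = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        kids = [mutate(random.choice(pop)) for _ in range(pop_size)]
        pop = sorted(pop + kids, key=multi_dataset_fitness, reverse=True)[:pop_size]
    return pop[0]   # the transferable architecture

# Deploying on a new, unseen task would then amount to a lightweight
# fine-tune of this architecture: a few epochs on a small data sample.
print(csta_enas())
```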