Fully Nested Neural Network for Adaptive Compression and Quantization

Research output: Chapters, Conference Papers, Creative and Literary Works (RGC: 12, 32, 41, 45), Refereed conference paper (with ISBN/ISSN), peer-reviewed


Detail(s)

Original language: English
Title of host publication: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Editors: Christian Bessiere
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2080-2087
ISBN (Electronic): 978-0-9992411-6-5
Publication status: Published - Jan 2021

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2021-January
ISSN (Print): 1045-0823

Conference

Title: 29th International Joint Conference on Artificial Intelligence (IJCAI 2020)
Location: Online
Place: Japan
City: Yokohama
Period: 7 - 15 January 2021

Abstract

Neural network compression and quantization are important tasks for fitting state-of-the-art models into the computational, memory and power constraints of mobile devices and embedded hardware. Recent approaches to model compression/quantization are based on reinforcement learning or search methods that quantize the neural network for a specific hardware platform. However, these methods require multiple runs to compress/quantize the same base neural network for different hardware setups. In this work, we propose a fully nested neural network (FN3) that runs only once to build a nested set of compressed/quantized models, each optimal for a different resource constraint. Specifically, we exploit the additive characteristic of different levels of building blocks in neural networks and propose an ordered dropout (ODO) operation that ranks the building blocks. Given a trained FN3, a fast heuristic search algorithm is run offline to find the optimal removal of components that maximizes accuracy under different constraints. Compared with related work on adaptive neural networks designed only for channels or bits, the proposed approach is applicable to different levels of building blocks (bits, neurons, channels, residual paths and layers). Empirical results validate the strong practical performance of the proposed approach.
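The core idea of ordered dropout, as described in the abstract, can be sketched as follows: instead of dropping units independently at random, a cutoff index is sampled and all units beyond it are zeroed, so lower-indexed units survive more often and learn to carry the most important information, inducing a ranking over building blocks. This is a minimal hypothetical re-implementation based only on the abstract, not the authors' code; the function name and the uniform cutoff distribution are assumptions.

```python
import numpy as np

def ordered_dropout(x, rng):
    """Ordered-dropout (ODO) sketch: sample a cutoff k uniformly in
    [1, n] and keep only the first k units of the last axis.

    Units with a lower index are kept under more sampled cutoffs, so
    during training they are pushed to encode the most important
    features -- which is what makes the trained network "nested":
    truncating it at any prefix yields a usable smaller model.
    (Hypothetical illustration; the paper's actual sampling scheme
    may differ.)
    """
    n = x.shape[-1]
    k = int(rng.integers(1, n + 1))  # number of leading units to keep
    mask = np.zeros(n)
    mask[:k] = 1.0                   # keep prefix [0, k), zero the rest
    return x * mask, k

# Usage: apply ODO to a toy activation vector.
rng = np.random.default_rng(0)
x = np.ones(8)
y, k = ordered_dropout(x, rng)
```

At deployment time, the same prefix structure lets an offline search pick, for each resource budget, how many ranked blocks to retain at each level (bits, neurons, channels, paths, layers) without retraining.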

Citation Format(s)

Fully Nested Neural Network for Adaptive Compression and Quantization. / Cui, Yufei; Liu, Ziquan; Yao, Wuguannan; Li, Qiao; Chan, Antoni B.; Kuo, Tei-wei; Xue, Chun Jason.

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. ed. / Christian Bessiere. International Joint Conferences on Artificial Intelligence, 2021. p. 2080-2087 (IJCAI International Joint Conference on Artificial Intelligence; Vol. 2021-January).
