TY - GEN
T1 - Hyperspherical Quantization
T2 - 23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023)
AU - Liu, Dan
AU - Chen, Xi
AU - Ma, Chen
AU - Liu, Xue
N1 - Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
PY - 2023
Y1 - 2023
N2 - Model quantization enables the deployment of deep neural networks on resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the index needs to be restored to 32-bit during computation. Binary and other low-precision quantization methods can reduce the model size by up to 32×, but at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization to produce smaller and more accurate compressed models. By integrating hyperspherical learning, pruning and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thus reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work at similar compression levels (~30×, ~40×), our method significantly improves the test accuracy and reduces the model size. © 2023 IEEE.
KW - Algorithms: Machine learning architectures, formulations, and algorithms (including transfer)
KW - Image recognition and understanding (object detection, categorization, segmentation, scene modeling, visual reasoning)
UR - https://www.scopus.com/pages/publications/85148998021
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85148998021&origin=recordpage
U2 - 10.1109/WACV56688.2023.00523
DO - 10.1109/WACV56688.2023.00523
M3 - RGC 32 - Refereed conference paper (with host publication)
T3 - Proceedings - IEEE Winter Conference on Applications of Computer Vision, WACV
SP - 5251
EP - 5261
BT - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision (WACV 2023)
PB - IEEE
Y2 - 3 January 2023 through 7 January 2023
ER -