Efficient Implementation of Kyber on Mobile Devices

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

4 Scopus Citations
Author(s)

Zhao, Lirui; Zhang, Jipeng; Huang, Junhao et al.

Detail(s)

Original language: English
Title of host publication: Proceedings - 2021 IEEE 27th International Conference on Parallel and Distributed Systems
Subtitle of host publication: ICPADS 2021
Publisher: IEEE
Pages: 506-513
ISBN (Electronic): 9781665408783
ISBN (Print): 978-1-6654-0879-0
Publication status: Published - Dec 2021

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
ISSN (Print): 1521-9097
ISSN (Electronic): 2690-5965

Conference

Title: 27th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2021)
Location: Jiuhua International Convention and Exhibition Center Hotel
Place: China
City: Beijing
Period: 14 - 16 December 2021

Abstract

Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the module learning with errors (MLWE) problem, was shortlisted for the third-round evaluation of the NIST Post-Quantum Cryptography Standardization process. In this paper, we explored optimizations of Kyber on high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we used NEON, the SIMD instruction set of ARMv8-A, to parallelize the core modules of Kyber, i.e., modular reduction and the number-theoretic transform (NTT). Specifically, we designed optimized implementations of the Barrett and Montgomery reduction algorithms around the characteristics of the NEON instruction set. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reductions were 8.52 and 8.89 times faster, respectively, than the reference implementation. For the NTT and inverse NTT (INTT), we adopted a 2+5 layer merging strategy on ARMv8-A after carefully analyzing the register occupancy of various layer merging techniques. With this strategy, our NTT and INTT achieved speedups of 11.89× and 13.45×, respectively, over the reference implementation. Overall, our optimized software achieved speedups of 1.77×, 1.85×, and 2.16× for key generation, encapsulation, and decapsulation, respectively, compared with Kyber's reference implementation.
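
For orientation, the two reductions named in the abstract can be sketched in scalar form. The Python below is a minimal illustration only. It assumes Kyber's modulus q = 3329 and constants (QINV, BARRETT_V) believed to match the public reference C code, and the helper to_int16 is a hypothetical stand-in for a C cast. It is not the paper's NEON implementation and does not reproduce the 16-bit-only Barrett strategy the authors propose.

```python
Q = 3329                                  # Kyber modulus
QINV = -3327                              # q^-1 mod 2^16 as a signed 16-bit value (assumed constant)
BARRETT_V = ((1 << 26) + Q // 2) // Q     # round(2^26 / q)

def to_int16(x: int) -> int:
    """Hypothetical helper: wrap x to a signed 16-bit value, like a C (int16_t) cast."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x

def barrett_reduce(a: int) -> int:
    """Return the centered representative of a mod q for a signed 16-bit input a."""
    # t approximates round(a / q); this scalar form uses a 32-bit-sized product.
    t = (BARRETT_V * a + (1 << 25)) >> 26
    return a - t * Q

def montgomery_reduce(a: int) -> int:
    """Return r with r * 2^16 congruent to a (mod q), for |a| < q * 2^15."""
    t = to_int16(a * QINV)                # low 16 bits of a * q^-1
    return (a - t * Q) >> 16              # a - t*q is a multiple of 2^16, so the shift is exact
```

For example, barrett_reduce(1665) returns -1664 and montgomery_reduce(1 << 16) returns 1. A NEON version would apply the same arithmetic to eight 16-bit lanes of a 128-bit register at once, which is where avoiding 32-bit intermediates matters.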

Research Area(s)

  • Kyber, Lattice-based Cryptography, Modular Reduction, Module-LWE, NTT

Citation Format(s)

Efficient Implementation of Kyber on Mobile Devices. / Zhao, Lirui; Zhang, Jipeng; Huang, Junhao et al.
Proceedings - 2021 IEEE 27th International Conference on Parallel and Distributed Systems: ICPADS 2021. IEEE, 2021. p. 506-513 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS).