Spectral arithmetic and architecture for modular arithmetic in public-key cryptosystems

公鑰密碼系統中模運算的頻域算術與硬體架構研究

Student thesis: Doctoral Thesis

View graph of relations

Author(s)

  • Donglong CHEN

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date2 Oct 2015

Abstract

Modular multiplication and polynomial multiplication are the basic and the most computationally intensive modular arithmetics in a majority of the public-key cryptosystems. The efficiency of the these modular arithmetics plays a crucial role in the performance of the corresponding cryptosystems. Using the schoolbook technique to compute these multiplications requires a quadratic complexity of O(l2) (l is the bit length of the operands). Spectral arithmetic, which enables the multiplication to be performed in spectral (frequency) domain, can reduce the multiplication complexity to a linearithmic complexity of O(l log l) theoretically. Though owning the advantage of low complexity, the usage of spectral domain property requires frequent data field transforms between time and spectral domains by using the Number Theoretic Transform (NTT). This makes the NTT transform become the performance bottleneck of these modular arithmetics. Therefore, how to improve the NTT computation and then the spectral arithmetic become an important issue for the performance improvements in these modular operations. In this thesis, arithmetic level and hardware architecture level improvements for these spectral arithmetics are proposed. Firstly, arithmetic level analysis and improvements for both Montgomery Modular Multiplication (MMM) and polynomial multiplication are provided. For MMM, the computation is performed by exploiting the spectral arithmetics in multiplication and addition, and leaving the remaining operations in time domain. We denote it as FFT-based Montgomery Modular Multiplication (FFTM3). In terms of the improvements for FFTM3, carry-save arithmetic are used to parallel the accumulation. Moreover, pseudo-Fermat number transform is used to enrich the supported operand size. A detailed complexity analysis and a systematic procedure for the selection of suitable parameter set for the FFTM3 are also provided to facilitate efficient implementation. In terms of the arithmetic level improvements for polynomial multiplication, negacyclic convolution method is used to avoid the zero padding during the transforms and the spectral domain computation. The parameter restriction is analyzed. A parameter set selection method which supports both an efficient modular reduction design and the security requirements for ring-Learning With Errors (ring-LWE) encryption and "Somewhat Homomorphic Encryption (SHE) are provided. Considering the hardware architecture level improvement, the architecture for FFTM3, polynomial multiplier, and ring-LWE crypsosystem are designed and implemented. Specifically, the architecture and implementation for the improved FFTM3 are introduced in details. The constant geometry FFT datapath is selected to minimize the control complexity. By analyzing the input/output schedule of FFT/IFFT, we manage the input/output of the RAMs to enable a pipelined bubble free architecture. Implementation results show that it offers better computation latency and area-latency product compared to the state of the arts when the operand size reached 3072-bit and above. In terms of polynomial multiplication, a versatile pipelined architecture accompanied with an improved dataflow is proposed to obtain a high-speed polynomial multiplier. The proposed architecture supports polynomial multiplications for different lengths and moduli, which could facilitate users' efficient design of cryptosystems like ring-LWE and SHE. Lastly, a dedicated hardware architecture for ring-LWE encryption cryptosystem is designed. Improved from the previous ring-LWE encryption scheme, we proposed an efficient scheme by moving most of the computations into spectral domain. This scheme reduces the number of FFT/IFFT from 9 to 7. An improved dataflow for encryption is also designed by exploiting the parallelism. The implementation results show that our design have more than 2 times and 3 times speedup for encryption and decryption, respectively, when compared with the state of the arts. In summary, arithmetic level and hardware architecture level improvements are proposed for the spectral arithmetic in modular arithmetic. It is shown that spectral arithmetic is important for the efficient computation of the large operand size modular arithmetic. The implementation results confirm the superiority of the proposed improvements on the spectral arithmetic.

    Research areas

  • Cryptography, Mathematics, Public key cryptography, Modular arithmetic