Spectral Arithmetic: An Efficient Approach for Modular Arithmetic and Architecture in Public-key Cryptography

公鑰密碼學中基於頻譜算術的高效模運算及其硬件架構

Student thesis: Doctoral Thesis

View graph of relations

Author(s)

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date23 Apr 2018

Abstract

The modular multiplication operation is the most time-consuming operation for number-theoretic cryptographic algorithms involving large integers, such as RSA encryption, ElGamal encryption, and Diffie-Hellman key exchange. Implementations reveal that more than 75% of the time is spent in the modular multiplication function within the RSA for more than 1,024-bit moduli. Needless to say, the efficiency of modular multiplication has a direct impact on the performance of corresponding cryptosystems. Regular methods, e.g., the Toom-Cook method, require sub-quadratic time o(l2) to perform the modular product of two l-bit integers. On the other hand, spectral arithmetic has the advantage of invoking the fast Fourier transform method to reduce the complexity to quasilinear O(logl). This superiority becomes more significant when l increases.

Though spectral methods outperform the regular ones for large operand sizes, there still exist drawbacks which limit its performance. The first drawback refers to the zero-padding issue in the fast Fourier transform (FFT). It not only increases the computational complexity by doubling the transform length but also introduces redundant operations during hardware realization. The second drawback refers to the conditional selections in the modular multiplication algorithm, where serial computing and complicated control logic are required due to the long carry propagation and data dependency. This may reduce the efficiency of the algorithm. Meanwhile, involving conditional selections may be vulnerable against timing attacks in side-channel security. The main objective of this thesis is to explore possible solutions to these two issues and provide arithmetic and hardware architecture level improvements for high-performance computation and area-time efficient hardware design.

First, we focus on solving the zero-padding issue of the FFT-based Montgomery modular multiplication (MMM). We integrate the fast Fourier transform (FFT) method into the McLaughlin’s framework and present an improved FFT-based Montgomery modular multiplication (MMM) algorithm achieving high area-time efficiency. The proposed algorithm is named as FFT-based McLaughlin’s Montgomery modular multiplication (FMLM3). Compared to the previous FFT-based designs, we inhibit the zero-padding operation by computing the modular multiplication steps directly using cyclic and nega-cyclic convolutions. Thus, we reduce the convolution length by half. Furthermore, supported by the number-theoretic weighted transform, the FFT algorithm is used to provide fast convolution computation. For every modular multiplication performing FMLM3, 7 FFTs are involved. A modified version of FMLM3 can further reduce this number to 5. Carry-save technique and parameter selection method are exploited to further accelerate the computation of FMLM3.

For hardware architecture design of FMLM3, architectures with single and double butterfly structures are designed obtaining low area-latency solutions, which we implemented on Xilinx Virtex-6 FPGAs. In the architecture, dataflow of constant geometry FFT is employed to simplify the right-and-write control. The results show that our work offers a better area-latency efficiency compared to the state-of-the-art FFT-based MMM architectures from and above 1,024-bit operand sizes. We have obtained area-latency efficiency improvements up to 50.9% for 1,024-bit, 41.9% for 2,048-bit, 37.8% for 4,096-bit and 103.2% for 7,680-bit operands. Furthermore, the operating latency has also outperformed at high clock frequency for length-64 transform and above.

Next, we address the conditional selection issue. Due to the conditional selections in McLaughlin’s multiplication (MLM) algorithm, extra operations may take place and its running time varies. This is considered to be inefficient and vulnerable to timing attacks. Due to this fact, we improve the original MLM algorithm and present a modified MLM algorithm which involves no conditional selection. The proposed algorithm is named as McLaughlin’s multiplication without conditional selections (MLWS). Compared to the original MLM algorithm, we inhibit the extra operations caused by the conditional selections and ensure constant running time for modular multiplications with different inputs. Thus, we improve both area-time efficiency and security against timing attacks. Then, we apply the FFT method to the modular multiplication steps of MLWS and derive the FFT-based MLWS (FMLM) algorithm. Compared to FMLM3, FMLM involves fewer long additions and subtractions and runs at a constant time. Combining the right-to-left binary method with FMLM, we derive an FFT-based modular exponentiation (FMLE) algorithm. Due to the constant running time of FMLM, FMLE is considered to be more secure against timing attacks and suitable for RSA computation.

Modular exponentiation architectures with dual pipelined FFT-based multipliers are designed obtaining area-latency efficient solutions. The results show that our work offers a better performance compared to the state-of-the-art works from and above 2,048-bit operand sizes. For single FFT-based modular multiplication, the architecture of FMLM provides constant running time for modular multiplications with different inputs. Besides, an improved area-latency efficiency is achieved with up to 24.3% for 1,024-bit, 15.6% for 3,072-bit and 35.5% for 4,096-bit operands when compared with the proposed FMLMarchitecture.

In summary, we explore the application of spectral arithmetic for Montgomery modular multiplication under McLaughlin’s new framework in terms of both arithmetic and hardware architecture perspectives. For arithmetic improvements, we first focus on solving the zero-padding issue of the FFT-based MMM. Then, we address the conditional selection issue of MLM in order to make it more suitable for RSA computation. For hardware architectures, we provide a design methodology of the spectral modular multiplication and exponentiation with cost-effective implementation results.