Cutting the Tail Latency for Low-Cost High-Density Server SSDs with Reliability Considerations

Project: Research



While server SSDs are now considering Triple Level Cell (TLC) flash, low-cost high-density Quadruple Level Cell (QLC) flash will soon become a candidate for server SSDs. Moving to a higher density lowers the cost of server SSDs. The main challenges in adopting low-cost high-density flash chips are their low reliability and large latency variations. First, low reliability introduces significant tail latency, which is critical for servers. Second, based on our preliminary studies, access latency can vary by up to 30X. Although prior work has optimized SSDs for performance, little work has considered the reliability-induced tail latency of server SSDs. Our preliminary studies of TLC and QLC error models, together with the specific requirements of server SSDs, indicate that optimizing the tail latency of future server SSDs requires good latency control combined with reliability enhancement.

Because of their significant reliability issues, low-cost high-density server SSDs are prone to significant tail latency. This project aims to optimize the tail latency of low-cost high-density server SSDs by developing a set of reliability-aware optimization schemes. In this proposal, we study QLC as a representative example, and the results will be applicable to future low-cost high-density flash chips. We also emphasize that tail latency can arise from the large latency difference between reads and writes (so-called read and write conflicts) and from execution interference among applications (so-called application conflicts). This project proposes a set of tail latency optimization schemes for QLC-based server SSDs that exploit the characteristics of QLC. First, a set of reliability optimization and awareness schemes will be developed, including a refresh scheme for tail latency optimization and a data placement scheme for performance optimization. Second, a set of conflict minimization schemes will be developed, including a read and write conflict minimization scheme and an application isolation scheme. Finally, a hybrid SSD design will be proposed to enable high-density NAND flash with high performance and improved tail latency. The proposed schemes will be verified on an industry testing platform.

The success of this project will enable low-cost high-density server SSDs with optimized reliability and optimized tail latency, whether that tail latency is reliability-induced or not. Such server SSDs could help satisfy the needs of big data and machine learning applications. This project will provide a strong drive for Hong Kong's innovation industry.
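To illustrate why low reliability shows up in the tail rather than in the average, the following Python sketch computes nearest-rank latency percentiles for a hypothetical read-retry profile. The per-attempt read time (`T_READ_US`), the retry profile (`RETRY_PROFILE`), and the helper names `read_latencies` and `percentile` are illustrative assumptions, not measurements or code from this project. The model also assumes, for simplicity, that every retry repeats a full read attempt.

```python
import math
from fractions import Fraction

# Hypothetical parameters for illustration only; not measured QLC data.
T_READ_US = 100  # assumed cost of one read attempt (sensing + decode), in microseconds

# Assumed retry profile: number of extra read attempts -> number of reads
RETRY_PROFILE = {0: 9850, 1: 100, 4: 40, 9: 10}


def read_latencies(profile, t_read_us=T_READ_US):
    """Expand a retry profile into a sorted list of per-read latencies.

    Each read costs (1 + retries) * t_read_us, under the simplifying
    assumption that every retry repeats a full read attempt.
    """
    lat = []
    for retries, count in profile.items():
        lat.extend([(1 + retries) * t_read_us] * count)
    lat.sort()
    return lat


def percentile(sorted_lat, p):
    """Nearest-rank percentile; p is an int, Fraction, or decimal string like "99.9"."""
    p = Fraction(p)  # exact rational value avoids float rounding in the rank
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_lat) / 100)
    return sorted_lat[rank - 1]


if __name__ == "__main__":
    lat = read_latencies(RETRY_PROFILE)
    print(f"mean   = {sum(lat) / len(lat):.1f} us")
    for p in ["50", "99", "99.9", "99.99"]:
        print(f"p{p:<6}= {percentile(lat, p)} us")
```

With these assumed numbers, 98.5% of reads finish in a single attempt, so the mean is only 103.5 us and the median is 100 us, yet the 99.99th percentile is 1000 us. The rare reads that need many retries barely move the average but dominate the tail, which is why tail latency rather than mean latency is the metric of interest here.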


Project number: 9043007
Grant type: GRF
Effective start/end date: 1/01/21 to 14/03/24