Reliability and Performance Optimization on NAND Flash Memory
閃存可靠性與性能優化
Student thesis: Doctoral Thesis
Award date | 8 Feb 2021 |
---|---|
Permanent Link | https://scholars.cityu.edu.hk/en/theses/theses(f043dfb3-1028-4b02-9123-7ef368dd9dc3).html |
Abstract
Over the past decade, NAND flash memory has gained great attention in modern storage systems because of advantages such as high internal parallelism, shock resistance, and high access performance. As a result, it has been widely deployed in embedded systems, mobile devices, personal computers, and servers. With the exponential growth of data, high-density flash-memory chips are now in tremendous demand. Multi-bit cells and smaller feature sizes have lowered the cost of NAND flash memory, and 3D NAND flash memory has been introduced to further increase storage density by stacking cells in layers. The higher density, however, comes at the cost of degraded reliability. To guarantee the reliability of stored data, error correction codes (ECC) with strong error correction capability are adopted. However, handling a high raw bit error rate (RBER) incurs long read latency, which degrades access performance. The main challenge for current high-density NAND flash memory is therefore to address the reliability and performance issues caused by high RBER in the stored data. In this thesis, we propose approaches to improve flash reliability and performance, with the following four major contributions.
First, we performed a comprehensive characterization and analysis of errors in high-density 3D NAND flash memory. Through this study, we find that errors on high-density 3D NAND flash are difficult to estimate because of its unique structure and cell design. Because a well-chosen read voltage can counteract some voltage-shift-induced errors, we propose to reserve a small portion of the cells on each wordline to infer the optimal read voltages. An online calibration procedure is further presented to handle the possibly non-uniform error distribution on some wordlines. With the optimal voltages inferred, the number of read retries is significantly reduced.
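The idea of inferring a read voltage from reserved cells can be illustrated with a minimal sketch. It is not the thesis's procedure: it assumes a simplified single-threshold cell model, reserved cells programmed with a known bit pattern, and a short list of candidate voltages. The names `count_errors`, `infer_read_voltage`, and all numeric values are hypothetical.

```python
from typing import Sequence


def count_errors(vths: Sequence[float], expected: Sequence[int], v_read: float) -> int:
    """Count reserved cells whose sensed bit differs from the known pattern.

    Simplified single-threshold model (an assumption): a cell reads 1 if its
    threshold voltage is below v_read, otherwise 0.
    """
    return sum((1 if v < v_read else 0) != bit for v, bit in zip(vths, expected))


def infer_read_voltage(
    vths: Sequence[float], expected: Sequence[int], candidates: Sequence[float]
) -> float:
    """Pick the candidate read voltage with the fewest errors on the reserved cells."""
    return min(candidates, key=lambda v: count_errors(vths, expected, v))


# Hypothetical reserved cells: the programmed cells (bit 0) have drifted downward.
reserved_vths = [0.9, 1.1, 1.3, 2.0, 2.2, 2.4, 2.6, 1.0]
known_pattern = [1, 1, 1, 0, 0, 0, 0, 1]
best = infer_read_voltage(reserved_vths, known_pattern, [1.7, 2.1, 2.5])
```

In this toy data, a read voltage of 2.5 misreads three of the drifted cells, while 1.7 reads all eight reserved cells correctly, so the data cells on the same wordline would be read at 1.7 first instead of being retried.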
Second, we propose a process-variation-based approach to improve flash read performance. The key insight is that data stored on flash blocks with different reliability levels are accessed with different latencies. We show that existing methods that consider only the write hotness of data tend to assign read-hot data to blocks with low reliability, thus degrading read performance. Motivated by this phenomenon, our work manages data by considering both read hotness and write hotness on NAND flash memory. To improve read performance, read-hot data are allocated to flash blocks with high reliability. Because most read-hot data are not frequently updated, providing a small number of highly reliable blocks for read-hot data can significantly improve read performance. Thus, most highly reliable blocks are still allocated to write-hot data, which provides a lifetime guarantee for NAND flash memory.
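A minimal sketch of such an allocation policy follows. The pool sizes, the tie-breaking rule for data that is both read-hot and write-hot, and the names `partition_blocks` and `choose_pool` are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, List


def partition_blocks(
    block_rber: Dict[int, float],
    reliable_share: float = 0.5,   # assumed share of blocks treated as highly reliable
    read_hot_share: float = 0.2,   # assumed small slice of those reserved for read-hot data
) -> Dict[str, List[int]]:
    """Split blocks into pools, ranking them from lowest to highest measured RBER."""
    ranked = sorted(block_rber, key=lambda b: block_rber[b])
    n_reliable = int(len(ranked) * reliable_share)
    n_read = max(1, int(n_reliable * read_hot_share)) if n_reliable else 0
    return {
        "read_hot": ranked[:n_read],             # small pool of reliable blocks
        "write_hot": ranked[n_read:n_reliable],  # most reliable blocks stay here
        "cold": ranked[n_reliable:],             # remaining, less reliable blocks
    }


def choose_pool(read_hot: bool, write_hot: bool) -> str:
    """Hypothetical rule: write hotness wins, since frequent updates drive wear."""
    if write_hot:
        return "write_hot"
    if read_hot:
        return "read_hot"
    return "cold"


# Ten hypothetical blocks with increasing RBER.
pools = partition_blocks({i: (i + 1) * 1e-4 for i in range(10)})
```

With these assumed shares, one of the five most reliable blocks serves read-hot data and the other four remain available for write-hot data.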
Third, we propose a selective data compression scheme inside the flash controller. Most files stored on today's storage systems, except multimedia files, are not compressed even though they are highly compressible. After lossless compression at the granularity of a flash page, the data to be protected by ECC is shorter. With the same ECC, both the error rate of the data and the read latency of decoding are reduced. Because compressing data before programming it to flash pages adds latency that hurts write performance, we propose to compress data selectively based on its access characteristics.
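The selection step can be sketched as follows. This is an illustration only: the criterion used here (compress pages flagged as read-hot, and only when compression saves at least a given fraction of the page) is an assumption, as are the 4 KiB page size and the names `maybe_compress` and `restore`. Python's `zlib` stands in for whatever lossless compressor a controller would use.

```python
import os
import zlib
from typing import Tuple

PAGE_SIZE = 4096  # bytes; assumed flash page size


def maybe_compress(page: bytes, read_hot: bool, min_saving: float = 0.2) -> Tuple[bool, bytes]:
    """Return (compressed_flag, payload) for one page.

    Hypothetical policy: skip pages that are not read-hot, to avoid paying
    compression latency on writes, and keep the compressed form only if it
    is at least `min_saving` smaller than the original page.
    """
    if not read_hot:
        return False, page
    packed = zlib.compress(page)
    if len(packed) <= len(page) * (1 - min_saving):
        return True, packed
    return False, page


def restore(compressed: bool, payload: bytes) -> bytes:
    """Undo maybe_compress on the read path."""
    return zlib.decompress(payload) if compressed else payload


text_page = (b"log line: status ok\n" * 300)[:PAGE_SIZE]  # highly compressible
random_page = os.urandom(PAGE_SIZE)                        # effectively incompressible

flag_text, payload_text = maybe_compress(text_page, read_hot=True)
flag_rand, payload_rand = maybe_compress(random_page, read_hot=True)
flag_cold, payload_cold = maybe_compress(text_page, read_hot=False)
```

The compressed flag would need to be stored alongside the page (for example in the out-of-band area) so that the read path knows whether to decompress.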
Fourth, we studied a large amount of accessed data in the flash storage of mobile devices and personal computers and identified several characteristics. For example, when a user watches an online video, the video file is cached; when the cache is full, the video data are written to flash storage. The cached video data can tolerate a high RBER for two reasons: 1) most of these data are encoded multimedia data, which exhibit high error tolerance; 2) even if they are corrupted, the system can download them again from the internet. We define this type of data as approximate data, and all other data as regular data, which must be guaranteed error free. Based on this observation, we propose a scheme that leverages the error tolerance of approximate data to improve read performance. The basic idea is to place both approximate data and regular data on the same data page. The entire error correction capability of the page is used to correct errors in the regular data, while the approximate data are stored without ECC protection.
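The benefit of concentrating a page's ECC on the regular data can be estimated with a rough calculation. The sketch below assumes independent bit errors and an ECC that corrects up to `t` bit errors among the bits it protects; the page size, RBER, and `t` are hypothetical values, and `decode_failure_prob` is an illustrative helper rather than anything defined in the thesis.

```python
import math


def decode_failure_prob(n_bits: int, rber: float, t: int) -> float:
    """P(more than t of n_bits are in error), assuming independent bit errors.

    The binomial terms are summed in log space to avoid overflow for large n_bits.
    """
    log_p = math.log(rber)
    log_q = math.log1p(-rber)
    p_correctable = 0.0
    for k in range(t + 1):
        log_term = (
            math.lgamma(n_bits + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_bits - k + 1)
            + k * log_p
            + (n_bits - k) * log_q
        )
        p_correctable += math.exp(log_term)
    return max(0.0, 1.0 - p_correctable)


PAGE_BITS = 4096 * 8  # assumed 4 KiB page
RBER = 2e-3           # assumed raw bit error rate
T = 72                # assumed number of correctable bit errors per page

# ECC protects the whole page.
fail_full = decode_failure_prob(PAGE_BITS, RBER, T)
# Half of the page holds approximate data, so ECC protects only the regular half.
fail_regular_only = decode_failure_prob(PAGE_BITS // 2, RBER, T)
```

Under these assumptions, the same correction capability applied to half as many bits fails far less often, which suggests why regular data on a mixed page would need fewer read retries.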
Overall, this work deepens the understanding of the error characteristics of NAND flash memory, especially for the recent high-density 3D NAND, and further proposes several approaches to exploit the characteristics of NAND flash memory and workload characteristics of real devices and systems for reliability and performance improvement.