5.2 Overall Architecture of CDB Pro
The above analysis leads us to rethink how to reduce the impact of eviction operations. First, we should enlarge the write cache so that more data can be written. Obviously, equipping a larger XL flash incurs a high cost, which is unacceptable to users. Thanks to the pSLC technique, part of the QLC flash can be converted to pSLC flash at low cost, and its performance is high enough for it to serve as a write cache. By constructing a large write cache from XL and pSLC, the negative impact of backing up critical data in XL is reduced. However, when eviction operations occur in XL, the critical data stored in the victim block still prevents XL space from being recycled. Therefore, we should arrange the data stored in XL to improve eviction efficiency.
Keeping the above observations in mind, we propose CDB Pro to mitigate the effects of eviction operations. As shown in Figure 3, CDB Pro adds two modules: pSLC region management and critical data-guided greedy eviction. pSLC region management (i) dynamically adjusts the capacity of the pSLC region and (ii) schedules each write request to either XL or pSLC. Critical data-guided greedy eviction organizes the data in XL according to their criticality.
5.3 pSLC Region Management
In this section, we introduce how pSLC region management works. Figure 4 illustrates the overall architecture of pSLC region management, which is composed of four modules: the critical data detector (Cd-Detector), the free space monitor (Fs-Monitor), the request scheduler (Rq-Scheduler), and the pSLC-Regulator. The Cd-Detector not only detects which requests are critical but also records the amount of critical data stored in XL. The Fs-Monitor collects information about the amount of free space in XL. Based on the information collected by these two modules, the Rq-Scheduler schedules write requests to either XL or pSLC, and the pSLC-Regulator dynamically adjusts the size of the pSLC region.
Cd-Detector and Fs-Monitor: As described in Section 4, critical requests are sent to the device with a critical tag. Based on this tag, the Cd-Detector recognizes critical data. It then identifies whether a request is an update request in order to maintain the recorded amount of critical data in XL, which is easy to do by loading the corresponding mapping entry. If the mapping entry is valid, the request is an update request, meaning that this critical data is already stored in XL, so the recorded amount remains the same. Otherwise, the recorded amount is increased by the request size.
Besides this, the free space information is collected by the Fs-Monitor. Since this information is already recorded in current device controllers, where it is used to trigger GC, the Fs-Monitor can obtain it directly.
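To make this bookkeeping concrete, the following sketch shows one way the two monitors could be implemented. All names and interfaces here (e.g., `mapping_table`, `critical_bytes`) are illustrative assumptions, not the actual controller API.

```python
# Minimal sketch of the Cd-Detector / Fs-Monitor bookkeeping described
# above. Names and interfaces are illustrative, not the controller's API.

class CdDetector:
    """Tracks the amount of critical data currently stored in XL."""

    def __init__(self):
        self.mapping_table = {}   # logical page -> physical page (None if unmapped)
        self.critical_bytes = 0   # recorded amount of critical data in XL

    def on_critical_write(self, lpn, size):
        # A valid mapping entry means this is an update of critical data
        # already resident in XL, so the recorded amount stays the same.
        if self.mapping_table.get(lpn) is None:
            self.critical_bytes += size


class FsMonitor:
    """Exposes the XL free-space counter the controller already keeps for GC."""

    def __init__(self, xl_region):
        self.xl_region = xl_region

    def free_bytes(self):
        return self.xl_region.free_bytes   # maintained by the existing controller
```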
pSLC-Regulator: To make use of pSLC, the first challenge is how to adjust the pSLC region size. There are two concerns. If the pSLC region is too large, the capacity of the QLC region is reduced, which may cause heavy GC in QLC and lower its performance and endurance; it may also leave insufficient device capacity to store user data. If the pSLC region is too small, the problem discussed above is not solved.
To dynamically adjust the pSLC region size, the pSLC-Regulator uses the information recorded by the Cd-Detector. When a pSLC region is required, the pSLC-Regulator sets the pSLC region size to the amount of critical data stored in XL and records it. This is the minimum size that helps the device recover its previous performance and lifetime. When the difference between the amount of critical data stored in XL and the recorded pSLC region size exceeds the adjustment unit, the pSLC region size is changed by one adjustment unit. Each adjustment unit is \(Size_{pSLC\_block} \times N_{QLC\_plane}\); that is, one QLC block in each QLC plane is converted to one pSLC block, or vice versa. Then the pSLC-Regulator updates its record. Since pSLC sacrifices part of the QLC capacity, the maximal pSLC region size follows the settings of real products [11, 16]. For example, 70 GB is the maximal pSLC region size for a 512 GB QLC flash when the utilization of the QLC flash (\(\frac{valid\ data}{QLC\ capacity}\)) is lower than 25%. The maximal pSLC region size then decreases linearly until the utilization reaches 85%, and it is 6 GB when the utilization is larger than 85%.
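These sizing rules can be summarized by two small functions. The piecewise-linear curve encodes the 70 GB / 25% / 85% / 6 GB figures given above; the function and variable names are our own.

```python
# Sketch of the pSLC-Regulator's sizing rules for a 512 GB QLC flash,
# following the curve and the adjustment unit described above.

def max_pslc_size_gb(qlc_utilization):
    """Maximal pSLC region size; qlc_utilization = valid data / QLC capacity."""
    if qlc_utilization < 0.25:
        return 70.0
    if qlc_utilization > 0.85:
        return 6.0
    # Linear decrease from 70 GB at 25% utilization to 6 GB at 85%.
    return 70.0 - (70.0 - 6.0) * (qlc_utilization - 0.25) / (0.85 - 0.25)

def adjust_pslc_size(recorded_size, critical_in_xl, adjustment_unit):
    """Change the pSLC region by one unit (one QLC block per plane) when the
    recorded size and the critical data in XL differ by more than a unit."""
    if critical_in_xl - recorded_size > adjustment_unit:
        return recorded_size + adjustment_unit
    if recorded_size - critical_in_xl > adjustment_unit:
        return recorded_size - adjustment_unit
    return recorded_size
```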
By providing a pSLC region that matches the amount of critical data in XL, the write cache capacity for writing normal data is restored, which brings the frequency of eviction operations back to the original level. For one thing, a large write cache allows more data to be written before eviction operations are processed. For another, the stored data can be fully updated by more write requests, generating more invalid data and improving eviction efficiency. Furthermore, since the plane is the smallest parallel unit of flash, this setting of the adjustment unit guarantees the maximal number of parallel units in the pSLC region.
Rq-Scheduler: As shown in Figure 4, XL and pSLC are located in different channels, so their data writing and GC are self-governed. Therefore, another challenge of constructing a large write cache from pSLC and XL is deciding the destination of each write request. To make full use of the characteristics of XL and pSLC, the Rq-Scheduler divides the state of the device into three stages. Based on the information recorded by the Cd-Detector and the Fs-Monitor, the amount of critical data stored in XL (\(S_{critical}\)) and the amount of XL free space (\(S_{xl\_free}\)) are compared with two thresholds, \(T_{critical}\) and \(T_{xl\_free}\), respectively. First, if \(S_{xl\_free}\) is larger than \(T_{xl\_free}\) or \(S_{critical}\) is smaller than \(T_{critical}\), the device is in Stage I. In this stage, either XL has enough space for subsequent write requests or the critical data stored in XL have only a slight influence on eviction operations. Therefore, all data are written to XL to make full use of its high performance, endurance, and reliability. Second, if \(S_{xl\_free}\) is smaller than \(T_{xl\_free}\) and \(S_{critical}\) is larger than \(T_{critical}\), the device is in Stage II. That is, critical data occupy too much XL space, which influences eviction operations significantly. In this case, pSLC should play its role: the pSLC region size is set to the amount of critical data in XL, as described for the pSLC-Regulator. Critical data are still written to XL to guarantee their reliability, while non-critical data are scheduled to the pSLC region to avoid frequent eviction. Third, if the pSLC free space (\(S_{pslc\_free}\)) is lower than the predefined threshold that triggers GC, the device is in Stage III. In this case, the pSLC region is insufficient to serve subsequent write requests, and if non-critical data were still written to it without any limitation, frequent GC might occur there. To avoid this problem, the GC efficiencies of XL and pSLC are compared. Specifically, the GC efficiency of each region is represented by the maximal number of invalid pages over all of its blocks. If the GC efficiency of XL is higher, subsequent non-critical data are rescheduled to XL, and vice versa. As a result, write cache space is recycled with high efficiency, and performance and endurance are improved.
There are two thresholds, \(T_{xl\_free}\) and \(T_{critical}\), in the Rq-Scheduler. We set \(T_{xl\_free}\) to the eviction threshold, which means XL has insufficient space exactly when it requires the eviction process to recycle space. We set \(T_{critical}\) based on the proportion of critical data among all data, which is 40% of the XL capacity in this article. In a real deployment, manufacturers can set it based on an analysis of users' data.
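A minimal sketch of the three-stage destination decision follows, assuming the byte-valued counters from the Cd-Detector and Fs-Monitor; the names are our own, and the `*_max_invalid` arguments stand in for the per-region "maximal number of invalid pages over all blocks" statistic described above.

```python
# Sketch of the Rq-Scheduler's three-stage decision (illustrative names).

def schedule_write(is_critical, s_critical, s_xl_free, s_pslc_free,
                   t_critical, t_xl_free, t_pslc_gc,
                   xl_max_invalid, pslc_max_invalid):
    # Stage I: XL has enough space, or critical data barely affect eviction.
    if s_xl_free > t_xl_free or s_critical < t_critical:
        return "XL"
    # Critical data always stay in XL to guarantee their reliability.
    if is_critical:
        return "XL"
    # Stage III: pSLC free space is low enough to trigger GC; send
    # non-critical data to whichever region recycles space more efficiently.
    if s_pslc_free < t_pslc_gc:
        return "XL" if xl_max_invalid > pslc_max_invalid else "pSLC"
    # Stage II: critical data occupy too much XL space.
    return "pSLC"
```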
5.4 Critical Data-Guided Greedy Eviction
Normally, the eviction efficiency of XL is highly correlated with the data access characteristics, e.g., hotness. As the amount of critical data in XL increases, critical data play an increasingly important role in eviction efficiency.
Data Classification: Specifically, there are three types of data in XL with different characteristics: (1) critical data that should be reserved in XL, such as file system metadata and storage metadata; (2) critical data that should be evicted to QLC, which are the duplicates of critical data, such as duplicates of file system metadata; and (3) non-critical data, such as temporary files that have little impact on system reliability. Critical data that should be reserved in XL must be rewritten to XL when the block they reside in is evicted; therefore, the fewer such data in the victim block, the better. Critical data that should be evicted to QLC are generally migrated within a short time by the migration process introduced in Section 4.2, so by the time the eviction operation is processed, this kind of data is already invalid; a victim block holding mostly such data can thus be erased quickly to recycle space. Non-critical data have yet different characteristics, so their lifetime differs from both the reserved and the evicted critical data. Obviously, these three kinds of data with different lifetimes should be stored in different blocks to improve eviction efficiency. Specifically, three write heads are introduced for each plane; they point to three different blocks of the plane that store the three kinds of data, respectively.
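The per-plane write heads can be sketched as follows; the class and field names are hypothetical, introduced only to illustrate how the three data classes are kept in separate active blocks.

```python
# Sketch of three per-plane write heads (all names are illustrative).

from enum import Enum

class DataClass(Enum):
    RESERVED_CRITICAL = 0   # critical data kept in XL, e.g., fs metadata
    EVICTED_CRITICAL = 1    # duplicates later migrated to QLC
    NON_CRITICAL = 2        # e.g., temporary files

class Block:
    def __init__(self, pages_per_block=256):
        self.pages, self.capacity = [], pages_per_block
    def is_full(self):
        return len(self.pages) >= self.capacity
    def write(self, page):
        self.pages.append(page)

class Plane:
    def __init__(self, free_blocks):
        self.free_blocks = free_blocks                       # erased block pool
        self.heads = {c: self.free_blocks.pop() for c in DataClass}

    def append(self, data_class, page):
        # Each data class is appended only through its own write head,
        # so data with similar lifetimes end up in the same block.
        if self.heads[data_class].is_full():
            self.heads[data_class] = self.free_blocks.pop()  # open a new block
        self.heads[data_class].write(page)
```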
Critical Data-Guided Greedy Eviction Process: Figure 5 illustrates an example of how critical data-guided greedy eviction groups different kinds of data. Traditionally, different kinds of data are interleaved, as shown in blocks 0 to 2. For example, if block 0 is selected as the victim block, the half of its pages that hold reserved critical data must be rewritten to another XL block. At the same time, the quarter of its pages that hold non-critical data are evicted to a QLC block, and the remaining pages are invalid data that can be erased directly. Therefore, erasing block 0 frees only half the capacity of a block, so the eviction efficiency is low.
By adopting critical data-guided greedy eviction, the different kinds of data are stored in different blocks. For example, blocks 0' to 2' store the reserved critical data, the non-critical data, and the evicted critical data, respectively. When an eviction operation needs to be processed, block 2' is selected first since it contains the largest number of invalid pages. Because its data have already been migrated to QLC, block 2' can be erased directly. If some pages in block 2' are still valid, the critical data migration of CDB is triggered; even in this case, the eviction process is still fast because, based on the migration trigger condition in Section 4.3, the amount of data to migrate is smaller than \(T_M\), so the maximal cost is a few stripe writes to QLC, which is acceptable. If another eviction operation is required, block 1' is selected, and the non-critical data in it are evicted to QLC, recycling the capacity of one XL block. In this case, the migration process of CDB is not activated, since it would delay the execution of the eviction process without improving eviction efficiency. It is worth mentioning that block 0' is not selected for erasure until it has the most invalid pages among all blocks; the data stored in it are reserved critical data that cannot be evicted, so erasing block 0' while it holds no invalid data would recycle no space. It should also be noted that the eviction granularity of critical data-guided greedy eviction is the stripe size of the QLC. In this way, all parallel units of the slow memory are utilized to improve performance, and since the granularity is aligned with the access units of the slow memory, write amplification is avoided. If the amount of valid data in the victim block is less than the stripe size of the slow memory, the valid data are stored temporarily in storage DRAM and another victim block is selected; part of the valid data in the second victim block is combined with the valid data from the first victim block to construct a stripe write to the slow memory, and then the first victim block is erased to recycle space.
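A condensed sketch of this eviction path is shown below. The block fields and the `write_stripe_to_qlc` helper are hypothetical stand-ins for the controller internals, and the rewriting of reserved critical data back to XL is omitted for brevity.

```python
# Sketch of greedy victim selection plus stripe-aligned eviction to QLC.

def select_victim(blocks):
    """Greedy: the block with the most invalid pages is recycled first.
    With data classification, this naturally prefers evicted-critical
    blocks (mostly invalid), then non-critical blocks, then reserved ones."""
    return max(blocks, key=lambda b: b.invalid_pages)

def evict_once(blocks, stripe_pages, dram_buffer):
    victim = select_victim(blocks)
    dram_buffer.extend(victim.valid_pages())
    if len(dram_buffer) < stripe_pages:
        # Too little valid data for a full stripe: buffer it in storage DRAM
        # and borrow pages from a second victim to complete the stripe.
        second = select_victim([b for b in blocks if b is not victim])
        need = stripe_pages - len(dram_buffer)
        dram_buffer.extend(second.valid_pages()[:need])
    write_stripe_to_qlc(dram_buffer[:stripe_pages])   # hypothetical helper
    del dram_buffer[:stripe_pages]
    victim.erase()                                    # recycle the XL block
```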
Influence on GC when the device utilization is low: The above process is triggered when the device utilization is high. When the device utilization is low, the valid data in the victim block are moved to another block in the same region. In contrast to the critical data-guided greedy eviction process, the number of invalid pages is then the most important metric for selecting the victim block, as in traditional greedy GC [3, 44]. Therefore, the block with the most invalid data is selected no matter what kind of data is stored in it. Thanks to data classification, data with similar lifetimes are stored in the same block, as in multi-stream SSD methods [2, 21], so GC efficiency is also improved when device utilization is low. However, because there is already a lot of invalid data when the device utilization is low, the improvement is slight.