Software-Managed Read And Write Wear-Leveling For Non-Volatile Main Memory Part 5

Aug 07, 2024

7 EVALUATION

In this section, we evaluate two main scenarios: (1) non-read-destructive NVMs and (2) read-destructive NVMs. For the former, only a subset of the presented concepts is used; for the latter, all of the presented concepts are employed.


For each scenario, we evaluate coarse-grained, aging-aware wear-leveling first. As we already explained, this method cannot achieve optimal wear-leveling, and thus we evaluate it in combination with the fine-grained approaches afterward. 

First, we detail our evaluation setup and our analysis methodology, then we present the results for our two main scenarios.

7.1 Evaluation Setup

As the technical setup for the evaluation, we use the simulation environment [10], where we also implement our wear-leveling algorithms from Sections 5 and 6. 

Although the simulation setup executes a full system simulation, so that our implementation would also run on a real system, simulation has the key advantage that memory accesses can easily be traced and analyzed afterward.

In this work, we consider byte-addressable non-volatile main memories only (i.e., no block-based memories). Therefore, we only analyze the number of memory accesses per cell and not additional effects, such as block erase in flash-based memories. 

We record memory access traces for several benchmark applications for a baseline execution without any wear-leveling and for the various combinations of employed wear-leveling mechanisms. 

We always conduct a full system simulation with a working implementation of the wear-leveling algorithms in the runtime system. We then compare the total number of accesses for every memory byte and compute memory lifetime indicators. 

For the scenario of non-read-destructive NVMs, we only take write accesses into account, and for read-destructive NVMs, we consider write as well as read accesses. 

This also implies that the baselines (no wear-leveling) of the two scenarios differ, and we therefore report improvements relative to the respective baseline.

Because our implementation is a small bare metal kernel, porting and running applications from known benchmark suites requires manual code integration and the implementation of required system services. Thus, we limit the evaluation to a small set of benchmark applications.

7.2 Analysis Methodology

For each recorded access trace, we aggregate the total number of filtered accesses to every memory byte. In addition to graphically illustrating the memory access counts over the memory space for our six benchmarks, we consider the analytical gain in memory lifetime. We compute several performance indicators:

• Achieved endurance: $\mathrm{AE} = \frac{\text{mean access count}}{\text{max access count}}$. Assuming that the memory can no longer be used once the first memory cell is worn out, the maximum access count across all cells determines the maximum lifetime. (This limitation could be avoided by employing additional bad block management. However, as long as bad blocks are only detected at a coarser granularity than virtual memory pages, wear-leveling at the granularity of virtual memory pages and below is still required.) Under perfect conditions, memory accesses could be arbitrarily shuffled to other memory locations so that all cells are entirely wear-leveled, which would result in every cell receiving the mean access count. Therefore, the quotient of both indicates the percentage of the ideal memory lifetime that is achieved. We do not consider additional spare memory in this evaluation.

• Endurance improvement: $\mathrm{EI} = \frac{\mathrm{AE}_{\text{analyzed}}}{\mathrm{AE}_{\text{baseline}}}$. Given the achieved endurance of the baseline and of another configuration, the quotient of both indicates the improvement in achieved endurance compared to the baseline.

• Lifetime improvement: $\mathrm{LI} = \frac{\mathrm{EI}}{\mathrm{OV} + 1}$. Given the endurance improvement and the overhead OV (the relative amount of additional memory accesses, e.g., 100% = 1.0) of a trace compared to its baseline, the gained memory lifetime is obtained by relating both.

For instance, if an algorithm improves the endurance by a factor of EI = 4 but causes OV = 100% overhead, meaning that wear-leveling makes the application require twice as many memory accesses to complete, the lifetime of the total system increases by a factor of LI = 4 / (1 + 1) = 2.

For all benchmark runs, we calculate the AE, EI, and LI metrics.
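To make the three indicators concrete, the following Python sketch computes AE, EI, and LI from per-byte access counts. The function names and the toy traces are hypothetical and are not taken from the paper's tooling; OV is expressed as a fraction (1.0 corresponds to 100%). The data are chosen to reproduce the example above: the baseline has one hot byte, while the wear-leveled run spreads accesses perfectly but needs twice as many of them.

```python
def achieved_endurance(access_counts):
    """AE = mean access count / max access count over all memory bytes."""
    peak = max(access_counts)
    if peak == 0:
        raise ValueError("trace contains no accesses")
    return (sum(access_counts) / len(access_counts)) / peak


def endurance_improvement(ae_analyzed, ae_baseline):
    """EI = AE_analyzed / AE_baseline."""
    return ae_analyzed / ae_baseline


def access_overhead(total_analyzed, total_baseline):
    """OV as a fraction: 1.0 means twice as many accesses as the baseline."""
    return total_analyzed / total_baseline - 1.0


def lifetime_improvement(ei, ov):
    """LI = EI / (OV + 1)."""
    return ei / (ov + 1.0)


# Hypothetical per-byte access counts (toy data, not the paper's traces).
baseline = [1, 1, 1, 1, 16]      # one hot spot: mean 4, max 16 -> AE = 0.25
wear_leveled = [8, 8, 8, 8, 8]   # perfectly uniform, but 2x the accesses

ae_base = achieved_endurance(baseline)
ae_wl = achieved_endurance(wear_leveled)
ei = endurance_improvement(ae_wl, ae_base)
ov = access_overhead(sum(wear_leveled), sum(baseline))
li = lifetime_improvement(ei, ov)
print(ae_base, ae_wl, ei, ov, li)  # 0.25 1.0 4.0 1.0 2.0
```

The resulting LI equals the ratio of the peak access counts (16 versus 8), which is what ultimately bounds lifetime when the first worn-out cell ends the memory's usability.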

7.3 Coarse-Grained Wear-Leveling

Our proposed implementation includes aging-aware coarse-grained wear-leveling, where the age of memory pages is estimated by sampling accesses during runtime. In this section, we only execute the age approximation and the memory page remapping, according to the remapping algorithm (Section 5.3). We record the resulting memory trace and illustrate the total number of accesses per byte graphically.


7.3.1 Non-Read-Destructive NVMs

In the case of a non-read-destructive NVM, only write accesses are approximated and the age is only estimated by the number of write accesses per memory page. In Figure 5, we depict the total number of write accesses (y-axis) over the used memory space (x-axis) for our six benchmark applications, when the age approximation and page remapping algorithm are activated. 

We set the sampling rate of write accesses to $C^{\text{write}}_{\text{sample}} = 2{,}000$ and the notify threshold for the wear-leveling algorithm to $n_{\text{reloc}} = 64$. The results show that the aging-aware algorithm works: write accesses are distributed such that all regions of the memory are written with a similar pattern.
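The sampling configuration can be illustrated with a small simulation model. The sketch below is a hypothetical Python model of sampling-based age approximation, not the runtime mechanism from Section 5 (which is not reproduced here): every C_sample-th access is sampled, the physical page containing the sampled address ages by one unit, and after n_reloc samples the wear-leveling algorithm is notified. The class name, the 4 KiB page size, and the reading of n_reloc as a count of samples since the last notification are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class SampledAgeApproximator:
    """Hypothetical model of sampling-based page-age approximation.

    Every `c_sample`-th access is sampled; the physical page holding the
    sampled address ages by one unit. After `n_reloc` samples, the
    wear-leveling algorithm is notified via `on_notify`.
    """

    def __init__(self, num_pages, c_sample, n_reloc, on_notify):
        self.age = [0] * num_pages   # estimated age per page, in samples
        self.c_sample = c_sample
        self.n_reloc = n_reloc
        self.on_notify = on_notify
        self._countdown = c_sample
        self._samples_since_notify = 0

    def access(self, phys_addr):
        self._countdown -= 1
        if self._countdown > 0:
            return                   # this access is not sampled
        self._countdown = self.c_sample
        self.age[phys_addr // PAGE_SIZE] += 1
        self._samples_since_notify += 1
        if self._samples_since_notify >= self.n_reloc:
            self._samples_since_notify = 0
            self.on_notify(self.age)  # wear-leveler may remap pages here


# Configuration from this section: C_sample = 2,000 writes, n_reloc = 64.
notifications = []
approx = SampledAgeApproximator(
    num_pages=4, c_sample=2_000, n_reloc=64,
    on_notify=lambda ages: notifications.append(list(ages)))
for _ in range(2_000 * 64):
    approx.access(0x10)              # every write hits page 0
print(len(notifications), approx.age)  # 1 [64, 0, 0, 0]
```

In this model, a larger C_sample lowers the bookkeeping frequency but makes the age estimate coarser, and a larger n_reloc makes remapping less frequent.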

However, write accesses are still not entirely wear-leveled, as the many peaks in the figure show. It can also be seen that for the benchmarks with larger memory footprints (sha and Rijndael), the simulation time was not sufficient to cover the entire memory space equally.

If the application cannot run for a longer time, the wear-leveling configuration would have to be changed to achieve more frequent wear-leveling to overcome this shortcoming.

7.3.2 Read-Destructive NVMs

When the target system is equipped with a read-destructive NVM, we enable both the write and the read approximation and estimate the age of each memory page from its cumulative number of read and write accesses, because both are assumed to cause the same wear-out.

The wear-leveling algorithm remains unchanged; only its input (i.e., the estimated age) differs. We keep the configuration of the write approximation and the remapping threshold from Section 7.3.1. We further set the sampling rate of read accesses to $C^{\text{read}}_{\text{sample}} = 12{,}000$, because read accesses occur much more frequently than write accesses.
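Because reads are sampled six times more sparsely than writes (one sample per 12,000 reads versus one per 2,000 writes), one read sample stands for more accesses than one write sample. One plausible way to merge both into a single page age, treating reads and writes as equally damaging, is to weight each sample count by its sampling rate. This weighting and the function name below are assumptions for illustration; the exact combination used by the runtime system is not shown here.

```python
C_WRITE_SAMPLE = 2_000   # write sampling rate from Section 7.3.1
C_READ_SAMPLE = 12_000   # read sampling rate from Section 7.3.2


def estimated_accesses(write_samples: int, read_samples: int,
                       c_write: int = C_WRITE_SAMPLE,
                       c_read: int = C_READ_SAMPLE) -> int:
    """Hypothetical age estimate: scale each sample count back to an
    approximate access count, treating reads and writes as equally wearing."""
    return write_samples * c_write + read_samples * c_read


# Two hypothetical pages: A is write-heavy, B is read-heavy.
age_a = estimated_accesses(write_samples=30, read_samples=0)
age_b = estimated_accesses(write_samples=5, read_samples=8)
print(age_a, age_b)  # 60000 106000
```

Counting writes only would rank page A as older (30 versus 5 samples); including reads reverses the ranking, which illustrates how the unchanged remapping algorithm can make different decisions for read-destructive NVMs.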

Figure 6 illustrates the total number of cumulative read and write accesses (y-axis) over the memory space (x-axis). An observation similar to that for Figure 5 can be made: the aging-aware wear-leveling works, even when destructive read accesses are taken into account.

Still, it can be observed that coarse-grained wear-leveling is not sufficient to achieve a fully wear-leveled memory. The applications with larger memory footprints are better wear-leveled than in Section 7.3.1. This is because both read and write accesses are counted, so more wear-leveling actions are performed.

7.4 Fine-Grained Wear-Leveling

As the evaluation in Section 7.3 points out, coarse-grained wear-leveling cannot achieve a fully wear-leveled memory, since dense access hot spots within memory pages are not resolved. Consequently, this article proposes additional fine-grained wear-leveling, which is evaluated in this section. We execute the fine-grained stack and text wear-leveling in addition to the coarse-grained wear-leveling to achieve overall aging-aware wear-leveling.

7.4.1 Non-Read-Destructive NVMs

For non-read-destructive NVMs, the fine-grained extension only targets the stack, since the text region is only targeted by read accesses. We keep the same configuration as in Section 7.3.1 and perform a stack movement on every remapping of virtual memory pages (i.e., with the same ratio as the page remapping algorithm). We configure the relocation distance (i.e., the movement of the stack) to 64 bytes. 
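The effect of moving the stack by a fixed relocation distance can be illustrated with a toy wear model. The Python sketch below assumes (hypothetically) that the stack lives in a reserved region, that its base advances by 64 bytes and wraps around on every page remapping, and that a single variable at a fixed offset from the stack base is written repeatedly. It does not model how the runtime system actually adjusts stack pointers or frame contents.

```python
RELOC_DISTANCE = 64  # relocation distance in bytes, as configured above


def simulate_stack_hotspot(region_size, hot_offset, writes_per_epoch, epochs,
                           distance=RELOC_DISTANCE):
    """Per-byte write counts for a single hot stack variable.

    An epoch is the interval between two page remappings; after each epoch
    the stack base moves by `distance` bytes, wrapping within the region.
    """
    wear = [0] * region_size
    base = 0
    for _ in range(epochs):
        wear[(base + hot_offset) % region_size] += writes_per_epoch
        base = (base + distance) % region_size  # stack relocation step
    return wear


static = simulate_stack_hotspot(4096, 100, 1000, 128, distance=0)
moving = simulate_stack_hotspot(4096, 100, 1000, 128)
print(max(static), max(moving))         # 128000 2000
print(sum(1 for w in moving if w > 0))  # 64
```

With a 4 KiB region and a 64-byte step, the hot spot visits 4096 / 64 = 64 positions, so its peak write count drops by a factor of 64 once all positions have been used, while the total number of writes stays the same.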

Figure 7 presents the resulting amount of write accesses (y-axis) over the memory space (x-axis). It can be observed that for some benchmarks, a nearly entirely wear-leveled memory is achieved. 

The shortcoming in the Dijkstra benchmark stems from the fact that Dijkstra uses the data segment intensively to manage the algorithm steps. Therefore, dense write hot spots appear in the data segment, which cannot be resolved by our fine-grained stack mechanism.

7.4.2 Read-Destructive NVMs

To perform fine-grained wear-leveling on read-destructive NVMs, dense hot spots for reads as well as writes need to be tackled. Therefore, in addition to the aging-aware coarse-grained setup from Section 7.3.2, we employ our mechanism for stack and text wear-leveling. 


We keep the same configuration for the coarse-grained algorithm and execute a stack and a text relocation on every coarse-grained page relocation. In general, however, both ratios can be configured independently to arbitrary values.

The relocation distance for both stack and text relocation is set to 64 bytes. The results in Figure 8 again allow similar observations as for the non-read-destructive case in Section 7.4.1. 

In general, memory is wear-leveled, considering the destructive influence of both reading and writing. For the crc32 and Rijndael benchmarks, larger non-uniformity can still be observed. This stems from the fact that the text wear-leveling only moves the relocatable code, but not the GOT and PLT. These two tables, however, are read during benchmark execution and therefore cause destructive wear on the underlying memory.


7.5 Analytic Results

Since the figures presented previously only provide an intuition for the achieved quality of our proposed wear-leveling algorithm, we calculate the analytic lifetime indicators (Section 7.2) for all our benchmarks and summarize them in Table 1. Several observations can be made in this table. First, by only considering the last column (LI), it can be seen that the total memory lifetime is increased by our algorithm by up to a factor of 955. 

In other words, a memory lifetime of several days without any maintenance would be extended to many years solely by employing our software-based algorithms. Second, in some benchmarks read accesses are wear-leveled slightly worse than write accesses, as the lower lifetime improvement indicates. As explained in Section 7.1, a different baseline has to be considered for read-destructive NVMs; thus, the improvement can be significantly lower than for non-read-destructive NVMs.
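To put the first observation in perspective, assume (hypothetically) a baseline lifetime of three days and the maximum lifetime improvement reported in Table 1; since LI is a multiplicative factor on lifetime:

```latex
t_{\text{wl}} = \mathrm{LI} \cdot t_{\text{baseline}}
             = 955 \times 3\,\text{days}
             = 2865\,\text{days}
             \approx 7.8\,\text{years}
```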

Third, the first column (AE) shows how close to optimal the employed algorithms are. If AE were 1, no further improvement would be possible. It can be observed that with coarse-grained wear-leveling only, in most cases only a few percent of the optimal endurance is achieved. With fine-grained wear-leveling, the algorithms perform significantly better but still leave room for further improvement. In addition, the achieved endurance differs between the benchmark applications.

The Rijndael benchmark achieves by far the worst results, since our algorithms do not handle its access patterns well. Although the overhead of the various wear-leveling configurations is implicitly included in the LI indicator, the overhead, as the number of additional memory accesses caused by wear-leveling, can also be examined on its own. For performance-sensitive applications, these additional memory accesses are a major factor in performance degradation.

We calculate the overhead by comparing the total number of memory accesses in a simulation with wear-leveling to that of the baseline simulation without wear-leveling. Considering only read accesses yields the read overhead (RO), considering only write accesses yields the write overhead (WO), and considering both access types together yields the read-write overhead (RWO).
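The three overhead variants differ only in which access types are summed. The Python sketch below computes them from hypothetical access totals (not values from Table 2); the type and function names are illustrative. Note that RWO is the overhead of the combined totals, which in general is not the plain average of RO and WO.

```python
from dataclasses import dataclass


@dataclass
class AccessTotals:
    reads: int
    writes: int


def overheads(with_wl: AccessTotals, baseline: AccessTotals):
    """Return (RO, WO, RWO) as fractions, e.g. 0.5 == 50 % more accesses."""
    ro = with_wl.reads / baseline.reads - 1.0
    wo = with_wl.writes / baseline.writes - 1.0
    rwo = ((with_wl.reads + with_wl.writes)
           / (baseline.reads + baseline.writes) - 1.0)
    return ro, wo, rwo


# Hypothetical totals for one benchmark (toy numbers).
result = overheads(AccessTotals(reads=1_200_000, writes=300_000),
                   AccessTotals(reads=600_000, writes=200_000))
print(result)  # (1.0, 0.5, 0.875)
```

Here RWO (87.5%) is the average of RO and WO weighted by each access type's share of the baseline (75% reads, 25% writes), so it lies closer to RO.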


Table 2 contains the resulting overheads for the various wear-leveling scenarios. For coarse-grained wear-leveling only, all overhead types remain at a few percent. When fine-grained wear-leveling is employed, the result depends on the analyzed application. For Rijndael, for instance, even huge wear-leveling overheads do not lead to significantly increased memory lifetimes. This can be explained by the fact that the wear-leveling does not target these types of memory accesses well.

A few intensively accessed memory areas are not wear-leveled. However, for the benchmarks where wear-leveling achieves good memory lifetime improvements, the overhead reaches up to ≈300%; that is, four times as many memory accesses are performed with wear-leveling as without it.

When interpreting this result, it should be considered that the overhead can be tuned through the configuration parameters at the cost of wear-leveling quality. However, if the application is not performance sensitive, such a large overhead may still be acceptable, since the memory lifetime is still increased by a factor of ≈200.

The runtime overhead of our wear-leveling algorithms is an important indicator for practical usage. Not only do the additional memory accesses take more execution time, but the access approximation and the wear-leveling decisions also require additional computation time. To analyze this overhead, we compare the total number of system cycles required by a baseline configuration and by configurations with wear-leveling enabled. The relative increase is reported in Table 3.

It can be observed that the fine-grained wear-leveling methods generally cause a higher runtime overhead than the coarse-grained methods, and that read wear-leveling requires more additional execution time than write wear-leveling. Furthermore, the time overheads differ greatly between benchmark applications.

For example, crc32 faces an overhead of at most 32%, whereas the execution time of Rijndael increases by a factor of almost seven. It should be noted that the time overhead can also be tuned by adjusting the frequency of wear-leveling actions. If, however, a performance degradation of up to almost a factor of two in execution time is acceptable, most benchmark applications can be wear-leveled using software-managed solutions.

8 CONCLUSION

In this work, we target computer systems that are equipped with NVM as main memory. We distinguish two cases: the memory is either non-read-destructive or read-destructive. We propose software-managed wear-leveling to improve the lifetime of such systems, since low cell endurance can severely reduce their lifetime.

For the former type of system, we take write accesses into account to determine the current age of the memory and to perform the corresponding wear-leveling actions; for the latter, we take write and read accesses equally into account, since both stress the memory equally. To perform aging-aware wear-leveling (i.e., the current memory age is considered for each wear-leveling decision) during runtime, we propose a generic runtime approximation of write and read accesses that does not rely on special hardware or debugging capabilities.

This approximation is subsequently fed into a wear-leveling algorithm that swaps memory pages according to their estimated age. Since many applications require additional wear-leveling at finer granularities, we further propose two fine-grained wear-leveling mechanisms that specifically target the stack and text regions.

These specific solutions also operate without any special hardware or special system requirements, and thus they are software-managed. The specific solution for the text segment is only invoked for read-destructive NVMs, since the text segment is only targeted by read accesses. Our evaluation compares the final memory lifetime after applying our algorithms with the memory lifetime of the baseline execution of several benchmark applications. For non-read-destructive NVMs, we extend the lifetime by up to 955×, and for read-destructive NVMs, we achieve an improvement of up to 418×.

Although these numbers strongly depend on the memory behavior of the baseline execution of the specific application, we achieve ≈40% of ideal wear-leveling for non-read-destructive NVMs and ≈20% of ideal wear-leveling for read-destructive NVMs. The major shortcomings causing this are memory access patterns that are not explicitly tackled by our methods.

9 OUTLOOK

As our evaluation shows, employing our algorithms yields a reasonable improvement in memory lifetime. Still, we cannot achieve ideal wear-leveling, as indicated by the achieved endurance (AE).


In other words, our algorithms may be further improved to achieve better wear-leveling in all scenarios. As can be observed for the Dijkstra benchmark, the data and BSS sections need specific wear-leveling in some cases as well. In addition, the specific solution for the text segment does not resolve access hot spots in the GOT and PLT. We intend to improve upon these shortcomings in future work.


REFERENCES

[1] Hoda Aghaei Khouzani, Yuan Xue, Chengmo Yang, and Archana Pandurangi. 2014. Prolonging PCM lifetime through energy-efficient, segment-aware, and wear-resistant page allocation. In Proceedings of the 2014 International Symposium on Low Power Electronics and Design (ISLPED'14). ACM, New York, NY, 327–330. https://doi.org/10.1145/2627369.2627667

[2] Chi-Hao Chen, Pi-Cheng Hsiu, Tei-Wei Kuo, Chia-Lin Yang, and Cheng-Yuan Michael Wang. 2012. Age-based PCM wear leveling with nearly zero search cost. In Proceedings of the 49th Annual Design Automation Conference (DAC'12). ACM, New York, NY, 453–458. https://doi.org/10.1145/2228360.2228439

[3] Sangyeun Cho and Hyunjin Lee. 2009. Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy, and endurance. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'09). ACM, New York, NY, 347–357. https://doi.org/10.1145/1669112.1669157

[4] Jianbo Dong, Lei Zhang, Yinhe Han, Ying Wang, and Xiaowei Li. 2011. Wear rate leveling: Lifetime enhancement of PRAM with endurance variation. In Proceedings of the 48th Design Automation Conference. ACM, New York, NY, 972–977. 

[5] Alexandre P. Ferreira, Miao Zhou, Santiago Bock, Bruce Childers, Rami Melhem, and Daniel Mossé. 2010. Increasing PCM main memory lifetime. In Proceedings of the Conference on Design, Automation, and Test in Europe (DATE'10). 914–919. http://dl.acm.org/citation.cfm?id=1870926.1871147. 

[6] Vaibhav Gogte, William Wang, Stephan Diestelhorst, Aasheesh Kolli, Peter M. Chen, Satish Narayanasamy, and Thomas F. Wenisch. 2019. Software wear management for persistent memories. In Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST'19). 45–63. https://www.usenix.org/conference/fast19/presentation/gogte.

[7] William Goh and Andreas Dannenberg. 2014. MSP430 FRAM Technology – How To and Best Practices. Technical Report SLAA628. Texas Instruments. https://www.ti.com/lit/an/slaa628a/slaa628a.pdf.

[8] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. 2001. MiBench: A free, commercially representative embedded benchmark suite. In Proceedings of the 2001 IEEE International Workshop on Workload Characteristics (WWC'01). IEEE, Los Alamitos, CA, 3–14. https://doi.org/10.1109/WWC.2001.15 

[9] Christian Hakert, Kuan-Hsun Chen, Paul R. Genssler, Georg von der Brüggen, Lars Bauer, Hussam Amrouch, Jian-Jia Chen, and Jörg Henkel. 2020. SoftWear: Software-only in-memory wear-leveling for non-volatile main memory. CoRR abs/2004.03244 (2020). https://arxiv.org/pdf/2004.03244.pdf.

[10] Christian Hakert, Kuan-Hsun Chen, Mikail Yayla, Georg von der Brüggen, Sebastian Bloemeke, and Jian-Jia Chen. 2020. Software-based memory analysis environments for in-memory wear-leveling. In Proceedings of the 25th Asia and South Pacific Design Automation Conference (ASP-DAC'20). 

[11] Y. Han, J. Dong, K. Weng, Y. Wang, and X. Li. 2016. Enhanced wear-rate leveling for PRAM lifetime improvement considering process variation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24, 1 (Jan. 2016), 92–102. https://doi.org/10.1109/TVLSI.2015.2395415

[12] Kaixin Huang, Yijie Mei, and Linpeng Huang. 2020. Quail: Using NVM write monitor to enable transparent wear leveling. Journal of Systems Architecture 102 (2020), 101658.

