Software-Managed Read And Write Wear-Leveling For Non-Volatile Main Memory Part 4
Aug 07, 2024
5.3.2 Memory Page Relocation
Once the wear-leveling algorithm has determined a pair of virtual memory pages (and the physical memory pages mapped to them) to swap, two steps are required to perform the relocation.
First, the virtual memory mapping in the page table has to be adjusted accordingly such that the physical pages of both virtual memory pages are exchanged. A translation lookaside buffer (TLB) maintenance operation is required afterward to ensure the exchanged mapping is applied.
Note that the ARMv8 virtual memory system allows single entries to be invalidated in the TLB, and thus a total TLB flush is not necessary. After the new page mapping is established, the physical content has to be exchanged to maintain the application's view on the virtual memory.
This is achieved by copying one page to a spare buffer, copying the second page to the first page, and copying the buffer content to the second page. The size of the buffer is chosen as 4 KiB for two reasons.
First, copying sequential memory content can be done more efficiently in most systems than copying single bytes or words from different regions. Second, the write access pattern to the buffer memory page is completely uniform and thus has no negative influence on the memory lifetime if it is also handled by the wear-leveling system.
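The three-copy exchange described above can be sketched in C. The page-table swap and the per-entry TLB invalidation are platform specific and therefore omitted here; `swap_page_contents` and the buffer name are illustrative, not identifiers from the paper.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Spare 4-KiB buffer used for the exchange. It is written uniformly
 * front-to-back, so it creates no new hot spot even if it is itself
 * wear-leveled. */
static uint8_t swap_buffer[PAGE_SIZE];

/* Exchange the contents of two pages after their virtual mappings have
 * been swapped in the page table and the affected TLB entries have been
 * invalidated (platform-specific, not shown). */
void swap_page_contents(uint8_t *page_a, uint8_t *page_b)
{
    memcpy(swap_buffer, page_a, PAGE_SIZE); /* page A -> buffer */
    memcpy(page_a, page_b, PAGE_SIZE);      /* page B -> page A */
    memcpy(page_b, swap_buffer, PAGE_SIZE); /* buffer -> page B */
}
```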
6 FINE-GRAINED WEAR-LEVELING
Since the aforementioned algorithm in Section 5 only operates on the granularity of memory pages (4 KiB), only the average age of these pages is wear-leveled.
In reality, programs use the memory within each memory page very non-uniformly, and thus only a small portion of the page is used intensively. In consequence, leveling the wear on finer granularities has high optimization potential if it manages to wear-level the intensive accesses to single bytes to all the rest of the memory page.
Maintaining an aging-aware algorithm as described in the previous section for such fine granularities is not only hard to realize but also causes an immense overhead if estimated ages are stored for single bytes.

Therefore, we tackle this problem with non-aging-aware algorithms. These algorithms operate on a small portion of the memory (only a few pages) and wear-level the peak hot spots within these regions to the entire region.
The coarse-grained aging-aware algorithm then still remaps the physical locations of the pages to wear-level them over the entire main memory. According to various benchmark runs, we identify the stack as the region with the most dense peak hot spots regarding read and write accesses and the text as the region with the most dense peak hot spots regarding read accesses.
Consequently, we propose two algorithms to wear-level these specific regions internally. Although both algorithms differ in the implementation, there is a common concept: we employ a virtual memory region, called the shadow region, which allows us to move memory content within a fixed amount of memory pages in a rotational manner while maintaining full access to all memory contents at all times.
We employ this mechanism to move the entire stack and text region within a bounded region of multiple memory pages in small steps (64 bytes in each step).
This also moves the dense peak hot spots in the small steps through the memory and distributes the memory accesses equally. Considering that for our target system the usage of heap memory is not very common, we do not focus on the heap section in this work.
If the application uses the heap, however, a similar mechanism as for the stack has to be employed. The rest of this section details the specific implementation for the movement of the stack and text during runtime.
6.1 Shadow Region
An arbitrary piece of memory can be shifted within a larger memory region by copying it bytewise to a new location. This can also be used to move some pieces of memory from the bottom to
the top of some memory regions, which may be a good strategy to spread dense peak hot spots within the copied memory.
However, as long as the memory is in use, the movement is limited because the active memory segment has to be at a consecutive address space and cannot be split. For instance, if 90 bytes are used out of a memory region of 100 bytes, the actively used memory can only be moved by an offset of at most 10 bytes before it would have to be split.
To allow a full movement of 100 bytes without splitting the actively used memory, we employ a special virtual memory mapping, which we call the shadow map. We map the physical pages in the same sequence twice into the virtual memory space into subsequent virtual pages. Figure 4 illustrates the principle of the shadow region.
The physical memory pages (each on the left) are mapped twice to consecutive virtual memory pages (each on the right). We call the second virtual memory area the shadow because the physical pages are shadowed there from the main virtual memory map. When the active memory content is now moved through the virtual memory, it may cross the boundary between the main and shadow regions (t1 and t2).

Still, the entire active memory is fully addressable at consecutive virtual addresses, but the physical content performs a wraparound within the bounded physical memory area.
Once the active memory has crossed the boundary entirely (t4), the wraparound is complete and the physical representation is the same as in t0. Thus, the system starts to use addresses from the main virtual memory region now instead of addresses from the shadow region. This process is repeated, leading to a rotational movement.
As the wraparound is managed in virtual memory, this method does not introduce a large memory capacity overhead. The actual active memory has to be rounded up to a multiple of the memory page size to ensure the shadow boundary resides exactly between two pages.
This method is invasive in the virtual memory system and the memory allocation service of the runtime environment, and thus it has to be ensured that whenever the mapping of either the main or shadow map is modified, the counterpart is modified as well.
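The shadow-region addressing can be sketched as follows, assuming a region of `npages` physical pages mapped twice at consecutive virtual addresses, so that virtual offsets `x` and `x + npages * PAGE_SIZE` refer to the same physical byte. The struct and function names are hypothetical, not from the paper.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Shadow-region state: the physical pages are mapped twice, so the
 * usable virtual window is 2 * npages pages wide. */
typedef struct {
    uintptr_t base;   /* virtual base of the main map                 */
    uint32_t  npages; /* pages in the region (shadow map doubles it)  */
    uint32_t  start;  /* current start of the active content, as an
                         offset into the main map                     */
} shadow_region;

/* Virtual address of logical byte i of the active content. Because the
 * shadow map directly follows the main map, the active range stays
 * consecutive even when the physical content wraps around. */
uintptr_t shadow_addr(const shadow_region *r, uint32_t i)
{
    return r->base + r->start + i; /* may fall into the shadow half */
}

/* Advance the active window by step bytes. Once the start offset has
 * crossed the whole main region, one wraparound is complete (t4) and
 * addressing switches back to the main map, as at t0. */
void shadow_advance(shadow_region *r, uint32_t step)
{
    uint32_t region = r->npages * PAGE_SIZE;
    r->start += step;
    if (r->start >= region)
        r->start -= region; /* physical layout is the same as at t0 */
}
```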

6.2 Stack Movement
In combination with the shadow region map, we implement a mechanism to move the actively used stack memory during runtime in arbitrary small steps. We achieve this by copying the stack content to new memory locations. We implement several steps to keep the application's perspective on the stack consistent in this scenario.
The stack is relocated from time to time by adding a small offset to the stack pointer (sp) and copying the old stack content to the new location. The logical view of the application always expects free memory bytes before (negative offset) the sp and the already created stack content directly after (positive offset) the sp.
As long as the stack is only relocated within a consecutive memory space, this view can be maintained easily. Due to the employment of the shadow region, a wraparound is achieved while the stack is only moved in one direction. This leads to a rotational relocation of the stack.
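A single relocation step can be sketched as below, assuming the stack already lives in a shadow-mapped window so that the destination range is always addressable; `relocate_stack` is an illustrative name, and updating the actual sp register in the trap frame is omitted.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* One relocation step: shift the active stack content (the used bytes
 * from the stack pointer toward higher addresses) by a small offset d,
 * then return the new stack pointer. Old and new ranges may overlap,
 * hence memmove. */
uint8_t *relocate_stack(uint8_t *sp, size_t used, size_t d)
{
    memmove(sp + d, sp, used); /* copy old stack content upward by d */
    return sp + d;             /* application continues with new sp  */
}
```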
6.2.1 Address Consistency
The concept of moving the stack circularly is based on the sp-relative access of the stack region by C/C++ compiled applications. However, sp-relative access is not the only way to access memory contents within the stack memory. Sometimes the absolute address of a stack variable is taken, for example, to pass it as a pointer to a subroutine or to store the pointer in a global data structure. Furthermore, pointers to variables on the stack may also be moved out of the stack to some global or heap data structures.
During the relocation of the stack, the memory address of the variables on the stack changes, whereas the content of the pointers stays unchanged. This leads to invalid pointers and thus wrong application behavior.
To overcome this problem, we equip the stack relocation system with two pointer adjustment mechanisms, which maintain the correctness of pointer contents over stack relocations.
To provide a mechanism to detect and adjust references to outdated locations within the stack segment, we implement a page-based pointer consistency mechanism.
Whenever the stack segment is moved by a small offset d (e.g., 64 bytes), the entire virtual memory location is replaced. Given the stack segment allocates n memory pages, the setup (including shadow) consumes 2n virtual memory pages. Instead of relocating from the former base address b to b + d, we relocate the stack to the virtual address b + d + (2n · 4096).
Due to this, we can invalidate the virtual memory map to the old location of the stack. Whenever the application now holds an outdated address and tries to access it, a trap is raised and handled by the operating system.
The trap-causing register is adjusted to the current valid position of the stack segment and the execution can continue. Traps for branches to outdated locations are handled similarly (Section 6.3). The drawback of this mechanism is that the virtual memory address space is slowly consumed and cannot be reused. However, a simple calculation shows this to be still useful: with a virtual address size of 48 bits (e.g., for many ARMv8-based CPUs) and 512 MiB being allocated for the system (i.e., cannot be utilized by the consistency mechanism), 2.8 · 10¹¹ pages are available.
When a relocation happens every second and the size of the stack is n = 8 memory pages, relocations can continue for 136 years until the system runs out of virtual memory pages. This may exceed the lifetime of most embedded systems by far.
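The address bookkeeping above can be sketched as two small helpers, assuming the trap handler receives the faulting register value; the function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Virtual base of the stack segment after one relocation step of d
 * bytes, for a stack of n memory pages (the shadow map doubles the
 * window to 2n pages). The whole old 2n-page window is left behind,
 * so its mapping can be invalidated and stale accesses trap. */
uintptr_t next_stack_base(uintptr_t b, uintptr_t d, uintptr_t n)
{
    return b + d + (2 * n * PAGE_SIZE);
}

/* Trap-handler fix-up for an outdated pointer: rebase the faulting
 * address from the old stack window to the current one, preserving
 * its offset within the stack segment. */
uintptr_t rebase_stale_pointer(uintptr_t p, uintptr_t old_base,
                               uintptr_t cur_base)
{
    return p - old_base + cur_base;
}
```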
6.3 Text Movement
The second mechanism for fine-grained wear-leveling in this work is a mechanism to move the compiled binary code (i.e., the text segment). This mechanism again employs the shadow region (Section 6.1) to allow a rotational movement of the entire text segment.
In contrast to moving the stack (Section 6.2), several different steps have to be performed to maintain the correctness of the program during execution. The basic concept is again to move the text segment in small steps (e.g., 64 bytes) through a subset of memory pages, to distribute the non-uniform read accesses within these pages.
To achieve this, we modify the running application to allow movement of the binary program code during execution.
6.3.1 Binary Preparation.
As a first step toward movable binary program code during execution, we make the entire program code position independent such that it becomes independent of the absolute address of the text segment. This can be achieved by using the gcc option -fPIC, which generates position-independent code [16].
The resulting compiled binary code always performs branches and function calls relative to the program counter (i.e., relative to the position of the currently executed instruction). Accesses to global data structures (data and BSS), as well as external function calls, are handled by the Global Offset Table (GOT) and the Procedure Linkage Table (PLT). These tables can be accessed with program-counter-relative addressing.
The tables are populated with corresponding absolute addresses from the operating system (i.e., from the dynamic linker) at runtime. The PLT also contains entries for internal functions (not external library functions), since absolute addresses are sometimes used for further address calculation.
To avoid any suppression of these entries by the compiler, we compile the application as a shared library and load it into the operating system at runtime. This requires partial linking, where references to external functions and data structures are populated in the GOT and PLT.

6.3.2 Relocation Routine. The actual movement of the text segment in small distances (e.g., 64 bytes) requires the following steps:
(1) Word-wise copy of the binary text
(2) Adjustment of page-based addressing
(3) Address consistency maintenance
(4) GOT/PLT maintenance
(5) PC relocation.
Whereas step (1) is a straightforward copy of single words to new memory locations, the subsequent maintenance steps require some special effort. As mentioned before, we use position-independent code to maintain the independence of the absolute address of the text.
For ARMv8, the compiler inserts adrp instructions for this purpose (i.e., to address the GOT and PLT), which calculate an address relative to the 4-KiB page of the current program counter.
Thus, whenever such an instruction migrates from one to another 4-KiB page, we rewrite the instruction in step (2) and reduce the immediate offset by 1 to maintain the offset calculation to the target. Since the GOT and PLT addresses are always determined by these adrp instructions, we exclude the GOT and PLT from the movement of the text segment. Step (3) employs the same address consistency mechanism as described earlier (Section 6.2.1).
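The rewrite in step (2) can be sketched as a bit-level patch of the adrp instruction's 21-bit page immediate (in the A64 encoding, field immlo occupies bits [30:29] and immhi bits [23:5]); `adrp_adjust` is an illustrative helper, not code from the paper.

```c
#include <stdint.h>

/* ARMv8 adrp: the target page is pc_page + imm, where imm is the
 * signed 21-bit value immhi:immlo (immlo at bits [30:29], immhi at
 * bits [23:5]). Adjust imm by delta, e.g., -1 when the instruction
 * migrated one 4-KiB page forward while its GOT/PLT target stayed
 * fixed. */
uint32_t adrp_adjust(uint32_t insn, int32_t delta)
{
    uint32_t immlo = (insn >> 29) & 0x3;
    uint32_t immhi = (insn >> 5) & 0x7FFFF;
    int32_t imm = (int32_t)((immhi << 2) | immlo);
    if (imm & (1 << 20))
        imm -= (1 << 21);                      /* sign-extend 21 bits */
    imm += delta;
    uint32_t uimm = (uint32_t)imm & 0x1FFFFF;
    insn &= ~((0x3u << 29) | (0x7FFFFu << 5)); /* clear immlo, immhi  */
    insn |= (uimm & 0x3) << 29;                /* write new immlo     */
    insn |= (uimm >> 2) << 5;                  /* write new immhi     */
    return insn;
}
```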
Step (4) adjusts self-references to functions and data elements of the application itself to allow the application to still generate correct pointers for these (e.g., function pointers).
We finally set the program counter to the new position and continue execution. Overall, we provide two specialized mechanisms to move the stack and text in small steps through the main memory.
In combination with our shadow region setup, this movement becomes a rotational movement, which spreads dense access hot spots over a bounded memory region.
This shadow setup operates entirely in the virtual memory space, and the mapped physical pages can be still exchanged by the coarse-grained aging-aware mechanism.
The implementation is only modified to keep the double mapping of the shadow pages consistent. Thus, overall aging-aware wear-leveling is achieved.