CCoW: Optimizing Copy-on-Write Considering The Spatial Locality in Workloads Part 2

Apr 02, 2024

The small page size can be problematic as systems become capable of handling huge amounts of physical memory. With the hierarchical page table organization, each virtual address translation requires multiple memory accesses, one for each page table level, which imposes a high overhead on every memory reference.
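
The cost of a hierarchical walk can be illustrated with a minimal sketch. It assumes a four-level layout with 9 index bits per level and a 12-bit page offset (the x86-64 style arrangement for 4 KB pages; the text above does not name a specific layout), and it uses nested Python dicts as a stand-in for real page tables. The names `split_virtual_address` and `walk` are hypothetical.

```python
PAGE_SHIFT = 12      # 4 KB base page -> 12 offset bits
BITS_PER_LEVEL = 9   # assumed: 512 entries per table
LEVELS = 4           # assumed: four-level page table


def split_virtual_address(vaddr):
    """Return ([top-level index, ..., leaf index], page offset)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_SHIFT + level * BITS_PER_LEVEL
        indices.append((vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1))
    return indices, offset


def walk(root, vaddr):
    """Translate vaddr using nested dicts; count one access per level."""
    indices, offset = split_virtual_address(vaddr)
    node = root
    accesses = 0
    for idx in indices:
        node = node[idx]   # each level costs one memory access
        accesses += 1
    frame_number = node    # the leaf entry holds the physical frame number
    return (frame_number << PAGE_SHIFT) | offset, accesses


# Toy page table mapping exactly one page to frame 0x55.
root = {1: {2: {3: {4: 0x55}}}}
vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
paddr, accesses = walk(root, vaddr)
```

In this toy model a single translation touches four table entries before the data itself can be accessed.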

To mitigate the high overhead of virtual-to-physical address translation, many modern architectures incorporate a cache for address translation. The MMU keeps recent translation results in a hardware structure called the translation look-aside buffer (TLB).

Usually, the TLBs of modern architectures can hold around 500 to 2000 entries [6,7]. The entries are indexed by hardware so that the processor core can look up the translation very quickly. By leveraging the locality of memory references, many address translations can be performed without walking through the page table (referred to as TLB hit). 
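
The hit and miss behavior can be sketched with a small software model. This is only an illustration: the class name `TLB` and its methods are hypothetical, the least-recently-used replacement policy is an assumption (real hardware policies vary), and a plain dict lookup stands in for the page table walk.

```python
from collections import OrderedDict


class TLB:
    """Toy translation cache with LRU replacement (an assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page number -> frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vpn, page_table):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        pfn = page_table[vpn]                  # stands in for a page table walk
        self.entries[vpn] = pfn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return pfn


page_table = {vpn: vpn + 100 for vpn in range(8)}
tlb = TLB(capacity=2)
for vpn in [0, 0, 1, 0, 2, 1]:
    tlb.translate(vpn, page_table)
```

With only two entries, the repeated references to page 0 hit, while the wider working set at the end of the sequence causes misses.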

As the memory footprint for memory-intensive applications grows rapidly, the number of virtual to physical page mappings for a process also increases. However, due to hardware limitations, the number of TLB entries cannot keep up with the rapid growth of application memory footprints. Thus, the TLB miss rates increase, causing bottlenecks in the performance of memory-intensive applications [8–11]. 

To overcome this limitation, some architectures support additional page sizes larger than the size of 4 KB base pages. For example, modern Intel architectures support 2 MB and 1 GB page sizes [7]. With such a huge page size, one address translation can cover a wider address range, effectively increasing the coverage the TLB can provide with the same number of entries. 

For instance, a system with 1024 TLB entries and a 4 KB base page size can provide TLB coverage of 4 MB, whereas the same number of entries with 1 GB huge pages provides 1 TB coverage. Linux utilizes the huge page in the form of transparent huge pages (THPs). As the name suggests, Linux implicitly provides user processes with huge pages whenever possible. 
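
The coverage figures above follow from multiplying the number of entries by the page size. A short check, with the helper name `tlb_coverage` chosen only for illustration:

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB


def tlb_coverage(entries, page_size):
    """Bytes of address space reachable without a TLB miss."""
    return entries * page_size


base_coverage = tlb_coverage(1024, 4 * KB)   # 4 KB base pages
huge_coverage = tlb_coverage(1024, 1 * GB)   # 1 GB huge pages
thp_coverage = tlb_coverage(1024, 2 * MB)    # 2 MB huge pages
```

By the same arithmetic, 1024 entries of 2 MB pages would cover 2 GB.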

If THP is not enabled, Linux allocates memory to processes in units of 4 KB base pages. If THP is enabled, Linux attempts to allocate a huge page (2 MB in size) instead of a base page, allowing a coarse-grained page mapping. This large granularity allows for efficient page sharing between parent and child processes through fork. If a huge page allocation is not feasible at the moment, Linux falls back to base page allocation. Linux also periodically scans process address spaces to find base pages and consolidate them into huge pages.
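
Whether THP is active can usually be checked from user space. The sketch below assumes a Linux system that exposes the sysfs file `/sys/kernel/mm/transparent_hugepage/enabled` (this path is not mentioned in the text above), whose content lists the available modes with the active one in brackets, for example `always [madvise] never`. The helper names are hypothetical.

```python
import re

# Assumed location of the THP switch on Linux; absent on other systems.
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text):
    """Return the bracketed (active) mode, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_thp_mode(path=THP_ENABLED_PATH):
    """Read the active THP mode, or None if the file is unavailable."""
    try:
        with open(path) as f:
            return parse_thp_mode(f.read())
    except OSError:
        return None
```

Calling `current_thp_mode()` returns a string such as `"always"`, `"madvise"`, or `"never"`, or `None` where the file cannot be read.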

Several studies have attempted to promote huge pages for performance while further masking their shortcomings. Ingens [12,13] proposes to prepare huge pages asynchronously off the critical path.

Hawkeye [14] presents a fine-grained huge page promotion scheme based on memory access patterns to maximize performance with a minimal number of huge page promotions. Zhu et al. [15] generalize the processes of using huge pages and optimize the lifecycle of huge pages. Park et al. [16] allow holes in huge pages, providing flexibility in memory management with huge pages.

The huge page, however, is a double-edged sword. Due to the increased management unit size, page allocation suffers from internal fragmentation. If an allocated address range is smaller than the huge page size, the rest of the page cannot be utilized and is wasted. This so-called memory bloat can significantly decrease memory utilization on systems with huge pages [12–17].
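
The waste from internal fragmentation can be estimated with simple arithmetic. The sketch below assumes that every page touched by a request is fully backed by physical memory, which is a simplification, and the function name `wasted_bytes` is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def wasted_bytes(request_bytes, page_size):
    """Bytes allocated but unused when rounding up to whole pages."""
    pages = -(-request_bytes // page_size)   # ceiling division
    return pages * page_size - request_bytes


request = 2 * MB + 4 * KB                     # slightly more than one huge page
waste_base = wasted_bytes(request, 4 * KB)    # backed by 4 KB base pages
waste_huge = wasted_bytes(request, 2 * MB)    # backed by 2 MB huge pages
```

Under these assumptions the request fits exactly into base pages, but needs two huge pages and leaves almost an entire huge page unused.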

The increased page size can negatively affect program performance as well. Modern OSs adopt the copy-on-write scheme extensively for efficient memory sharing between processes. CoW is, however, handled only at base page granularity.

Thus, to handle CoW on a huge page, the huge page is split into base pages, and only the faulting page is copied. Breaking huge pages takes a considerable amount of time, resulting in intermittent long page fault handling. For this reason, some applications, even memory-intensive ones, do not recommend using huge pages, for the sake of stable performance and memory utilization [4,18].
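
The split-then-copy step can be sketched as follows. This is a toy model rather than kernel code: a huge mapping is represented by its first frame number, base mappings by `(frame, writable)` tuples, and `alloc_frame` is a hypothetical allocator passed in by the caller.

```python
BASE_PAGES_PER_HUGE = 512   # 2 MB huge page / 4 KB base page


def split_and_cow(huge_first_frame, faulting_index, alloc_frame):
    """Split a shared huge mapping and copy only the faulting base page.

    Returns one (frame, writable) tuple per base page.
    """
    # Splitting: the huge mapping becomes 512 read-only base mappings.
    mappings = [(huge_first_frame + i, False)
                for i in range(BASE_PAGES_PER_HUGE)]
    # CoW: only the faulting page gets a private, writable frame.
    mappings[faulting_index] = (alloc_frame(), True)
    return mappings


mappings = split_and_cow(1024, 7, alloc_frame=lambda: 9999)
```

Even though a single base page is copied, all 512 mappings have to be rebuilt, which is where the long fault handling time comes from in this model.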

In general, there are ranges of address space in the process address space where all the pages in the range have the same permission and characteristics. For management, modern OSs usually adopt the concept of 'virtual memory area (VMA)' to represent such ranges of address space. We can classify the pages in the process address space according to their origin. 

Some pages are loaded from a backing file on secondary storage and are referred to as 'file-backed pages'. Other pages are dynamically populated without any backing data. The pages for the stack and heap fall into this category and are called 'anonymous pages'.
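
On Linux, this distinction is visible in `/proc/<pid>/maps`, where each line describes one VMA. The sketch below is a rough heuristic, assuming the usual field order (address range, permissions, offset, device, inode, optional path) and treating a zero inode as anonymous. The sample lines are made up for illustration and are not taken from a real process.

```python
def classify_mapping(maps_line):
    """Classify one /proc/<pid>/maps line by its inode field."""
    fields = maps_line.split(maxsplit=5)
    inode = int(fields[4])
    return "file-backed" if inode != 0 else "anonymous"


# Hypothetical sample lines in the /proc/<pid>/maps format.
sample_lines = [
    "55d0c0a00000-55d0c0a21000 r--p 00000000 08:01 1311234 /usr/bin/cat",
    "55d0c1b3c000-55d0c1b5d000 rw-p 00000000 00:00 0 [heap]",
    "7f1c2c000000-7f1c2c021000 rw-p 00000000 00:00 0",
]
kinds = [classify_mapping(line) for line in sample_lines]
```

Special cases such as shared memory mappings are ignored here, so the result should be read as an approximation.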

2.2. Fork and Copy-on-Write

Fork is one of the POSIX standard system calls to create a new process. When a process invokes the fork system call, a new process is created as the child of the calling process. 

Under the hood, the OS creates the child process by duplicating the entire address space of the calling process. This implies that the child process should start with the same data as the parent process. 

To handle the address space duplication efficiently, most modern OSs use the copy-on-write (CoW) technique. To duplicate the address space of the parent, the OS does not copy each page. Instead, the page table of the child process is constructed by copying the page table of the parent process. 

This effectively makes a shared mapping to the address space of the parent. While making the shared mapping, the write permission for each page is dropped by clearing the permission bit in the corresponding PTE. 

After copying the mapping, both parent and child can read the shared pages as their own pages. When one of the processes makes a write access to a page, the MMU, due to the lack of write permission, triggers a page fault. In the page fault handler, the OS allocates a new page, copies the original page, and updates the corresponding page mapping of the fault-causing process with write permission.
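
The sequence described above (share on fork, fault on write, copy, remap) can be sketched in a small user-space model. All names here (`Frame`, `AddressSpace`, `_cow_fault`) are hypothetical, Python objects stand in for physical frames and page table entries, and the shortcut that lets a sole remaining owner regain write permission without copying is an added simplification not described in the text.

```python
class Frame:
    """Stand-in for a physical page with a reference count."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1


class AddressSpace:
    def __init__(self):
        self.table = {}   # virtual page number -> [frame, writable]

    def map(self, vpn, data):
        self.table[vpn] = [Frame(data), True]

    def fork(self):
        """Copy the page table only; share frames and drop write permission."""
        child = AddressSpace()
        for vpn, entry in self.table.items():
            entry[1] = False                     # parent loses write access
            entry[0].refs += 1
            child.table[vpn] = [entry[0], False]
        return child

    def read(self, vpn):
        return bytes(self.table[vpn][0].data)

    def write(self, vpn, offset, value):
        entry = self.table[vpn]
        if not entry[1]:
            self._cow_fault(entry)               # models the page fault
        entry[0].data[offset] = value

    def _cow_fault(self, entry):
        frame = entry[0]
        if frame.refs > 1:
            frame.refs -= 1
            entry[0] = Frame(frame.data)         # allocate and copy the page
        entry[1] = True                          # restore write permission


parent = AddressSpace()
parent.map(0, b"abcd")
child = parent.fork()
shared_after_fork = child.table[0][0] is parent.table[0][0]
child.write(0, 0, ord("X"))
```

After the write, the child sees its private copy while the parent still reads the original data.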

At this point, the parent and child can have different data at the same virtual address. This copy-on-write mechanism is used extensively as a fundamental mechanism for realizing many virtual memory features. For example, reads of uninitialized heap regions are usually handled with a shared mapping to the zero page, a special page containing all zeros.

Kernel same-page merging (KSM) is the technique of deduplicating the same pages in the system. The OS scans the pages in the system to identify pages with identical data. When such pages are found, the OS reclaims all but one page and updates the corresponding page tables to share the remaining page. 

During merging, the write permission is dropped so that a subsequent write access to the shared page is detected and the page is copied. With the high efficiency of copy-on-write, process creation becomes efficient, and some data-intensive applications leverage this advantage to create a data copy.
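
The merging step described above can be sketched with a content-keyed dictionary. This is an illustration only: the function name and data layout are hypothetical, and grouping pages by their full content in a dict is a stand-in for whatever scanning and comparison structure a real kernel uses.

```python
def merge_identical_pages(page_tables, frames):
    """Deduplicate frames with identical content.

    page_tables: {pid: {vpn: [frame_id, writable]}}
    frames:      {frame_id: bytes}
    Returns the sorted list of reclaimed frame ids.
    """
    groups = {}   # page content -> page table entries pointing at it
    for table in page_tables.values():
        for entry in table.values():
            groups.setdefault(frames[entry[0]], []).append(entry)

    reclaimed = set()
    for entries in groups.values():
        frame_ids = {entry[0] for entry in entries}
        if len(frame_ids) < 2:
            continue                  # content is unique, nothing to merge
        keep = min(frame_ids)
        for entry in entries:
            entry[0] = keep           # share the remaining frame
            entry[1] = False          # drop write permission for later CoW
        reclaimed |= frame_ids - {keep}

    for frame_id in reclaimed:
        del frames[frame_id]          # reclaim the duplicates
    return sorted(reclaimed)


frames = {1: b"AAAA", 2: b"BBBB", 3: b"AAAA"}
page_tables = {
    10: {0: [1, True], 1: [2, True]},
    20: {0: [3, True]},
}
reclaimed = merge_identical_pages(page_tables, frames)
```

Only the merged entries lose write permission; the page with unique content stays writable.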

Redis, a popular in-memory key-value store [4], is one such case [19]. Redis is designed to keep its data primarily in memory to provide high throughput and low latency.

However, some applications demand the persistence of stored data, and Redis complements the in-memory design with fork. Redis applies inbound requests only to the in-memory index and data structures and periodically invokes the fork system call. This effectively creates a child process with duplicated memory contents of the original Redis process, and the calling process (i.e., the original process) continues processing inbound requests.

The child process diverts its execution; using the current memory contents as a snapshot, it serializes in-memory data structures into files, thereby ensuring the persistence of the in-memory snapshot. After flushing the snapshot, the child process terminates. 

The original process can make another snapshot in the same way, and upon a system crash, Redis can be recovered by reading the last snapshot. Although the fork is an invaluable system call, its overhead has been criticized. 
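
The snapshot pattern can be sketched in a few lines on a POSIX system. This is not Redis code: the function names are hypothetical, a Python dict stands in for the in-memory data structures, and JSON stands in for the snapshot file format.

```python
import json
import os
import tempfile


def snapshot(store, path):
    """Fork a child that serializes the current contents of store to path."""
    pid = os.fork()
    if pid == 0:
        # Child: sees the memory contents as of the fork.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(store, f)
            status = 0
        finally:
            os._exit(status)          # terminate after flushing the snapshot
    return pid                        # parent: continues serving requests


def demo():
    store = {"k": 1}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snapshot.json")
        pid = snapshot(store, path)
        store["k"] = 2                # parent keeps applying updates
        os.waitpid(pid, 0)            # wait for the child to finish
        with open(path) as f:
            return json.load(f), store


saved, live = demo()
```

The snapshot holds the value from the moment of the fork, while the parent's later update is visible only in its own memory.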

Baumann et al. [20] analyzed fork and found that it causes performance degradation in modern applications. For example, as modern applications become more complex, the OS has to consider approximately 25 special cases to start processing the fork system call in conformance with the POSIX specification. They summarized the problems of the fork system call and suggested the features that it should have on modern computers.

They also provide alternative ways of replacing the fork. Zhao et al. [19] pointed out that the fork implementation in current systems is inefficient since applications with a large memory footprint require a long time to set up the page table. As a solution, they generalized the copy-on-write technique so that the page table is copied on writes as well as regular pages.

