A brief introduction to the hot and cold page mechanism in Linux

Source: Internet
Author: User

What are hot and cold pages?

The concept of hot and cold pages is introduced in the buddy system of Linux kernel physical memory management. A cold page is a free page that is no longer in the CPU cache (generally the L2 cache); a hot page is a free page that is still in the cache. Hot and cold pages are managed per CPU: in each zone, the kernel initializes one per-cpu-pageset for every CPU.

Why have hot and cold pages?

The mechanism has three benefits:

When the buddy allocator hands out a free page of order 0, a hot page already sits in the L2 cache: on a CPU write access there is no need to first read the memory contents into the cache before writing. A cold page, by contrast, is not in the L2 cache. That using hot pages whenever possible is beneficial is easy to understand; but when should a cold page be used? As the kernel sources put it: "While allocating a physical page frame, there is a bit specifying whether we would like a hot or a cold page (that is, a page likely to be in the CPU cache, or a page not likely to be there). If the page will be used by the CPU, a hot page will be faster. If the page will be used for device DMA, the CPU cache would be invalidated anyway, and a cold page does not waste precious cache contents."

In short: when the kernel allocates a physical page frame, a flag specifies whether a hot or a cold page is wanted. When the page frame will be used by the CPU, a hot page is allocated; when it will be used by a DMA device, a cold page is allocated, because the DMA device does not go through the CPU cache and gains nothing from a hot page.
When the buddy system assigns a free page in a zone to a process, it first needs to lock the zone with a spin lock and then allocate the page, so processes running on multiple CPUs that allocate pages at the same time contend for that lock. With the per-cpu-pageset, this contention disappears and allocation on multiple CPUs becomes more efficient. In addition, when a single page is released, it is first put back into the per-cpu-pageset rather than the zone, reducing use of the zone's spin lock. Only when the number of pages in the per-CPU list exceeds a threshold are pages returned to the buddy system.

Another advantage of keeping hot and cold pages per CPU is that a page tends to stay on one CPU, which helps increase the cache hit rate.

Data structure of hot and cold pages

 struct per_cpu_pages {
 	int count;	/* number of pages in the lists */
 	int high;	/* high watermark, emptying needed */
 	int batch;	/* chunk size for buddy add/remove */

 	/* Lists of pages, one per migrate type stored on the pcp-lists:
 	 * each CPU, in each zone, has MIGRATE_PCPTYPES hot/cold page
 	 * lists, one per migration type. */
 	struct list_head lists[MIGRATE_PCPTYPES];
 };

In Linux, on UMA architectures, hot and cold pages are managed on a single linked list: hot pages at the front, cold pages at the rear. When a CPU releases an order-0 page, if the number of pages in the per-cpu-pageset is below its threshold, the freed page is inserted at the head of the hot and cold page list. In this way, pages inserted earlier keep drifting backwards as new hot pages arrive at the head, so a page naturally cools from hot to cold over time.

How hot and cold pages are allocated

When allocating order-0 pages (the hot and cold page mechanism only handles single-page allocations), the allocator first finds the appropriate zone and then uses the required migratetype to locate the corresponding hot and cold page list (in each zone, each CPU has three such lists, corresponding to MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE). If a hot page is required, a page is removed from the head of the list (the "hottest" page); if a cold page is required, a page is removed from the tail of the list (the "coldest" page).

The allocation function (key parts annotated):

 /*
  * Really, prep_compound_page() should be called from __rmqueue_bulk().  But
  * we cheat by calling it from here, in the order > 0 path.  Saves a branch
  * or two.
  */
 static inline
 struct page *buffered_rmqueue(struct zone *preferred_zone,
 			struct zone *zone, int order, gfp_t gfp_flags,
 			int migratetype)
 {
 	unsigned long flags;
 	struct page *page;
 	/* the __GFP_COLD allocation flag requests a cold page */
 	int cold = !!(gfp_flags & __GFP_COLD);

 again:
 	if (likely(order == 0)) {
 		struct per_cpu_pages *pcp;
 		struct list_head *list;

 		local_irq_save(flags);
 		pcp = &this_cpu_ptr(zone->pageset)->pcp;
 		list = &pcp->lists[migratetype];
 		if (list_empty(list)) {
 			/* list empty: refill it from the buddy system */
 			pcp->count += rmqueue_bulk(zone, 0,
 					pcp->batch, list,
 					migratetype, cold);
 			if (unlikely(list_empty(list)))
 				goto failed;
 		}

 		if (cold)
 			/* cold page: take from the tail of the list
 			 * (list is the sentinel; list->prev is the tail) */
 			page = list_entry(list->prev, struct page, lru);
 		else
 			/* hot page: take from the head of the list */
 			page = list_entry(list->next, struct page, lru);

 		/* remove the allocated page frame from the hot/cold list */
 		list_del(&page->lru);
 		pcp->count--;
 	} else {
 		/* order != 0 (more than one page frame): do not use the
 		 * hot/cold lists, go to the buddy system directly */
 		if (unlikely(gfp_flags & __GFP_NOFAIL)) {
 			/*
 			 * __GFP_NOFAIL is not to be used in new code.
 			 *
 			 * All __GFP_NOFAIL callers should be fixed so that they
 			 * properly detect and handle allocation failures.
 			 *
 			 * We most definitely don't want callers attempting to
 			 * allocate greater than order-1 page units with
 			 * __GFP_NOFAIL.
 			 */
 			WARN_ON_ONCE(order > 1);
 		}
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order, migratetype);
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
 		__mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
 	}

 	__count_zone_vm_events(PGALLOC, zone, 1 << order);
 	zone_statistics(preferred_zone, zone, gfp_flags);
 	local_irq_restore(flags);

 	VM_BUG_ON(bad_range(zone, page));
 	if (prep_new_page(page, order, gfp_flags))
 		goto again;
 	return page;

 failed:
 	local_irq_restore(flags);
 	return NULL;
 }

That is the whole content of this article; I hope it helps your study.
