Linux Cache Writeback Mechanism

Source: Internet
Author: User
Original article: http://oenhan.com/linux-cache-writeback

While working on process security monitoring, we decided, somewhat arbitrarily, that if a process is found in the D state, that is TASK_UNINTERRUPTIBLE (uninterruptible sleep), for more than 8 minutes, the system should panic. It so happened that the DB group was doing logging: the entire log was cached in memory and flushed to disk at the end. As a result a process stayed in the D state for a long time and the system duly panicked. Along the way I looked into some of the mechanisms and tuning methods for how Linux writes its cache back to disk, and this is a summary.

Under the current mechanism, dirty pages are generally flushed back to disk in the following situations:

1. The dirty page cache takes up too much memory and memory is running short.
2. Dirty pages have been modified for a long time and have reached the age limit, so they must be flushed promptly to keep memory and disk data consistent.
3. An external command forces dirty pages to be flushed to disk.
4. A write to disk checks the dirty state and triggers a flush.

The kernel uses pdflush threads to flush dirty pages to disk. There are between 2 and 8 pdflush threads, and the current number can be read directly from the file /proc/sys/vm/nr_pdflush_threads. For the detailed policy, refer to the kernel source function __pdflush.

I. Forced flush by other kernel modules

Situations 1 and 3: when memory runs short or a flush is forced from outside, dirty pages are flushed by calling wakeup_pdflush. Its callers include do_sync, free_more_memory and try_to_free_pages. wakeup_pdflush does its work through the background_writeout function:

static void background_writeout(unsigned long _min_pages)
{
    long min_pages = _min_pages;
    struct writeback_control wbc = {
        .bdi = NULL,
        .sync_mode = WB_SYNC_NONE,
        .older_than_this = NULL,
        .nr_to_write = 0,
        .nonblocking = 1,
    };

    for (;;) {
        struct writeback_state wbs;
        long background_thresh;
        long dirty_thresh;
        get_dirty_limits(&wbs, &background_thresh, &dirty_thresh, NULL);
        if (wbs.nr_dirty + wbs.nr_unstable < background_thresh && min_pages <= 0)
            break;
        wbc.encountered_congestion = 0;
        wbc.nr_to_write = MAX_WRITEBACK_PAGES;
        wbc.pages_skipped = 0;
        writeback_inodes(&wbc);
        min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
        if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
            /* Wrote less than expected */
            blk_congestion_wait(WRITE, HZ/10);
            if (!wbc.encountered_congestion)
                break;
        }
    }
}
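As a rough user-space illustration of this loop, the sketch below models the background threshold as a percentage of total pages and writes back at most MAX_WRITEBACK_PAGES per pass until the dirty count drops below it. The function names background_thresh_pages and simulate_background_writeout are hypothetical, and the model deliberately ignores min_pages, congestion, skipped pages, and any further adjustments the real get_dirty_limits makes.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_WRITEBACK_PAGES 1024 /* chunk size used by the kernel excerpt */

/* Hypothetical helper: threshold as ratio percent of total pages. */
long background_thresh_pages(long total_pages, int dirty_background_ratio)
{
    return total_pages * dirty_background_ratio / 100;
}

/*
 * Toy model of the background_writeout loop. Each pass "writes" up to
 * MAX_WRITEBACK_PAGES dirty pages; the loop stops once the dirty count
 * is below the threshold. Returns the number of passes performed.
 */
int simulate_background_writeout(long *nr_dirty, long thresh)
{
    int passes = 0;

    while (*nr_dirty >= thresh && *nr_dirty > 0) {
        long chunk = *nr_dirty < MAX_WRITEBACK_PAGES ? *nr_dirty
                                                     : MAX_WRITEBACK_PAGES;
        *nr_dirty -= chunk;
        passes++;
    }
    return passes;
}
```

With 1,000,000 total pages and a ratio of 10, the threshold is 100,000 pages; starting from 103,000 dirty pages the model needs three passes and ends at 99,928 dirty pages.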

background_writeout enters an infinite loop. get_dirty_limits returns background_thresh, the threshold at which dirty pages start being flushed, which is dirty_background_ratio percent of the total memory pages. It can be adjusted through the proc interface /proc/sys/vm/dirty_background_ratio, and the usual default is 10. When the dirty pages exceed this threshold, writeback_inodes is called to write MAX_WRITEBACK_PAGES (1024) pages at a time until the dirty page ratio falls below the threshold.

II. Periodic flush started by the kernel timer

At boot time the kernel initializes the wb_timer timer in page_writeback_init. Its timeout is dirty_writeback_centisecs, in units of 0.01 seconds, and it can be adjusted through /proc/sys/vm/dirty_writeback_centisecs. The timer's callback is wb_timer_fn, which ultimately does its work in wb_kupdate.
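The conversion from hundredths of a second to timer ticks, and the rule that the timer is never re-armed sooner than one second ahead (both visible in the wb_kupdate excerpt), can be sketched in user space as follows. The function names and the hz parameter are hypothetical stand-ins for the kernel's HZ constant, and tick counter wrap-around is ignored.

```c
#include <assert.h>

/* Convert hundredths of a second to timer ticks: (centisecs * HZ) / 100. */
unsigned long centisecs_to_jiffies(unsigned long centisecs, unsigned long hz)
{
    return (centisecs * hz) / 100;
}

/*
 * Next wake-up tick: start + interval, but at least one second (hz
 * ticks) after `now`. This mirrors the time_before() clamp in
 * wb_kupdate; wrap-around of the tick counter is not handled here.
 */
unsigned long next_wakeup(unsigned long start, unsigned long now,
                          unsigned long writeback_centisecs,
                          unsigned long hz)
{
    unsigned long next = start + centisecs_to_jiffies(writeback_centisecs, hz);

    if (next < now + hz)
        next = now + hz;
    return next;
}
```

For example, with an assumed hz of 250 and an interval of 500 centiseconds, the interval is 1250 ticks. If a pass starts at tick 1000 and finishes immediately, the next wake-up is tick 2250; if the pass runs until tick 3000, the wake-up is pushed to tick 3250.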

static void wb_kupdate(unsigned long arg)
{
    /* declarations of wbs, wbc and the jiffies variables are omitted in this excerpt */
    sync_supers();
    get_writeback_state(&wbs);
    oldest_jif = jiffies - (dirty_expire_centisecs * HZ) / 100;
    start_jif = jiffies;
    next_jif = start_jif + (dirty_writeback_centisecs * HZ) / 100;
    nr_to_write = wbs.nr_dirty + wbs.nr_unstable + (inodes_stat.nr_inodes - inodes_stat.nr_unused);
    while (nr_to_write > 0) {
        wbc.encountered_congestion = 0;
        wbc.nr_to_write = MAX_WRITEBACK_PAGES;
        writeback_inodes(&wbc);
        if (wbc.nr_to_write > 0) {
            if (wbc.encountered_congestion)
                blk_congestion_wait(WRITE, HZ/10);
            else
                break;  /* All the old data is written */
        }
        nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
    }
    if (time_before(next_jif, jiffies + HZ))
        next_jif = jiffies + HZ;
    if (dirty_writeback_centisecs)
        mod_timer(&wb_timer, next_jif);
}

The code above is only an excerpt. The kernel first flushes superblock information to the file system, then computes oldest_jif and passes it in wbc so that only dirty pages modified more than dirty_expire_centisecs ago are flushed. dirty_expire_centisecs can be adjusted through /proc/sys/vm/dirty_expire_centisecs.

III. Flushing the cache when write() writes a file

When a user-space program writes a file with write(), dirty pages may also be flushed. After marking the written memory pages dirty, generic_file_buffered_write flushes to disk, depending on conditions, to keep the dirty page ratio balanced. See the balance_dirty_pages_ratelimited function:

void balance_dirty_pages_ratelimited(struct address_space *mapping)
{
    static DEFINE_PER_CPU(int, ratelimits) = 0;
    long ratelimit;

    ratelimit = ratelimit_pages;
    if (dirty_exceeded)
        ratelimit = 8;

    /*
     * Check the rate limiting. Also, we don't want to throttle real-time
     * tasks in balance_dirty_pages(). Period.
     */
    if (get_cpu_var(ratelimits)++ >= ratelimit) {
        __get_cpu_var(ratelimits) = 0;
        put_cpu_var(ratelimits);
        balance_dirty_pages(mapping);
        return;
    }
    put_cpu_var(ratelimits);
}
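The counter logic in this function can be modelled in user space with a plain struct in place of the per-CPU variable. This is only a sketch: struct ratelimit_state, should_balance and count_balances are hypothetical names, and the model covers a single counter rather than one per CPU.

```c
#include <assert.h>
#include <stdbool.h>

struct ratelimit_state {
    int count;            /* stands in for the per-CPU ratelimits counter */
    long ratelimit_pages; /* normal interval between balance checks */
    bool dirty_exceeded;  /* true once the dirty threshold was exceeded */
};

/* Returns true when this call would go on to balance_dirty_pages(). */
bool should_balance(struct ratelimit_state *s)
{
    long ratelimit = s->dirty_exceeded ? 8 : s->ratelimit_pages;

    /* Compare the old counter value, then increment, as the kernel does. */
    if (s->count++ >= ratelimit) {
        s->count = 0;
        return true;
    }
    return false;
}

/* Count how many of n consecutive calls trigger a balance check. */
int count_balances(struct ratelimit_state *s, int n)
{
    int hits = 0;

    for (int i = 0; i < n; i++)
        if (should_balance(s))
            hits++;
    return hits;
}
```

With ratelimit_pages set to 32 (an arbitrary example value), the model triggers on every 33rd call, because the counter is compared before it is incremented. Once dirty_exceeded is set, it triggers on every 9th call, so writers are checked far more often when the system is over its dirty limit.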

balance_dirty_pages_ratelimited uses ratelimit_pages to control how often a flush check (a call to balance_dirty_pages) happens: roughly one check per ratelimit_pages calls, or one per 8 calls once dirty_exceeded is set. The flush itself is done in balance_dirty_pages:

static void balance_dirty_pages(struct address_space *mapping)
{
    struct writeback_state wbs;
    long nr_reclaimable;
    long background_thresh;
    long dirty_thresh;
    unsigned long pages_written = 0;
    unsigned long write_chunk = sync_writeback_pages();
    struct backing_dev_info *bdi = mapping->backing_dev_info;

    for (;;) {
        struct writeback_control wbc = {
            .bdi = bdi, .sync_mode = WB_SYNC_NONE,
            .older_than_this = NULL, .nr_to_write = write_chunk,
        };
        get_dirty_limits(&wbs, &background_thresh, &dirty_thresh, mapping);
        nr_reclaimable = wbs.nr_dirty + wbs.nr_unstable;
        if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh)
            break;
        if (!dirty_exceeded)
            dirty_exceeded = 1;
        /* Note: nr_reclaimable denotes nr_dirty + nr_unstable. Unstable
         * writes are a feature of certain networked filesystems (i.e. NFS)
         * in which data could have been written to the server's write
         * cache, but has not yet been flushed to permanent storage. */
        if (nr_reclaimable) {
            writeback_inodes(&wbc);
            get_dirty_limits(&wbs, &background_thresh, &dirty_thresh, mapping);
            nr_reclaimable = wbs.nr_dirty + wbs.nr_unstable;
            if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh)
                break;
            pages_written += write_chunk - wbc.nr_to_write;
            if (pages_written >= write_chunk)
                break;  /* We've done our duty */
        }
        blk_congestion_wait(WRITE, HZ/10);
    }
    if (nr_reclaimable + wbs.nr_writeback <= dirty_thresh && dirty_exceeded)
        dirty_exceeded = 0;
    if (writeback_in_progress(bdi))
        return;  /* pdflush is already working this queue */
    /* In laptop mode, we wait until hitting the higher threshold before
     * starting background writeout, and then write out all the way down
     * to the lower threshold. So slow writers cause minimal disk activity.
     * In normal mode, we start background writeout at the lower
     * background_thresh, to keep the amount of dirty memory low. */
    if ((laptop_mode && pages_written) || (!laptop_mode && (nr_reclaimable > background_thresh)))
        pdflush_operation(background_writeout, 0);
}

The function enters an infinite loop. get_dirty_limits returns the page counts corresponding to dirty_background_ratio and dirty_ratio. If the check against dirty_thresh finds that the dirty pages exceed it, writeback_inodes is called to start flushing the cache to disk. If that does not bring the dirty page ratio below dirty_ratio, the writer is blocked with blk_congestion_wait and the loop repeats until the ratio falls below dirty_ratio. Once the ratio is below dirty_ratio but still above dirty_background_ratio, pdflush_operation is used to start background_writeout. pdflush_operation is a non-blocking function: it wakes pdflush and returns immediately, and background_writeout is then called inside pdflush.
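The three outcomes described above can be condensed into a small classification function. This is a simplified sketch, not the kernel's control flow: the enum and classify_writer are hypothetical names, and it assumes normal (non-laptop) mode with no writeback already in progress on the device.

```c
#include <assert.h>

enum writer_action {
    WRITER_CONTINUE,        /* below both thresholds: nothing to do */
    WRITER_WAKE_BACKGROUND, /* above background_thresh: wake pdflush, no blocking */
    WRITER_THROTTLE         /* above dirty_thresh: write back and wait */
};

/*
 * Decide what happens to a writer, given the current page counts and
 * the two thresholds returned by get_dirty_limits.
 */
enum writer_action classify_writer(long nr_reclaimable, long nr_writeback,
                                   long background_thresh, long dirty_thresh)
{
    if (nr_reclaimable + nr_writeback > dirty_thresh)
        return WRITER_THROTTLE;
    if (nr_reclaimable > background_thresh)
        return WRITER_WAKE_BACKGROUND;
    return WRITER_CONTINUE;
}
```

For instance, with a background threshold of 100 pages and a dirty threshold of 400 pages, 150 reclaimable pages only wake the background flusher, while 380 reclaimable pages plus 30 under writeback put the writer into the throttled path.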

So we know that on a write, when the dirty cache exceeds dirty_ratio, the write is blocked and dirty pages are flushed until the cache drops below dirty_ratio. If the dirty cache is above dirty_background_ratio, the write wakes the pdflush thread to flush dirty pages, and the write itself is not blocked.

IV. Summary of the problem

Most cases of processes in the D state come from situations 3 and 4: there are many write operations and the cache is managed by the Linux kernel, so once dirty pages have accumulated to a certain level, the process gets stuck in the D state whether it keeps writing or calls fsync to flush.
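When tuning, it helps to see the current values of the parameters discussed in this article first. The sketch below reads them from /proc/sys/vm. The helper names read_vm_tunable and print_writeback_tunables are hypothetical, and the code assumes each file holds a single integer; on systems where a file is missing it simply reports it as unavailable.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read an integer tunable from /proc/sys/vm/<name>.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * file cannot be opened or does not start with an integer.
 */
int read_vm_tunable(const char *name, long *out)
{
    char path[128];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the four writeback tunables mentioned in the article. */
void print_writeback_tunables(void)
{
    static const char *names[] = {
        "dirty_background_ratio", "dirty_ratio",
        "dirty_writeback_centisecs", "dirty_expire_centisecs",
    };

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        long v;

        if (read_vm_tunable(names[i], &v) == 0)
            printf("%s = %ld\n", names[i], v);
        else
            printf("%s: unavailable\n", names[i]);
    }
}
```

Calling print_writeback_tunables() from a small main() prints one line per parameter, which makes it easy to compare settings before and after a change.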
