Page Faults and Swapping in Linux Memory
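1) Page faults

A page fault occurs when the CPU requests a memory page that is not in RAM; for example, when we read from or write to memory and the data is not resident. The program below lets us observe this. It allocates a large block with malloc, writes one byte to every page, and then goes to sleep, so the memory is touched exactly once and never used again.

Note: Linux allocates lazily. It gives no physical memory to a page that has never been touched, so to actually consume pages we must touch at least one byte in every page of the allocated region (the program below writes one).

The test program hog.c:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main (int argc, char *argv[])
    {
        if (argc != 2)
            exit (0);

        /* Size to allocate, in megabytes. */
        size_t mb = strtoul(argv[1], NULL, 0);
        size_t nbytes = mb * 0x100000;

        char *ptr = (char *) malloc(nbytes);
        if (ptr == NULL) {
            perror("malloc");
            exit (EXIT_FAILURE);
        }

        /* Write one byte per page so the kernel has to back every page. */
        size_t i;
        const size_t stride = sysconf(_SC_PAGE_SIZE);
        for (i = 0; i < nbytes; i += stride) {
            ptr[i] = 0;
        }

        printf("allocated %zu mb\n", mb);   /* %zu matches size_t */

        /* Sleep until a signal arrives, keeping the memory allocated. */
        pause();
        return 0;
    }

Compile it:

    gcc hog.c -o hog

Check the current memory state:

    free -m
                 total       used       free     shared    buffers     cached
    Mem:           503        206        296          0         30        140
    -/+ buffers/cache:         36        467
    Swap:         1027          0       1027

Use the GNU time command to see the number of page faults (the leading backslash bypasses the shell's built-in time):

    \time ./hog 100
    allocated 100 mb
    Command terminated by signal 2
    0.00user 3.12system 0:04.52elapsed 69%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+25719minor)pagefaults 0swaps

Note: 25719minor means 25,719 minor page faults. At 4 KB per page, the 100 MB allocation accounts for 25,600 of them; the rest come from elsewhere in the process, such as loading the program and its libraries. major counts major page faults, which are faults that require disk I/O. minor counts minor page faults, which are all other faults.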
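The fault count can also be read from inside a process. The following is a minimal sketch, assuming a Linux system where getrusage() fills in ru_minflt. The helper names pages_needed and minor_faults_for_touch are made up for this illustration and are not part of hog.c. It computes how many pages an allocation spans and measures the minor faults taken while writing one byte to each of them.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Number of pages needed to cover nbytes, rounding up. */
size_t pages_needed(size_t nbytes, size_t page_size)
{
    return (nbytes + page_size - 1) / page_size;
}

/* Allocate nbytes, write one byte per page, and return the number of
 * minor faults the calling process took while doing so (-1 on error).
 * Hypothetical helper for illustration only. */
long minor_faults_for_touch(size_t nbytes)
{
    struct rusage before, after;
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;

    char *buf = malloc(nbytes);
    if (buf == NULL)
        return -1;

    /* volatile keeps the compiler from discarding the stores. */
    for (size_t i = 0; i < nbytes; i += (size_t) page)
        ((volatile char *) buf)[i] = 0;

    int rc = getrusage(RUSAGE_SELF, &after);
    free(buf);
    if (rc != 0)
        return -1;

    return after.ru_minflt - before.ru_minflt;
}
```

For 100 MB and 4 KB pages, pages_needed gives 25600. The measured value need not match exactly: malloc can add a fault or two of its own, and with transparent huge pages enabled the kernel may map many pages per fault.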
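2) Swapping

Swapping happens when a program requests memory and physical memory is insufficient: the kernel is forced to move pages out to the swap partition, choosing the least recently used (LRU) pages. If swap space is insufficient, or there is no swap at all, the allocation fails, as shown here:

    swapoff -a
    ./hog 500 &
    [1] 2901
    malloc: Cannot allocate memory
    [1]+  Exit 1                  ./hog 500

Now let us watch swapping happen. First re-enable swap and check the current memory state:

    swapon -a
    free -m
                 total       used       free     shared    buffers     cached
    Mem:           503         43        459          0          0          9
    -/+ buffers/cache:         33        469
    Swap:         1027          0       1027

There are now 469 MB free. Requesting 500 MB with hog will therefore cause swapping:

    \time ./hog 500
    allocated 500 mb

(The program pauses here; the statistics below appear only after it is interrupted with CTRL+C. In the meantime we can inspect memory from another terminal.)

    Command terminated by signal 2
    0.02user 3.53system 0:07.19elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+128142minor)pagefaults 0swaps

In the other terminal, free shows:

    free -m
                 total       used       free     shared    buffers     cached
    Mem:           503        497          6          0          0          3
    -/+ buffers/cache:        493          9
    Swap:         1027         53        974

Here we can see that 53 MB has been swapped out.

Back in the first terminal, the page fault figures are 0major+128142minor. There are 0 major faults because a major fault occurs only when a process requests a page that resides on disk. In this case the pages did not exist yet, so they were not on disk, and the faults are not counted as major. Although the hog process caused the system to write pages to disk, it did not write those pages itself; the actual writing is done by kswapd. The kswapd kernel thread handles the tedious work of moving data from memory to disk, and a major fault is generated only when the process that owns those pages accesses them again.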
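The Swap line printed by free is derived from /proc/meminfo. As a rough sketch of that arithmetic, assuming the usual "Key: value kB" layout of that file, the code below extracts SwapTotal and SwapFree from a text buffer and reports the swap in use in megabytes. The function names meminfo_kb and swap_used_mb are hypothetical, and the code works on a string so that it can be tried without reading the real file.

```c
#include <stdlib.h>
#include <string.h>

/* Return the value (in kB) of a /proc/meminfo-style line "key: N kB",
 * or -1 if no line starts with that key. Hypothetical helper. */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p != NULL && *p != '\0') {
        /* A match must be at the start of a line and end with ':'. */
        if (strncmp(p, key, klen) == 0 && p[klen] == ':')
            return strtol(p + klen + 1, NULL, 10);
        p = strchr(p, '\n');
        if (p != NULL)
            p++;            /* step past the newline to the next line */
    }
    return -1;
}

/* Swap in use, in whole megabytes, or -1 if a field is missing. */
long swap_used_mb(const char *text)
{
    long total = meminfo_kb(text, "SwapTotal");
    long avail = meminfo_kb(text, "SwapFree");

    if (total < 0 || avail < 0)
        return -1;
    return (total - avail) / 1024;
}
```

Given text containing "SwapTotal: 1052248 kB" and "SwapFree: 997976 kB" (the second value is made up to be consistent with the rounded free output, not captured from the test machine), swap_used_mb returns 53.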
3) The top command

top refreshes its display periodically. Here are a few tips for using it:

1) Toggle between the command name and the full command line: press c
2) Sort by resident memory size: press M
3) Sort by CPU usage percentage: press P
4) Sort by time/cumulative time: press T
5) Show threads: press H
6) Split the display into four windows, named Def, Job, Mem and Usr: press Shift+A

    1:Def - 12:50:16 up  1:14,  2 users,  load average: 0.00, 0.01, 0.00
    Tasks:  72 total,   2 running,  70 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0%us,  5.6%sy,  0.0%ni, 94.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:    515600k total,    72684k used,   442916k free,     3372k buffers
    Swap:  1052248k total,    23140k used,  1029108k free,    48128k cached

    1   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        138 root      13  -5     0    0    0 S  0.0  0.0   0:04.23 [kswapd0]
       2569 root      15   0  8216  664  436 R  5.6  0.1   0:02.69 sshd: root@pts/0,pts/1
       2463 root      18   0  1924  160  140 S  0.0  0.0   0:02.32 hald-addon-storage: polling /dev/hdc
       2405 root      15   0 25188 6180 1792 S  0.0  1.2   0:02.23 /usr/bin/python /usr/sbin/yum-updatesd
       2439 haldaemo  18   0  5348  640  392 S  0.0  0.1   0:01.81 hald
          1 root      15   0  2032  160  140 S  0.0  0.0   0:01.18 init [3]
       3234 root      15   0  2164 1000  796 R  0.0  0.2   0:00.95 top
       2422 avahi     15   0  3696  352  260 S  0.0  0.1   0:00.94 avahi-daemon: running [test1.local]
        323 root      10  -5     0    0    0 S  0.0  0.0   0:00.75 [kjournald]

    2   PID  PPID    TIME+  %CPU %MEM  PR  NI S  VIRT SWAP  RES  UID COMMAND
       3234  2721   0:00.95  0.0  0.2  15   0 R  2164 1164 1000    0 top
       2721  2569   0:00.29  0.0  0.2  15   0 S  4620 3812  808    0 bash
       2571  2569   0:00.45  0.0  0.1  16   0 S  4616 4020  596    0 bash
       2569  2271   0:02.69  5.6  0.1  15   0 R  8216 7552  664    0 sshd
       2539     1   0:00.01  0.0  0.0  24   0 S  1624 1524  100    0 mingetty
       2516     1   0:00.01  0.0  0.0  24   0 S  1624 1524  100    0 mingetty
       2515     1   0:00.01  0.0  0.0  24   0 S  1628 1528  100    0 mingetty
       2514     1   0:00.00  0.0  0.0  18   0 S  1624 1524  100    0 mingetty
       2511     1   0:00.00  0.0  0.0  18   0 S  1624 1524  100    0 mingetty

    3   PID %MEM  VIRT SWAP  RES CODE DATA  SHR nFLT nDRT S  PR  NI %CPU COMMAND
       2405  1.2 25188  18m 6180    4 8688 1792  264    0 S  15   0  0.0 yum-updatesd
       3234  0.2  2164 1164 1000   52  344  796    0    0 R  15   0  0.0 top
       1866  0.2  9612 8672  940    4 1664  512   72    0 S  12  -3  0.0 python
       2721  0.2  4620 3812  808  684  400  620   77    0 S  15   0  0.0 bash
       2235  0.1 12716  11m  672    4 3656  376   75    0 S  15   0  0.0 python
       2569  0.1  8216 7552  664  368  736  436   90    0 R  15   0  5.6 sshd
       2439  0.1  5348 4708  640  260 1908  392   66    0 S  18   0  0.0 hald
       2571  0.1  4616 4020  596  684  396  452   84    0 S  16   0  0.0 bash
       1549  0.1  2280 1768  512  432  420  392   31    0 S  15   0  0.0 dhclient

    4   PID  PPID  UID USER     RUSER    TTY         TIME+  %CPU %MEM S COMMAND
       2371     1   43 xfs      xfs      ?          0:00.07  0.0  0.0 S xfs
       1941     1   32 rpc      rpc      ?          0:00.01  0.0  0.0 S portmap
       2405     1    0 root     root     ?          0:02.23  0.0  1.2 S yum-updatesd
       3234  2721    0 root     root     pts/1      0:00.95  0.0  0.2 R top
       1866  1864    0 root     root     ?          0:00.30  0.0  0.2 S python
       2721  2569    0 root     root     pts/1      0:00.29  0.0  0.2 S bash
       2235     1    0 root     root     ?          0:00.24  0.0  0.1 S python
       2569  2271    0 root     root     ?          0:02.69  5.6  0.1 R sshd
       2571  2569    0 root     root     pts/0      0:00.45  0.0  0.1 S bash

The fields shown in each window can be customized. For example, to add the TIME field to the Mem window, first press w to switch windows until Mem is the current window, then press f. The following screen appears:

    Current Fields:  ANOPQRSTUVbcdefgjlmyzWHIKX  for window 3:Mem
    Toggle fields via field letter, type any other key to return

    * A: PID        = Process Id
    * N: %MEM       = Memory usage (RES)
    * O: VIRT       = Virtual Image (kb)
    * P: SWAP       = Swapped size (kb)
    * Q: RES        = Resident size (kb)
    * R: CODE       = Code size (kb)
    * S: DATA       = Data+Stack size (kb)
    * T: SHR        = Shared Mem size (kb)
    * U: nFLT       = Page Fault count
    * V: nDRT       = Dirty Pages count
      b: PPID       = Parent Process Pid
      c: RUSER      = Real user name
      d: UID        = User Id
      e: USER       = User Name
      f: GROUP      = Group Name
      g: TTY        = Controlling Tty
      j: P          = Last used cpu (SMP)
      l: TIME       = CPU Time
      m: TIME+      = CPU Time, hundredths
      y: WCHAN      = Sleeping in Function
      z: Flags      = Task Flags <sched.h>
    * W: S          = Process Status
    * H: PR         = Priority
    * I: NI         = Nice value
    * K: %CPU       = CPU usage
    * X: COMMAND    = Command name/line

Type l to select "l: TIME = CPU Time". Once selected, the entry is marked with an asterisk, like this: "* L: TIME = CPU Time". Press Return, and the Mem window now shows the TIME column:

    3   PID %MEM  VIRT SWAP  RES CODE DATA  SHR nFLT nDRT  TIME S  PR  NI %CPU COMMAND
       2405  1.2 25188  18m 6180    4 8688 1792  264    0  0:02 S  15   0  0.0 yum-updatesd
       3234  0.2  2164 1164 1000   52  344  796    0    0  0:01 R  15   0  0.7 top
       1866  0.2  9612 8672  940    4 1664  512   72    0  0:00 S  12  -3  0.0 python
       2721  0.2  4620 3812  808  684  400  620   77    0  0:00 S  15   0  0.0 bash
       2235  0.1 12716  11m  672    4 3656  376   75    0  0:00 S  15   0  0.0 python
       2569  0.1  8216 7552  664  368  736  436   90    0  0:02 S  15   0  0.0 sshd
       2439  0.1  5348 4708  640  260 1908  392   66    0  0:01 S  18   0  0.0 hald
       2571  0.1  4616 4020  596  684  396  452   84    0  0:00 S  16   0  0.0 bash
       1549  0.1  2280 1768  512  432  420  392   31    0  0:00 S  19   0  0.0 dhclient

In the same way, we can sort by a chosen column. For example, to sort the Mem window by VIRT (virtual memory size), press O (upper case) to open the sort screen. Initially only the "N: %MEM = Memory usage (RES)" row carries an asterisk, which means the current sort key is the memory percentage. Then type O (upper case) to select "o: VIRT = Virtual Image (kb)", as shown:

    Current Sort Field:  N  for window 3:Mem
    Select sort field via field letter, type any other key to return

      a: PID        = Process Id
      b: PPID       = Parent Process Pid
      c: RUSER      = Real user name
      d: UID        = User Id
      e: USER       = User Name
      f: GROUP      = Group Name
      g: TTY        = Controlling Tty
      h: PR         = Priority
      i: NI         = Nice value
      j: P          = Last used cpu (SMP)
      k: %CPU       = CPU usage
      l: TIME       = CPU Time
      m: TIME+      = CPU Time, hundredths
      N: %MEM       = Memory usage (RES)
    * O: VIRT       = Virtual Image (kb)
      p: SWAP       = Swapped size (kb)
      q: RES        = Resident size (kb)
      r: CODE       = Code size (kb)
      s: DATA       = Data+Stack size (kb)
      t: SHR        = Shared Mem size (kb)
      u: nFLT       = Page Fault count
      v: nDRT       = Dirty Pages count
      w: S          = Process Status
      x: COMMAND    = Command name/line
      y: WCHAN      = Sleeping in Function
      z: Flags      = Task Flags <sched.h>

Press Enter to return to the main screen, which is now sorted by VIRT:

    3   PID %MEM  VIRT SWAP  RES CODE DATA  SHR nFLT nDRT  TIME S  PR  NI %CPU COMMAND
       2405  1.2 25188  18m 6180    4 8688 1792  264    0  0:02 S  15   0  0.0 yum-updatesd
       2235  0.1 12716  11m  672    4 3656  376   75    0  0:00 S  15   0  0.0 python
       2141  0.0 12692  12m  244   84  10m  188    1    0  0:00 S  25   0  0.0 pcscd
       1864  0.1 12048  11m  300   92  10m  220   14    0  0:00 S  16  -3  0.0 auditd
       2252  0.0  9644 9484  160  364  644  156    5    0  0:00 S  18   0  0.0 cupsd
       1866  0.2  9612 8672  940    4 1664  512   72    0  0:00 S  12  -3  0.0 python
       2190  0.1  9332 8868  464  196 7168  360   37    0  0:00 S  25   0  0.0 automount
       2569  0.1  8216 7552  664  368  736  436   90    0  0:02 S  15   0  0.0 sshd
       2439  0.1  5348 4708  640  260 1908  392   66    0  0:01 S  18   0  0.0 hald
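One detail is visible in the listings above: in every row, the SWAP value equals VIRT minus RES (for top itself, 2164 - 1000 = 1164). That suggests this version of top derives the column from the other two rather than measuring swapped-out pages directly. This is an inference from the sample output, not a statement about all versions of top. A small sketch of the check follows; the names derived_swap_kb, top_row, count_mismatches and sample_rows are hypothetical.

```c
/* One row of top output, sizes in kB (hypothetical struct). */
struct top_row {
    long virt_kb;
    long res_kb;
    long swap_kb;
};

/* SWAP as it appears to be derived in the listings: VIRT - RES. */
long derived_swap_kb(long virt_kb, long res_kb)
{
    return virt_kb - res_kb;
}

/* Count rows whose reported SWAP differs from VIRT - RES. */
int count_mismatches(const struct top_row *rows, int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++) {
        if (derived_swap_kb(rows[i].virt_kb, rows[i].res_kb) != rows[i].swap_kb)
            bad++;
    }
    return bad;
}

/* Rows copied from the Job window: top, bash, bash, sshd, mingetty. */
static const struct top_row sample_rows[] = {
    { 2164, 1000, 1164 },
    { 4620,  808, 3812 },
    { 4616,  596, 4020 },
    { 8216,  664, 7552 },
    { 1624,  100, 1524 },
};
```

Calling count_mismatches(sample_rows, 5) finds no mismatching rows, so a large SWAP figure for a process here does not by itself prove that its pages are in the swap partition.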
4) A combined example of page faults and swapping

For the final test we modify the previous program, adding signal handling and timing output:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <signal.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/time.h>

    /* Empty handler: SIGUSR1 only needs to wake the process from pause(). */
    void handler(int sig)
    {
        (void) sig;
    }

    #define TIMESPEC2FLOAT(tv) ((double) (tv).tv_sec + (double) (tv).tv_nsec * 1e-9)

    int
    main (int argc, char *argv[])
    {
        if (argc != 2)
            exit(0);

        signal(SIGUSR1, handler);

        size_t mb = strtoul(argv[1], NULL, 0);
        size_t nbytes = mb * 0x100000;
        char *ptr = (char *) malloc (nbytes);
        if (ptr == NULL) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }

        int val = 0;
        const size_t stride = sysconf(_SC_PAGE_SIZE);

        while (1) {
            size_t i;   /* size_t, to match nbytes */
            struct timespec t1, t2;

            /* Time one pass that writes a byte to every page. */
            clock_gettime(CLOCK_REALTIME, &t1);
            for (i = 0; i < nbytes; i += stride) {
                ptr[i] = val;
            }
            val++;
            clock_gettime(CLOCK_REALTIME, &t2);

            printf("touched %zu mb; in %.6f sec\n", mb,
                   TIMESPEC2FLOAT(t2) - TIMESPEC2FLOAT(t1));

            /* Sleep until the next signal; SIGUSR1 triggers another pass. */
            pause();
        }
        return 0;
    }

Compile the source and create two symbolic links to the program:

    gcc -O2 -o son-of-hog son-of-hog.c -lrt
    ln -s son-of-hog hog-a
    ln -s son-of-hog hog-b

Check the current memory state:

    free -m
                 total       used       free     shared    buffers     cached
    Mem:           503        185        317          0         10        140
    -/+ buffers/cache:         35        468
    Swap:         1027          0       1027

Note: there are 468 MB of free memory at this point, of which 140 MB is cached and 10 MB is buffers.

Clear the swap space:

    swapoff -a
    swapon -a
    free -m
                 total       used       free     shared    buffers     cached
    Mem:           503         50        452          0          1         16
    -/+ buffers/cache:         33        470
    Swap:         1027          0       1027

Run hog-a so that it takes up 300 MB of memory, and look at how long the pass takes:

    ./hog-a 300 &
    [1] 2592
    touched 300 mb; in 2.784745 sec

Note: touching 300 MB took a little over 2.7 seconds. That figure includes the time spent handling page faults and reclaiming cache.

Sending SIGUSR1 to the process wakes it up and makes it touch the memory again:

    kill -USR1 %1
    touched 300 mb; in 0.010827 sec

Note: this time it took only about 10 ms, because the pages are already resident.

Now run hog-b and see how long it takes:

    ./hog-b 300 &
    [2] 2601
    touched 300 mb; in 4.432349 sec

Note: physical memory was already fully allocated, so to make room for hog-b the system had to swap: it gave physical memory to hog-b and moved pages belonging to hog-a out to swap.

Send SIGUSR1 to hog-b to wake it again. This pass takes only 143 ms:

    pkill -USR1 hog-b
    touched 300 mb; in 0.143772 sec

Inspect both processes with top:

    top -p $(pgrep hog-a) -p $(pgrep hog-b)

    1:Def - 06:12:26 up 10 min,  1 user,  load average: 0.01, 0.10, 0.10
    Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:    515600k total,   509468k used,     6132k free,      956k buffers
    Swap:  1052248k total,   162924k used,   889324k free,    11312k cached

    1   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
       2601 root      18   0  301m 300m  428 S  0.0 59.7   0:03.32 ./hog-b 300
       2592 root      18   0  301m 154m  428 S  0.0 30.7   0:02.76 ./hog-a 300

    2   PID  PPID    TIME+  %CPU %MEM  PR  NI S  VIRT SWAP  RES  UID COMMAND
       2601  2553   0:03.32  0.0 59.7  18   0 S  301m 1112 300m    0 hog-b
       2592  2553   0:02.76  0.0 30.7  18   0 S  301m 147m 154m    0 hog-a

    3   PID %MEM  VIRT SWAP  RES CODE DATA  SHR nFLT nDRT S  PR  NI %CPU COMMAND
       2601 59.7  301m 1112 300m    4 300m  428  235    0 S  18   0  0.0 hog-b
       2592 30.7  301m 147m 154m    4 300m  428    0    0 S  18   0  0.0 hog-a

    4   PID  PPID  UID USER     RUSER    TTY         TIME+  %CPU %MEM S COMMAND
       2601  2553    0 root     root     pts/0      0:03.32  0.0 59.7 S hog-b
       2592  2553    0 root     root     pts/0      0:02.76  0.0 30.7 S hog-a

A few observations:

1) Both programs use about 300 MB of virtual memory (VIRT 301m).
2) hog-a ran first and later had its data swapped out, so it holds only 154 MB of physical memory (RES 154m) and 147 MB of swap (SWAP 147m).
3) hog-b ran second and took physical memory away from hog-a, so it now holds 300 MB of physical memory (RES 300m). Because of the swapping, it incurred 235 page faults (nFLT, the Page Fault count column).
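The per-process counters that top displays come from /proc. As a minimal sketch, assuming the field order documented in proc(5) (minflt is field 10 and majflt is field 12 of /proc/<pid>/stat), the hypothetical helper parse_stat_faults extracts both counters from one stat line. The top manual describes nFLT as the number of major page faults, which is consistent with hog-a showing 0 in that column even though it touched 300 MB.

```c
#include <stdio.h>
#include <string.h>

/* Extract minor and major fault counts from the text of a
 * /proc/<pid>/stat line. Returns 0 on success, -1 on parse failure.
 * Hypothetical helper for illustration. */
int parse_stat_faults(const char *stat_line,
                      unsigned long *minflt, unsigned long *majflt)
{
    /* The command name (field 2) is in parentheses and may itself
     * contain spaces or ')', so resume parsing after the last ')'. */
    const char *p = strrchr(stat_line, ')');
    if (p == NULL)
        return -1;

    /* Fields 3..12: state ppid pgrp session tty_nr tpgid flags
     * minflt cminflt majflt. Only minflt and majflt are stored. */
    int n = sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %lu %*lu %lu",
                   minflt, majflt);
    return n == 2 ? 0 : -1;
}
```

With the made-up line "2601 (hog-b) S 2553 2601 2553 34816 2601 4194304 76912 0 235 0 2 330", the function stores 76912 in minflt and 235 in majflt. Only the PID, parent PID, name and the 235 are taken from the top output; the other values are illustrative.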