Linux HugePage Features

Last updated: 2013-12-12  Source: Internet

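HugePage refers to the large-page approach to memory management. Compared with traditional 4 KB regular pages, HugePages manage large memory (8 GB and above) more efficiently. This article describes what HugePages are and some of their characteristics.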

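1. Introduction of HugePages
An operating system can access data in physical memory far faster than it can read or write data on disk, but physical memory is limited, which gives rise to the concepts of physical memory and virtual memory. Virtual memory is a strategy for making up for a shortage of physical memory: disk space is used to back a block of logical memory. On Windows this disk space is called virtual memory; on Linux it is called swap space.

To manage this large memory (physical memory plus virtual memory), most operating systems use segmentation or paging. Segmentation is the coarse-grained approach and paging the fine-grained one; paging avoids wasting memory space. Accordingly, there are physical addresses and virtual addresses. With either approach, the CPU must translate a virtual address into a physical memory address before it can actually access memory. To speed up this translation, the CPU caches the most recent virtual-to-physical mappings in a mapping table that it maintains itself. To make memory access as fast as possible, this table should hold as many mappings as possible.

Linux manages memory by paging. To make full use of physical memory, the kernel uses an LRU algorithm to swap infrequently used pages out to swap space at appropriate times, keeping frequently used data in physical memory. By default each Linux page is 4 KB, which means that when physical memory is large the mapping table has a very large number of entries, and this slows down the CPU's lookups. Since the amount of memory is fixed, the only way to reduce the number of entries is to increase the page size. This is where HugePages come from: they break with the traditional small-page scheme and use large pages of 2 MB, 4 MB, 16 MB and so on, which cuts the number of mapping entries markedly. If a system has a large amount of physical memory (more than 8 GB), HugePages should be used whether the operating system is 32-bit or 64-bit.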
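The effect of page size on the number of mapping entries can be checked with a little arithmetic. The sketch below is only an illustration: the function name `mapping_entries` and the 8 GB example size are assumptions chosen for this note, and it simply counts one entry per page.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def mapping_entries(memory_bytes: int, page_bytes: int) -> int:
    """Number of pages (one mapping entry each) needed to cover memory_bytes."""
    # Ceiling division, so a partially filled last page still counts.
    return -(-memory_bytes // page_bytes)


if __name__ == "__main__":
    memory = 8 * GIB  # hypothetical machine size, for illustration only
    sizes = (("4 KB", 4 * KIB), ("2 MB", 2 * MIB), ("4 MB", 4 * MIB), ("16 MB", 16 * MIB))
    for label, page in sizes:
        print(f"{label:>5} pages: {mapping_entries(memory, page):>9,} entries")
```

For 8 GB, moving from 4 KB to 2 MB pages reduces the count from 2,097,152 entries to 4,096.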

2. HugePage terminology
Page Table:
A page table is the data structure of a virtual memory system in an operating system to store the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, the memory is accessed by first accessing a page table and then accessing the actual memory location implicitly.
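As described earlier, the page table is the memory management structure that maps virtual addresses to physical addresses. A memory access therefore goes to the page table first and then, following the mapping stored there, implicitly on to the physical address where the data lives.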

TLB:
A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. This is a fixed size buffer being used to do virtual address translation faster.
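That is, the TLB is a fixed-size cache in the CPU that holds part of the page table's mappings and is used to translate virtual addresses to physical addresses quickly.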

hugetlb:
This is an entry in the TLB that points to a HugePage (a large/big page larger than regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The "hugetlb" term is also (and mostly) used synonymously with a HugePage (See Note 261889.1). In this document the term "HugePage" is going to be used but keep in mind that mostly "hugetlb" refers to the same concept.
In other words, hugetlb is a TLB entry that points to a HugePage (a page larger than the regular 4 KB, with a predefined size). HugePages are implemented through hugetlb entries, so a HugePage can be thought of as being handled by a hugetlb page entry.

hugetlbfs:
This is a new in-memory filesystem like tmpfs and was introduced with the 2.6 kernel. Pages allocated on a hugetlbfs type filesystem are allocated in HugePages.
That is, hugetlbfs is a new in-memory filesystem similar to tmpfs, introduced in the 2.6 kernel.

3. Common misconceptions
WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM systems
RIGHT: HugePages is a method to have larger pages where it is useful for working with very large memory. It is both useful in 32- and 64-bit configurations

WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS
RIGHT: HugePages can be used without indirect buffers. 64-bit systems do not need to use indirect buffers to have a large buffer cache for the RDBMS instance and HugePages can be used there too.

WRONG: hugetlbfs means hugetlb
RIGHT: hugetlbfs is a filesystem type **BUT** hugetlb is the mechanism employed in the back where hugetlb can be employed WITHOUT hugetlbfs

WRONG: hugetlbfs means hugepages
RIGHT: hugetlbfs is a filesystem type **BUT** HugePages is the mechanism employed in the back (synonymously with hugetlb) where HugePages can be employed WITHOUT hugetlbfs.

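4. Regular Pages and HugePages
a. Regular Pages
The figure for this case (not reproduced here) shows two different processes. Each process first consults its local page table when it accesses memory; the local page table refers to pages in the system-wide table (the table whose mappings the TLB described earlier caches), and the entries in the system-wide table finally point to the actual physical addresses. The physical page size in the figure is 4 KB. The figure also shows process 1 and process 2 both pointing to page2 in the system-wide table, that is, to the same physical address. This is what happens when shared memory is used in the Oracle SGA.

b. Huge Pages
In the figure for this case (also not reproduced here), both the local page tables and the system page table carry a huge page attribute, so any given page in a page table may be a regular page or a huge page. Process 1 and process 2 again share one page, Hpage2. In the figure the regular page size is 4 KB and the huge page size is 4 MB.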

--Author : Robinson
--Blog : http://blog.csdn.net/robinson_0612

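5. Huge page size
The huge page size depends on the kernel version of the operating system in use and on the hardware platform.
To check the huge page size, run:
$ grep Hugepagesize /proc/meminfo
Common huge page sizes on different platforms are listed below.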
HW Platform                     Source Code Tree   Kernel 2.4   Kernel 2.6
------------------------------  -----------------  -----------  ----------
Linux x86 (IA32)                i386               4 MB         4 MB *
Linux x86-64 (AMD64, EM64T)     x86_64             2 MB         2 MB
Linux Itanium (IA64)            ia64               256 MB       256 MB
IBM Power Based Linux (PPC64)   ppc64/powerpc      N/A **       16 MB
IBM zSeries Based Linux         s390               N/A          1 MB
IBM S/390 Based Linux           s390               N/A          N/A
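The same Hugepagesize check can be done from a script. The following is a minimal sketch, not part of the original article: the function name `hugepage_size_kb` and the `SAMPLE` text are made up for illustration, and the sample values are not taken from a real machine.

```python
import re


def hugepage_size_kb(meminfo_text: str):
    """Return the Hugepagesize value in kB, or None if the line is absent."""
    match = re.search(r"^Hugepagesize:[ \t]+(\d+)[ \t]+kB[ \t]*$", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Illustrative /proc/meminfo excerpt (hypothetical values).
SAMPLE = """\
MemTotal:       16315392 kB
HugePages_Total:       0
HugePages_Free:        0
Hugepagesize:       2048 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the illustrative sample
    print(hugepage_size_kb(text))
```

Taking the text as an argument, instead of reading the file inside the function, keeps the parser usable on saved output from another host.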

6. Advantages of using huge pages
For systems with large memory and a large SGA, using HugePages can greatly improve Oracle database performance.

a. Not swappable
HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead. HugePages are universally regarded as pinned.
No swapping is needed: these pages are never paged in or out because memory is short.

b. Relief of TLB pressure
HugePages use fewer pages to cover the physical address space, so the amount of "bookkeeping" (mapping from virtual to physical addresses) decreases and fewer entries are required in the TLB.
TLB entries cover a larger part of the address space when HugePages are used, so there are fewer TLB misses before all or most of the SGA is mapped in the TLB.
Fewer TLB entries for the SGA also means more for other parts of the address space.
This relieves pressure on the TLB, that is, on the CPU cache that holds address mappings. With huge pages, fewer virtual pages need to be managed for the same amount of memory.
Each TLB entry covers more address space, so the range the CPU can address without a TLB miss grows accordingly.
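How much address space a TLB can cover is the number of entries multiplied by the page size. The sketch below illustrates this; the name `tlb_reach_bytes` and the 512-entry TLB are assumptions picked for the example, since real TLB sizes vary by CPU and are not given in the text.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Address space covered when every TLB entry maps one page of page_bytes."""
    return entries * page_bytes


if __name__ == "__main__":
    entries = 512  # hypothetical TLB size, for illustration only
    print(tlb_reach_bytes(entries, 4 * KIB) // MIB, "MB reach with 4 KB pages")
    print(tlb_reach_bytes(entries, 2 * MIB) // GIB, "GB reach with 2 MB pages")
```

With the same number of entries, 2 MB pages cover 512 times as much address space as 4 KB pages.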

c. Decreased page table overhead
Each page table entry can be as large as 64 bytes, and if we are trying to handle 50 GB of RAM, the page table will be approximately 800 MB in size, which in practice will not fit in the 880 MB of lowmem (in 2.4 kernels; the page table is not necessarily in lowmem in 2.6 kernels) considering the other uses of lowmem. When 95% of memory is accessed via 256 MB hugepages, this can work with a page table of approximately 40 MB in total. See also Document 361468.1.
This lowers page table overhead. With regular pages, each entry needs up to 64 bytes, so managing the entries for 50 GB of memory takes about 800 MB:
(50*1024*1024) KB / 4 KB * 64 bytes / 1024 / 1024 = 800 MB
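Both figures quoted above (about 800 MB with 4 KB pages, about 40 MB when 95% of memory sits in 256 MB huge pages) can be reproduced with the sketch below. The function names are invented for this illustration, and the 64-byte entry size is the upper bound quoted in the text, not a measured value.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
PTE_BYTES = 64  # upper bound per page table entry, as quoted in the text


def page_table_bytes(memory_bytes: int, page_bytes: int, pte_bytes: int = PTE_BYTES) -> int:
    """Page table size when all of memory_bytes is mapped with one page size."""
    pages = -(-memory_bytes // page_bytes)  # ceiling division
    return pages * pte_bytes


def mixed_page_table_bytes(memory_bytes: int, huge_percent: int,
                           huge_page_bytes: int, small_page_bytes: int = 4 * KIB) -> int:
    """Page table size when huge_percent of memory uses huge pages and the rest small pages."""
    huge_part = memory_bytes * huge_percent // 100
    small_part = memory_bytes - huge_part
    return (page_table_bytes(huge_part, huge_page_bytes)
            + page_table_bytes(small_part, small_page_bytes))


if __name__ == "__main__":
    ram = 50 * GIB
    print(page_table_bytes(ram, 4 * KIB) / MIB, "MB with 4 KB pages only")
    print(round(mixed_page_table_bytes(ram, 95, 256 * MIB) / MIB, 2), "MB with 95% in 256 MB huge pages")
```

In the mixed case nearly all of the remaining 40 MB comes from the 5% of memory still mapped with 4 KB pages; the 190 huge pages contribute only about 12 KB of entries.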

d. Eliminated page table lookup overhead
Since the pages are not subject to replacement, page table lookups are not required. (This eliminates the page table lookup overhead.)

e. Faster overall memory performance
On virtual memory systems each memory operation is actually two abstract memory operations. Since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided. (This improves overall memory performance.)

7. Risks of not configuring huge pages correctly
When managing large memory (> 8 GB), huge pages that are not configured, or not configured correctly, can cause the following hidden and unpredictable problems:
HugePages not used at all (HugePages_Total = HugePages_Free), wasting the amount of memory configured for them
Poor database performance
System running out of memory or excessive swapping
Some or all database instances cannot be started
Crucial system services failing (e.g. CRS)
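The first symptom in this list, HugePages_Total equal to HugePages_Free, can be detected from the contents of /proc/meminfo. The sketch below is one possible check; the function names and the sample texts are assumptions made for illustration, and as a simplification it ignores the HugePages_Rsvd counter, so pages that are reserved but not yet touched are not taken into account.

```python
import re


def hugepage_counters(meminfo_text: str) -> dict:
    """Collect the HugePages_* counters from /proc/meminfo-style text."""
    pairs = re.findall(r"^(HugePages_\w+):[ \t]+(\d+)[ \t]*$", meminfo_text, re.MULTILINE)
    return {name: int(value) for name, value in pairs}


def hugepages_unused(meminfo_text: str) -> bool:
    """True when huge pages are configured but none of them is in use."""
    counters = hugepage_counters(meminfo_text)
    total = counters.get("HugePages_Total", 0)
    return total > 0 and counters.get("HugePages_Free") == total


# Hypothetical excerpts, for illustration only.
IDLE = "HugePages_Total:    1024\nHugePages_Free:     1024\nHugepagesize:       2048 kB\n"
IN_USE = "HugePages_Total:    1024\nHugePages_Free:      200\nHugepagesize:       2048 kB\n"

if __name__ == "__main__":
    print(hugepages_unused(IDLE))    # configured but untouched
    print(hugepages_unused(IN_USE))  # partly consumed
```

A system with HugePages_Total equal to 0 is reported as not unused, since nothing has been configured and so nothing is wasted.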

8. Configuration steps for 2.6 kernels
The kernel parameter used for HugePages is vm.nr_hugepages, which is expressed as a number of pages. SLES9, RHEL4 and Asianux 2.0 are examples of distributions with the 2.6 kernel. For the configuration, follow the steps below:
a. Start instance(s)
b. Calculate nr_hugepages using script from Document 401749.1
c. Set kernel parameter:
# sysctl -w vm.nr_hugepages=<value from above>
and make sure that the parameter persists across reboots, e.g. on SLES9:
# chkconfig boot.sysctl on
d. Check available hugepages:
$ grep Huge /proc/meminfo
e. Restart instances
f. Check available hugepages:
$ grep Huge /proc/meminfo
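For step b, the number of pages can be estimated by hand when the SGA sizes are known. The sketch below is not the script from Document 401749.1; it is a rough estimate under the assumption that each instance's SGA is rounded up to whole huge pages separately. The function name and the example SGA sizes are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def nr_hugepages_for(sga_bytes_per_instance, hugepage_bytes: int) -> int:
    """Huge pages needed to hold every SGA, rounding each SGA up to whole pages."""
    total = 0
    for sga_bytes in sga_bytes_per_instance:
        total += -(-sga_bytes // hugepage_bytes)  # ceiling division per instance
    return total


if __name__ == "__main__":
    sgas = [4 * GIB, 6 * GIB]  # hypothetical SGA sizes of two instances
    pages = nr_hugepages_for(sgas, 2 * MIB)
    print(f"sysctl -w vm.nr_hugepages={pages}")
```

The printed line has the same form as the command in step c, with the computed value filled in.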

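9. Notes
a. HugePages use shared memory. They are allocated dynamically and reserved while the operating system starts up, because they are never swapped out.
b. Because they are never swapped out, memory set aside for HugePages cannot be used by other processes. Set the value sensibly to avoid wasting memory.
c. On a server that runs only Oracle, setting HugePages to the size of the SGA (the sum of the SGAs of all instances) is sufficient.
d. If HugePages are increased, physical memory is added, a new instance is added to the server, or the SGA changes, the number of HugePages needed should be recalculated and set again.
e. Reference: HugePages on Linux: What It Is... and What It Is Not... [ID 361323.1]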
