Using Jumbo Frames with Oracle RAC


First, let's look at what Jumbo Frames actually are.

       We know that in the TCP/IP protocol suite, the unit of communication at the Ethernet data link layer is the frame. A standard frame is defined as 1,518 bytes, and the MTU (Maximum Transmission Unit) of a traditional 10 Mbit NIC is 1,500 bytes (as in the example below): 14 bytes are taken by the frame header and 4 bytes by the CRC checksum, and after subtracting the 40 bytes of TCP/IP headers, the effective payload is 1,460 bytes. Later 100 Mbit and 1000 Mbit NICs kept the same size for compatibility, but for a gigabit NIC this means more interrupts and more processing time. Gigabit NICs therefore offer "Jumbo Frames", which extend the frame to 9,000 bytes. Why 9,000 bytes and not more? Because the 32-bit CRC checksum loses its efficiency advantage for frames larger than about 12,000 bytes, and 9,000 bytes is already enough for 8 KB applications such as NFS.
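
As a quick sanity check of those numbers (a back-of-the-envelope sketch, assuming 20-byte IPv4 and 20-byte TCP headers with no options):

# 1,518-byte frame = 14 (Ethernet header) + 1,500 (IP payload, i.e. the MTU) + 4 (CRC/FCS)
# usable application payload per 1,500-byte-MTU packet over IPv4 + TCP:
echo $(( 1500 - 20 - 20 ))    # -> 1460 bytes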

For example:

# ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:37:9C:D0  
          inet addr:192.168.0.103  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe37:9cd0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9093 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10011 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:749067 (731.5 KiB)  TX bytes:4042337 (3.8 MiB)

If the path is configured with an MTU of about 1,500 bytes (1.5 K), as in the example above, sending an 8 KB data block from one node to another takes six packets: the 8 KB buffer is split into six IP packets, which are sent to the receiving side. On the receiving side the six IP packets are received and reassembled into the 8 KB buffer, which is finally handed to the application for further processing.
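
The packet count is easy to verify (a rough estimate, assuming roughly 1,460 bytes of usable payload per 1,500-byte-MTU packet and roughly 8,960 bytes per 9,000-byte-MTU packet; the exact overhead depends on the protocol headers):

echo $(( (8192 + 1460 - 1) / 1460 ))    # MTU 1500 -> 6 packets per 8 KB block
echo $(( (8192 + 8960 - 1) / 8960 ))    # MTU 9000 -> 1 packet per 8 KB block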

[Figure 1: 2015-08-15_141313.png (how an 8 KB data block is split and reassembled): http://s3.51cto.com/wyfs02/M02/71/6C/wKiom1XO2QCDJBGgAACtx0uB_Rc017.jpg]

Figure 1 shows how the data block is split and reassembled. In this figure, an LMS process sends an 8 KB data block to a remote process. During the transfer, the 8 KB buffer is split into six IP packets, which are sent across the network to the receiving side. On the receiving side, a kernel thread reassembles the six IP packets and places the 8 KB data block in the socket buffer. The foreground process then reads it from the socket buffer into its PGA and copies it into the database buffer cache.

The process above causes fragmentation and reassembly; this excessive splitting and rebuilding quietly drives up CPU usage on the database nodes. In this situation we have to turn to Jumbo Frames.
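
One way to gauge how much of this fragmentation and reassembly work a node is actually doing (a sketch, assuming a Linux host with the net-tools package installed) is to compare the kernel's IP statistics before and after a workload run:

# cumulative IP-level fragmentation/reassembly counters since boot
netstat -s | grep -i -E 'reassembl|fragment'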

Today our networks are commonly gigabit, 10-gigabit or even faster, so we can make the change with the following command (provided your environment really does use gigabit Ethernet switches and gigabit Ethernet NICs end to end):

# ifconfig eth0 mtu 9000
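
This takes effect immediately but is lost on reboot; the runtime value can be checked with, for example:

# confirm the MTU currently in effect on the interface
ip link show eth0 | grep -o 'mtu [0-9]*'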

To make the change permanent, edit the interface configuration file:

# vi /etc/sysconfig/network-scripts/ifcfg-eth0

and add:

MTU=9000
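
For illustration, the file might then look like the sketch below (everything except the MTU line is a placeholder based on the ifconfig output above; keep your own existing settings and only add MTU=9000):

# illustrative sketch of /etc/sysconfig/network-scripts/ifcfg-eth0, not a complete file
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.0.103
NETMASK=255.255.255.0
ONBOOT=yes
MTU=9000

The interface then has to be bounced so the file is re-read (this briefly takes it down):

# ifdown eth0 && ifup eth0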

For more details, see http://www.cyberciti.biz/faq/rhel-centos-debian-ubuntu-jumbo-frames-configuration/


The following post puts the configuration above through a good set of tests:

https://blogs.oracle.com/XPSONHA/entry/jumbo_frames_for_rac_interconn_1

Its test procedure and results are excerpted below:

Theory tells us properly configured Jumbo Frames can eliminate 10% of overhead on UDP traffic.

So how to test?

I guess an 'end to end' test would be the best way. So my first test is a 30-minute Swingbench run against a two-node RAC, without too much stress to begin with.

The MTU of the network bond (and of its slave NICs) will be 1500 initially.

After the test, collect the results for the total transactions, the average transactions per second, the maximum transaction rate (results.xml), interconnect traffic (AWR) and CPU usage. Then do exactly the same, but now with an MTU of 9000 bytes. For this we need to make sure the switch settings are also modified to use an MTU of 9000.
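
The post does not show how the CPU and traffic figures were gathered; as a sketch (assuming the sysstat and net-tools packages are available on each node), they could be captured around a run like this:

# snapshot UDP counters before the run
netstat -su > udp_before.txt
# sample CPU usage every 30 seconds for the 30-minute run (60 samples)
sar -u 30 60 -o cpu_run.sar &
# ... run the Swingbench workload ...
# snapshot UDP counters afterwards and compare
netstat -su > udp_after.txt
diff udp_before.txt udp_after.txt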

B.t.w.: yes, it's possible to measure the network only, but real-life end-to-end testing with a real Oracle application talking to RAC feels like the best approach to see what the impact is on, for example, the average transactions per second.

In order to make the test as reliable as possible, a few remarks:
- use guaranteed snapshots to flashback the database to its original state.
- stop/start the database (clean the cache)

B.t.w.: before starting the test with an MTU of 9000 bytes, the correct setting had to be verified.

One way to do this is to use ping with a payload size (-s) of 8972 bytes (the 9,000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header) while prohibiting fragmentation (-M do): if Jumbo Frames are configured correctly end to end, such packets go through without fragmentation.

# ping -s 8972 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8972(9000) bytes of data.
8980 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.914 ms

As you can see this is not a problem, while for packets larger than 9000 bytes it is:

# ping -s 8973 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8973(9001) bytes of data.
From node02-ic. (192.168.23.52) icmp_seq=0 Frag needed and DF set (mtu = 9000)

--- node02-ic. ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.859/0.955/1.167/0.109 ms, pipe 2

Bringing the MTU back to 1500 should also stop unfragmented 9000-byte packets from getting through:

# ping -s 8972 -M do node02-ic -c 5
PING node02-ic. (192.168.23.32) 8972(9000) bytes of data.
--- node02-ic. ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

With the MTU back at 1500, sending 'normal'-sized packets should of course work again:

# ping node02-ic -M do -c 5
PING node02-ic. (192.168.23.32) 56(84) bytes of data.
64 bytes from node02-ic. (192.168.23.32): icmp_seq=0 ttl=64 time=0.174 ms

--- node02-ic. ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.174/0.186/0.198/0.008 ms, pipe 2

Another way to verify the MTU that is actually in use is the command 'netstat -a -i -n' (the MTU column should show 9000 when you are testing Jumbo Frames):

Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0      1500   0 10371535      0      0      0 15338093      0      0      0 BMmRU
bond0:1    1500   0      - no statistics available -                            BMmRU
bond1      9000   0 83383378      0      0      0 89645149      0      0      0 BMmRU
eth0       9000   0       36      0      0      0 88805888      0      0      0 BMsRU
eth1       1500   0  8036210      0      0      0 14235498      0      0      0 BMsRU
eth2       9000   0 83383342      0      0      0   839261      0      0      0 BMsRU
eth3       1500   0  2335325      0      0      0  1102595      0      0      0 BMsRU
eth4       1500   0 252075239      0      0      0 252020454      0      0      0 BMRU
eth5       1500   0        0      0      0      0        0      0      0      0 BM

As you can see, my interconnect is on bond1 (built on eth0 and eth2). All at 9000 bytes.
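
One thing worth noting when reproducing this: with a bonded interconnect the MTU has to be 9000 on the bond device, on its slave NICs and on the switch ports in between. A quick per-interface check (a sketch, assuming the bond1/eth0/eth2 layout shown above) is:

# the reported mtu should be 9000 for the bond and for both slaves
for i in bond1 eth0 eth2; do
  printf '%-6s %s\n' "$i" "$(ip link show "$i" | grep -o 'mtu [0-9]*')"
done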

Not finished yet, no conclusions yet, but here is my first result.
You will notice the results are not that significant.

MTU 1500:
TotalFailedTransactions      : 0
AverageTransactionsPerSecond : 1364
MaximumTransactionRate       : 107767
TotalCompletedTransactions   : 4910834

MTU 9000:
TotalFailedTransactions      : 1
AverageTransactionsPerSecond : 1336
MaximumTransactionRate       : 109775
TotalCompletedTransactions   : 4812122

In a chart this will look like this:
[Chart udp_traf01.png (UDP traffic, MTU 1500 vs. 9000): http://blogs.oracle.com/XPSONHA/resource/udp_traf01.png]

As you can see, the difference in the number of transactions between the two tests isn't really significant, but the UDP traffic is lower! Still, I expected more from this test, so I have to put more stress on the system.

I noticed the failed transaction, and found "ORA-12155 TNS-received bad datatype in NSWMARKER packet". I did verify this and I am sure this is not related to the MTU size. This is because I only changed the MTU size for the interconnect and there is no TNS traffic on that network.

As said, I will now continue with tests that put much more stress on the systems:
- number of users changed from 80 to 150 per database
- number of databases changed from 1 to 2
- more network traffic:
 - rebuild the Swingbench indexes without the 'REVERSE' option
 - altered the sequences and lowered the increment-by value to 1 and the cache size to 3 (instead of 800)
 - full table scans all the time on each instance
- run longer (4 hours instead of half an hour)

Now, what you see is already an improvement. For the 4-hour test, the number of extra UDP packets sent with an MTU of 1500 compared to an MTU of 9000 is about 2.5 to 3 million; see this chart:

[Chart udptraf02.png (UDP packets sent, MTU 1500 vs. 9000): http://blogs.oracle.com/XPSONHA/resource/udptraf02.png]

Imagine what an impact this has. Each packet you do not send saves you the network overhead of the packet itself and a lot of CPU cycles that you don't need to spend.

The load average of the Linux box also decreases, from an average of 16 to 14.
[Chart load_avg01.png (load average): http://blogs.oracle.com/XPSONHA/resource/load_avg01.png]

In terms of completed transactions on different MTU sizes within the same timeframe, the chart looks like this:

[Chart trans01.png (completed transactions, MTU 1500 vs. 9000): http://blogs.oracle.com/XPSONHA/resource/trans01.png]

To conclude this test, two very-high-load runs are performed. Again, one with an MTU of 1500 and one with an MTU of 9000.

In the charts below you will see less CPU consumption when using 9000 bytes for MTU.

Fewer packets are also sent, although I think that difference is not that significant compared to the total number of packets sent.

[Chart cpu_load_01.png (CPU load): http://blogs.oracle.com/XPSONHA/resource/cpu_load_01.png]

[Chart packets_01.png (packets sent): http://blogs.oracle.com/XPSONHA/resource/packets_01.png]

My final thoughts on this test:

1. you will hardly notice the benefits of Jumbo Frames on a system under no stress
2. you will notice the benefits of Jumbo Frames on a stressed system: such a system will use less CPU and have less network overhead.

This means Jumbo Frames help you scale out better than regular frames.

Depending on the interconnect usage of your applications, the results may of course vary. With interconnect-intensive applications you will see the benefits sooner than with applications that have relatively little interconnect activity.

I would use Jumbo Frames to scale better, since they save CPU and reduce network traffic, leaving room for growth.







This article originally appeared on the "取眾之所長,為我所用" blog; please keep this attribution: http://jonsen.blog.51cto.com/4559666/1684850
