Oracle RAC uses Jumbo Frames


Let's take a look at what Jumbo Frames are.

We know that in the TCP/IP protocol stack, the Ethernet data link layer communicates in frames. A frame is limited to 1,518 bytes, so the MTU (Maximum Transmission Unit) of a traditional 10M NIC is 1,500 bytes: 14 bytes of the frame are reserved for the Ethernet header and 4 bytes for the CRC checksum. After subtracting the 40 bytes of TCP/IP headers, the effective payload is 1,460 bytes. Later, 100M and 1000M NICs kept the same 1,500-byte MTU for compatibility. For a 1000M NIC, however, this means more interrupts and more processing time. Therefore, gigabit network adapters can use "Jumbo Frames" to extend the frame to 9,000 bytes. Why 9,000 bytes and not bigger? Because the 32-bit CRC checksum loses its efficiency advantage above roughly 12,000 bytes, and 9,000 bytes is already sufficient for 8KB payloads such as NFS datagrams.
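To make that byte accounting concrete, here is a minimal sketch of the arithmetic as plain shell commands (the header sizes used are the standard Ethernet, IP and TCP values, not anything measured on a particular system):

# 1518-byte maximum frame = 14-byte Ethernet header + 1500-byte IP payload (the MTU) + 4-byte CRC
MTU=1500
IP_HDR=20; TCP_HDR=20
echo "application payload per frame: $(( MTU - IP_HDR - TCP_HDR )) bytes"   # prints 1460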

For example:

If, in the network described above, the path is configured with an MTU of 1,500 bytes (1.5K) and a data block of 8K is transferred from one node to another, then six packets are required for the transmission. The 8K buffer is split into six IP packets and sent to the receiving side. At the receiving end, the six IP packets are received and the 8K buffer is rebuilt. The reassembled buffer is finally passed to the application for further processing.
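A rough sketch of that packet count, assuming the block travels as a single UDP datagram that the kernel fragments to fit the MTU (a 20-byte IP header per fragment and an 8-byte UDP header in the first fragment; these are the standard header sizes, not values taken from the example system):

BLOCK=8192                                    # one 8KB database block
for MTU in 1500 9000; do
  FRAG=$(( MTU - 20 ))                        # data bytes each IP fragment can carry
  PKTS=$(( (BLOCK + 8 + FRAG - 1) / FRAG ))   # +8 for the UDP header, rounded up
  echo "MTU $MTU -> $PKTS IP packet(s) per 8KB block"
done
# prints 6 packets for MTU 1500 and 1 packet for MTU 9000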

Figure 1 (image): an 8KB buffer split into IP packets on the sending side and reassembled on the receiving side.

Figure 1 shows how data blocks are split and reassembled. In this diagram, the LMS process sends an 8KB block of data to a remote process. During the transfer, the 8KB buffer is split into six IP packets, and these IP packets are sent over the network to the receiving side. On the receiving side, a kernel thread reassembles the six IP packets and stores the 8KB block in the socket buffer. The foreground process reads it from the socket buffer into the PGA and copies it into the database buffer cache.

This process causes fragmentation and reassembly, an over-segmentation and reassembly problem that quietly increases the CPU usage of the database nodes. In this case we should choose Jumbo Frames.
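If you want to see whether a node really is fragmenting and reassembling interconnect packets, the kernel's IP statistics give a hint. A small sketch (the exact counter wording varies a bit between kernel versions):

# Watch these counters while the cache-fusion workload runs; rapidly growing
# "fragments created" / "reassemblies required" values point at MTU pressure.
netstat -s | grep -iE 'fragment|reassembl'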

Nowadays our network environment can reach gigabit, 10-gigabit or even higher speeds, so we can enable Jumbo Frames on the system with the following command (provided your environment uses gigabit or faster Ethernet switches and NICs):

# ifconfig eth0 mtu 9000

Make it Permanent

# vi /etc/sysconfig/network-scripts/ifcfg-eth0

Add:

MTU=9000
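A minimal verification sketch, assuming eth0 is the interconnect NIC and a RHEL-style network-scripts layout (device names will differ on your system). Keep in mind that every cluster node and every switch port on the interconnect path must be configured for the same MTU:

ip link set dev eth0 mtu 9000                           # same effect as ifconfig, applied immediately
ip link show eth0 | grep -o 'mtu [0-9]*'                # the running MTU of the interface
grep '^MTU' /etc/sysconfig/network-scripts/ifcfg-eth0   # the persistent setting used after a restart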

See http://www.cyberciti.biz/faq/rhel-centos-debian-ubuntu-jumbo-frames-configuration/ for more details.


The following article provides a good test of the above settings:

https://blogs.oracle.com/XPSONHA/entry/jumbo_frames_for_rac_interconn_1

An excerpt of its test steps and results is as follows:

Theory tells us that properly configured Jumbo Frames can eliminate 10% of the overhead on UDP traffic.

So what to test?

I guess an 'end to end' test would be best. So my first test is a 30-minute Swingbench run against a two-node RAC, not too much stress to begin with.

The MTU configuration of the network bond (and its slave NICs) will initially be 1500.

After the test, collect the results: the total number of transactions, the average transactions per second, the maximum transaction rate (results.xml), interconnect traffic (AWR) and CPU usage. Then do exactly the same, but now with an MTU of 9000 bytes. For this we need to make sure the switch settings are also modified to use an MTU of 9000.

B.T.W.: Yes, it's possible to measure the network only, but real-life end-to-end testing with a real Oracle application talking to RAC feels like the best approach to see what the impact is on, for example, the average transactions per second.

In order to make the test as reliable as possible, some remarks:
- Use guaranteed snapshots to flash the database back to its original state.
- Stop/start the database (to clean the caches).

B.t.w.: before starting the test with an MTU of 9000 bytes, the correctness of the setting had to be proven.

One way to do this is using ping with a packet size (-s) of 8972 bytes while prohibiting fragmentation (-M do); 8972 is the 9000-byte MTU minus the 20-byte IP header and the 8-byte ICMP header. That way you send jumbo-sized packets and see whether they arrive without being fragmented.

# ping -s 8972 -M do node02-ic -c 5
PING node02-ic (192.168.23.32) 8972(9000) bytes of data.
8980 bytes from node02-ic (192.168.23.32): icmp_seq=0 ttl=64 time=0.914 ms

--- node02-ic ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.859/0.955/1.167/0.109 ms, pipe 2

As you can see this is not a problem, while for packets larger than 9000 bytes it is:

# ping -s 8973 -M do node02-ic -c 5
PING node02-ic (192.168.23.32) 8973(9001) bytes of data.
From node02-ic (192.168.23.52) icmp_seq=0 Frag needed and DF set (mtu = 9000)

Bringing the MTU size back to 1500 should also prevent fragmented 9000-byte packets from being sent:

# ping -s 8972 -M do node02-ic -c 5
PING node02-ic (192.168.23.32) 8972(9000) bytes of data.

--- node02-ic ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

With the MTU back at 1500, sending 'normal' sized packets should of course still work:

# ping node02-ic -M do -c 5
PING node02-ic (192.168.23.32) bytes of data.
bytes from node02-ic (192.168.23.32): icmp_seq=0 ttl=64 time=0.174 ms

--- node02-ic ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.174/0.186/0.198/0.008 ms, pipe 2
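With more than two nodes, the same ping check can be scripted. A small sketch (the hostnames are only examples following the naming above; substitute your own private interconnect hostnames):

for node in node01-ic node02-ic; do
  if ping -M do -s 8972 -c 2 -q "$node" >/dev/null 2>&1; then
    echo "$node: 9000-byte path OK"
  else
    echo "$node: jumbo frames NOT usable end-to-end"
  fi
done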

Another way to verify the correct usage of the MTU size is the command 'netstat -a -i -n' (the MTU column should show 9000 while performing the Jumbo Frames tests):

Kernel Interface table
Iface     MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0    1500   0  10371535      0      0      0  15338093      0      0      0 BMmRU
bond0:1  1500   0       - no statistics available -                             BMmRU
bond1    9000   0  83383378      0      0      0  89645149      0      0      0 BMmRU
eth0     9000   0         0      0      0      0  88805888      0      0      0 BMsRU
eth1     1500   0   8036210      0      0      0  14235498      0      0      0 BMsRU
eth2     9000   0  83383342      0      0      0    839261      0      0      0 BMsRU
eth3     1500   0   2335325      0      0      0   1102595      0      0      0 BMsRU
eth4     1500   0 252075239      0      0      0 252020454      0      0      0 BMRU
eth5     1500   0         0      0      0      0         0      0      0      0 BM

As you can see, my interconnect is on bond1 (built on eth0 and eth2), all at 9000 bytes.
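A third quick check reads the MTU straight from sysfs (the interface names are the ones from the example above; adjust them to your own system):

cat /sys/class/net/bond1/mtu /sys/class/net/eth0/mtu /sys/class/net/eth2/mtu   # each should print 9000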

Not finished yet, no conclusions yet, but this is my first result.
You'll notice the results are not that significant.

MTU 1500:
TotalFailedTransactions: 0
AverageTransactionsPerSecond: 1364
MaximumTransactionRate: 107767
TotalCompletedTransactions: 4910834

MTU 9000:
TotalFailedTransactions: 1
AverageTransactionsPerSecond: 1336
MaximumTransactionRate: 109775
TotalCompletedTransactions: 4812122

In a chart this would look like this:
[Chart: UDP traffic with MTU 1500 vs. MTU 9000 (udp_traf01.png)]

As you can see, the difference in the number of transactions between the tests isn't really that significant, but the UDP traffic is less! Still, I expected more from this test, so I had to put more stress on it.

I noticed the failed transaction and found "ORA-12155: TNS:received bad datatype in NSWMARKER packet". I did verify this and I am sure it is not related to the MTU size: I only changed the MTU size for the interconnect, and there is no TNS traffic on that network.

As said, I'll now continue with tests that put much more stress on the systems:
- Number of users increased
- Number of databases changed from 1 to 2
- More network traffic:
  - Rebuilt the Swingbench indexes without the 'REVERSE' option
  - Altered the sequences, lowering the INCREMENT BY value to 1 and the cache size to 3 (instead of 800)
  - Full table scans all the time on each instance
- Run longer (4 hours instead of half an hour)

Now things are already improving. For the 4-hour test, the number of extra UDP packets sent with an MTU size of 1500 compared to an MTU size of 9000 is about 2.5 to 3 million, see this chart:

[Chart: UDP packets sent with MTU 1500 vs. MTU 9000 (udptraf02.png)]

Imagine what an impact this has. Every packet you do not send saves you the network overhead of the packet itself and a lot of CPU cycles you do not need to spend.

The load average of the Linux box, which averaged around 14, also decreases.
[Chart: load average with MTU 1500 vs. MTU 9000 (load_avg01.png)]

In terms of completed transactions for the different MTU sizes within the same timeframe, the chart looks like this:

[Chart: completed transactions with MTU 1500 vs. MTU 9000 (trans01.png)]

To conclude this test, runs under very high load were performed, again comparing an MTU of 1500 with an MTU of 9000.

In the charts below you'll see less CPU consumption when using 9000 bytes for the MTU.

Fewer packets were also sent, although I think that number is not that significant compared to the total number of packets sent.

[Chart: CPU load with MTU 1500 vs. MTU 9000 (cpu_load_01.png)]

[Chart: packets sent with MTU 1500 vs. MTU 9000 (packets_01.png)]

My final thoughts on this test:

1. You'll hardly notice the benefits of using Jumbo Frames on a system under no stress.
2. You will notice the benefits of using Jumbo Frames on a stressed system; such a system will use less CPU and have less network overhead.

This means Jumbo Frames help you scale out better than regular frames.

Depending on the interconnect usage of your applications, the results may of course vary. With interconnect-traffic-intensive applications you'll see the benefits sooner than with applications that have relatively little interconnect activity.

I would use Jumbo Frames to scale better, since they save CPU and reduce network traffic, and that leaves room for growth.







This article is from the blog at http://jonsen.blog.51cto.com/4559666/1684850; please keep this source reference when reposting.
