Hadoop Tuning, Part 1: Overview

Last Update:2015-03-13 Source: Internet

Author: User



Common causes of poor Hadoop cluster performance

(i) Hardware environment

1. CPU or memory is insufficient, or underutilized.
2. Network problems.
3. Disk problems.

(ii) Map task causes

1. The input contains too many small files, so JVM processes are started and stopped many times. JVM reuse can be enabled to mitigate this.
2. Data skew: some files are large and cannot be split, so the map tasks that process them take a long time.
3. Data locality is poor.

(iii) Reduce task causes

1. The number of reduce tasks is too large or too small.
2. Data skew: some keys have far too many records, so some reduce tasks run too slowly.
3. Slow shuffle and sort.

(iv) Improper Hadoop configuration

(v) Java code and JVM tuning

I. Hardware tuning

1. CPU and memory usage: vmstat, top

    $ vmstat -S M 5
    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
     0  0      0    566    232    239    0    0    65   824   59   39  4  1 93  1  0
     0  3      0    366    232    432    0    0     0 25929 2638 2776 14 14 43 28  0
     2  1      0    241    232    543    0    0    26 38110 2123 1316 75 11  0 14  0
     3  0      0     78    232    543    0    0     0 11784 1558 1028 80  4 16  0  0
     0  0      0    189    232    543    0    0     0   142 1052  933 70  3 27  1  0
     0  0      0    185    232    543    0    0     0    30  500  589 15  1 84  0  0
     2  0      0    180    232    544    0    0     0     3  502  595 12  1 87  0  0
     0  0      0    508    232    293    0    0     0    74 1161 1036 77  5 18  0  0
     0  0      0    626    233    175    0    0     0   150  385  447  2  1 97  0  0

The fields in the output above have the following meanings:

r: the run queue, that is, the number of processes that are running or waiting for CPU time. When this value exceeds the number of CPUs, the CPU is a bottleneck. It is related to the load average shown by top: as a rule of thumb, a load above 3 is fairly high, above 5 is high, and above 10 is abnormal and means the server is in a dangerous state. The load in top is roughly the average length of the run queue over time. A long run queue means the CPU is busy, which generally shows up as high CPU usage.

b: the number of blocked processes, that is, processes in uninterruptible sleep, typically waiting for I/O to complete.

swpd: the amount of virtual memory (swap) in use. A value greater than 0 suggests that the machine has run short of physical memory. If a memory leak in a program is not the cause, upgrade the memory or move memory-hungry tasks to other machines.

free: the amount of idle physical memory. My machine has 8 GB in total, of which 3415 MB is free.

buff: memory that Linux/Unix uses as a buffer for things such as directory contents and permissions. On my machine it is a little over 300 MB.

cache: memory used to cache the files we open. On my machine it is about 300 MB. (This is a smart feature of Linux/Unix: spare physical memory is used as a file and directory cache to speed up program execution, and when programs need memory, the buffer and cache memory is quickly handed back to them.)

si: the amount of memory swapped in from disk per second. A value greater than 0 means that physical memory is insufficient or that memory is leaking; find the process that is consuming the memory. My machine has plenty of memory, so everything is fine.

so: the amount of memory swapped out to disk per second. A value greater than 0 has the same implications as for si.

bi: the number of blocks received per second from block devices, that is, blocks read. Block devices here means all disks and other block devices on the system, and the default block size is 1024 bytes. My machine has no I/O activity, so the value stays at 0, but on a machine that was copying a large amount of data (2 to 3 TB) I have seen it reach 140000 per second, which corresponds to a disk throughput of almost 140 MB per second.

bo: the number of blocks sent per second to block devices, that is, blocks written. For example, bo is greater than 0 while files are being written. bi and bo should generally be close to 0; otherwise I/O is too frequent and needs tuning.

in: the number of interrupts per second, including clock interrupts.

cs: the number of context switches per second. Calling a system function, switching threads, and switching processes all cause context switches. The smaller this value the better; if it is too large, consider reducing the number of threads or processes. For example, with web servers such as Apache and Nginx, performance tests usually run with thousands or even tens of thousands of concurrent connections. You can lower the number of server processes or threads from its peak and rerun the stress test until cs drops to a relatively small value; that number of processes and threads is then a reasonable setting. The same goes for system calls: every call makes our code enter kernel space, which is expensive, so avoid calling system functions too frequently. Too many context switches mean that much of the CPU's time is wasted on switching rather than on useful work, so the CPU is not fully utilized, which is undesirable.

us: user CPU time. On a server that performed encryption and decryption very frequently, I have seen us approach 100 and the run queue r reach 80 (the machine was under a stress test and performed poorly).

sy: system CPU time. A high value indicates that much time is spent in system calls, for example because of frequent I/O operations.

id: idle CPU time. In general, us + sy + id + wa + st = 100. I read id as the idle CPU percentage, us as the user CPU usage, and sy as the system CPU usage.

wa: CPU time spent waiting for I/O.

st: CPU time stolen from this virtual machine by the hypervisor.
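
The rules of thumb above can be turned into a small check. The following Python sketch parses vmstat data rows and flags a run queue longer than the CPU count, any swap activity, and high I/O wait. The cpu_count default and the wa_limit threshold are assumptions chosen for illustration, not values from the article.

```python
# Column order printed by vmstat in the sample output above.
FIELDS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
          "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def parse_vmstat_line(line):
    """Turn one vmstat data row into a dict of integers keyed by column name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def diagnose(sample, cpu_count=1, wa_limit=20):
    """Return a list of warnings for one sample.

    cpu_count and wa_limit are illustrative assumptions.
    """
    warnings = []
    if sample["r"] > cpu_count:
        warnings.append("run queue exceeds CPU count: possible CPU bottleneck")
    if sample["si"] > 0 or sample["so"] > 0:
        warnings.append("swapping in progress: physical memory may be short")
    if sample["wa"] >= wa_limit:
        warnings.append("high I/O wait: disk may be the bottleneck")
    return warnings


# Two rows taken from the sample output above.
busy = parse_vmstat_line("2 1 0 241 232 543 0 0 26 38110 2123 1316 75 11 0 14 0")
io_bound = parse_vmstat_line("0 3 0 366 232 432 0 0 0 25929 2638 2776 14 14 43 28 0")

print(diagnose(busy))
print(diagnose(io_bound))
```

In practice the rows would come from a running vmstat process rather than from string literals, and the thresholds would be tuned to the cluster.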

2. Network

(1) ethtool: check whether the network card works properly, whether it runs in full duplex, and whether its speed setting is reasonable.

(2) sar: compare the throughput of each network interface.

    # sar -n DEV 3 2
    Linux 2.6.32-431.23.3.el6.x86_64 (slave1)    03/13/2015    _x86_64_    (1 CPU)

    08:41:11 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
    08:41:14 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
    08:41:14 PM      eth0     11.71     10.70      1.05      4.47      0.00      0.00      0.00
    08:41:14 PM      eth1    144.48      0.00      5.93      0.00      0.00      0.00      0.00

    08:41:14 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
    08:41:17 PM        lo     28.04     28.04      3.90      3.90      0.00      0.00      0.00
    08:41:17 PM      eth0    183.45   3765.20     27.06    905.80      0.00      0.00      0.00
    08:41:17 PM      eth1    179.05     31.76      7.48     70.62      0.00      0.00      0.00

    Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
    Average:           lo     13.95     13.95      1.94      1.94      0.00      0.00      0.00
    Average:         eth0     97.14   1878.49     13.99    452.86      0.00      0.00      0.00
    Average:         eth1    161.68     15.80      6.70     35.13      0.00      0.00      0.00
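
To compare interfaces programmatically, the averaged rows of the sar output can be parsed. The sketch below is one possible way to do it in Python; the helper names are hypothetical, and it assumes the column layout of the sar version shown above.

```python
def parse_sar_averages(text):
    """Collect the 'Average:' rows of sar -n DEV output, keyed by interface."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only averaged data rows; the header row has IFACE in column 2.
        if len(parts) < 6 or parts[0] != "Average:" or parts[1] == "IFACE":
            continue
        rxpck, txpck, rxkb, txkb = (float(v) for v in parts[2:6])
        stats[parts[1]] = {"rxpck/s": rxpck, "txpck/s": txpck,
                           "rxkB/s": rxkb, "txkB/s": txkb}
    return stats


def busiest_interface(stats):
    """Name of the interface with the highest combined rx + tx throughput."""
    return max(stats, key=lambda name: stats[name]["rxkB/s"] + stats[name]["txkB/s"])


# Averaged rows copied from the sample output above.
SAMPLE = """\
Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
Average:           lo     13.95     13.95      1.94      1.94      0.00      0.00      0.00
Average:         eth0     97.14   1878.49     13.99    452.86      0.00      0.00      0.00
Average:         eth1    161.68     15.80      6.70     35.13      0.00      0.00      0.00
"""

stats = parse_sar_averages(SAMPLE)
print(busiest_interface(stats))
```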

(3) iperf: measure the network bandwidth between two machines

One of them acts as a server:

    # iperf -s
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    [  4] local 10.171.94.155 port 5001 connected with 10.171.29.191 port 46455
    ------------------------------------------------------------
    Client connecting to 10.171.29.191, TCP port 5001
    TCP window size:  143 KByte (default)
    ------------------------------------------------------------
    [  6] local 10.171.94.155 port 52215 connected with 10.171.29.191 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  6]  0.0-10.0 sec   664 MBytes   557 Mbits/sec
    [  4]  0.0-10.0 sec   466 MBytes   390 Mbits/sec

The other one acts as a client:

    # iperf -c 10.171.94.155 -f m -d
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 0.08 MByte (default)
    ------------------------------------------------------------
    ------------------------------------------------------------
    Client connecting to 10.171.94.155, TCP port 5001
    TCP window size: 0.10 MByte (default)
    ------------------------------------------------------------
    [  4] local 10.171.29.191 port 46455 connected with 10.171.94.155 port 5001
    [  5] local 10.171.29.191 port 5001 connected with 10.171.94.155 port 52215
    [ ID] Interval       Transfer     Bandwidth
    [  4]  0.0-10.0 sec   466 MBytes   390 Mbits/sec
    [  5]  0.0-10.0 sec   664 MBytes   555 Mbits/sec
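
The reported bandwidth can be cross-checked from the transfer size and the interval. The sketch below assumes that iperf counts MBytes in units of 2^20 bytes and Mbits in units of 10^6 bits; under that assumption, 664 MBytes transferred in 10 seconds comes out close to the reported 557 Mbits/sec. The function name is hypothetical.

```python
def mbits_per_sec(mbytes, seconds):
    """Convert a transfer of `mbytes` (2**20 bytes each) over `seconds` to Mbit/s (10**6 bit/s)."""
    bits = mbytes * 2**20 * 8
    return bits / seconds / 1e6


# Transfer sizes taken from the two iperf streams above; the interval is 10 s.
for transferred in (664, 466):
    rate = mbits_per_sec(transferred, 10.0)
    print(transferred, "MBytes in 10 s is about", round(rate), "Mbit/s")
```

Small differences from the printed figures are expected, because iperf rounds the interval and the transfer size before displaying them.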

(4) tcpdump: inspect the packets being transmitted

    # tcpdump port 8649
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    20:43:11.396729 IP master.38498 > slave1.8649: UDP, length 136
    20:43:11.396746 IP master.38498 > slave1.8649: UDP, length 64
    20:43:11.397101 IP master.38498 > slave1.8649: UDP, length 136
    20:43:11.397105 IP master.38498 > slave1.8649: UDP, length 64
    20:43:11.397107 IP master.38498 > slave1.8649: UDP, length 136
    20:43:11.397108 IP master.38498 > slave1.8649: UDP, length 80
    20:43:11.397109 IP master.38498 > slave1.8649: UDP, length 64
    20:43:11.397110 IP master.38498 > slave1.8649: UDP, length 144
    20:43:11.397111 IP master.38498 > slave1.8649: UDP, length 68
    20:43:11.397112 IP master.38498 > slave1.8649: UDP, length 156
    20:43:11.397114 IP master.38498 > slave1.8649: UDP, length 188
    20:43:11.397115 IP master.38498 > slave1.8649: UDP, length 92
    20:43:11.397116 IP master.38498 > slave1.8649: UDP, length 88

You can also filter the capture with other expressions, such as host.

3. Disk health

(1) iostat

    # iostat
    Linux 2.6.32-431.17.1.el6.x86_64 (master)    03/13/2015    _x86_64_    (1 CPU)

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               4.73    0.00    3.07   33.99    0.00   58.22

    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    xvda            206.31       292.48      2301.51  176511452 1388963336
    xvdb             10.24        69.64       645.59   42029194  389614304

    # iostat -m -x -d 5
    Linux 2.6.32-431.17.1.el6.x86_64 (master)    03/13/2015    _x86_64_    (1 CPU)

    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    xvda       0.16  115.94   34.57  171.74    0.14    1.12    12.57     4.92   23.83   1.87  38.49
    xvdb       0.00   71.55    1.09    9.14    0.03    0.32    69.88     0.89   87.38   0.89   0.91

    Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
    xvda       0.00    0.00    0.00    0.81    0.00    0.00     8.00     0.00    4.75   1.75   0.14
    xvdb       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00   0.00   0.00

(2) dmesg

dmesg prints kernel messages, including detailed error messages. A typical entry looks like this:

    type=1400 audit(1425645012.364:7): avc:  denied  { setattr } for  pid=1318 comm="rrdtool" name="fontconfig" dev=xvda1 ino=1049117 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:fonts_cache_t:s0 tclass=dir

II. Map-side tuning

1. The input contains too many small files, so JVM processes are started and stopped many times. Enable JVM reuse, or preprocess the data to merge the small files.

2. Data skew: large files that cannot be split make the map tasks that process them run for a long time. Avoid this situation where possible; it can usually be resolved by preprocessing.

3. Data locality is poor. Distribute the data evenly across the cluster:

start-balancer.sh

4. Check whether the data volume suddenly surged on a particular day.
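
As an illustration of the preprocessing step that merges small files, the following Python sketch groups files into batches whose total size stays under a target, so that each batch could then be concatenated or packed into one larger input file. The function name and the 128 MB target (a common HDFS block size) are assumptions, not settings from the article.

```python
def batch_small_files(file_sizes, target_bytes=128 * 1024 * 1024):
    """Greedily group files into batches of at most target_bytes each.

    file_sizes maps a file name to its size in bytes. A file larger than
    the target ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    # Visit files from largest to smallest.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1024 * 1024
# Hypothetical input directory: one large file and twenty small ones.
sizes = {"big.log": 100 * MB}
sizes.update({"part-%03d" % i: 10 * MB for i in range(20)})

batches = batch_small_files(sizes)
print(len(sizes), "files ->", len(batches), "batches")
```

Fewer, larger inputs mean fewer map tasks and therefore fewer JVM start-ups, which is the effect the first item in the list aims for.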

III. Reduce-side tuning

1. Do not use reduce
After the map phase, the results must pass through shuffle and sort and be transferred over the network before they reach the reduce tasks, which is very costly. When the job allows it, the number of reduce tasks can therefore be set to 0:
job.setNumReduceTasks(0);
Note that the number must be set to 0 explicitly. Otherwise there is one reduce task by default, using the base Reducer class, which passes each input key-value pair straight through as output.

2. Filtering and projection
If a reduce phase is required, the next step is to minimize the amount of data the map phase outputs. This reduces both the data sent over the network and the data the reduce tasks must process. There are two common ways to reduce map output:
(1) Filtering: remove entire records that have no effect on the final result.
(2) Projection: remove the fields within a record that have no effect on the final result.
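
A minimal Python sketch of both techniques inside a map function follows. The tab-separated record layout (user_id, url, status, bytes, agent) and the job itself (summing bytes per user for successful requests) are hypothetical and serve only to illustrate the idea.

```python
def map_record(line):
    """Map one tab-separated log line to a list of (key, value) pairs.

    Assumed layout, for illustration only: user_id, url, status, bytes, agent.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return []  # filtering: drop malformed records
    user_id, url, status, size, agent = fields
    if status != "200":
        return []  # filtering: failed requests do not affect the result
    # Projection: emit only the two fields the reducer needs.
    return [(user_id, int(size))]


lines = [
    "u1\t/index.html\t200\t512\tMozilla",
    "u2\t/missing\t404\t0\tcurl",
    "u1\t/logo.png\t200\t2048\tMozilla",
]
output = [pair for line in lines for pair in map_record(line)]
print(output)
```

Of the three input records only two reach the shuffle, and each carries two fields instead of five.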

3. Use a combiner
A combiner merges data during the map phase, which reduces the amount of data transferred to the reduce tasks.
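
The effect can be simulated in plain Python with a word count. This is a sketch of the idea rather than Hadoop code: each inner list stands for the input split of one map task, and the combiner pre-sums that task's output before it would be shuffled. All names are hypothetical.

```python
from collections import Counter


def map_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum the counts of one map task's output, per key."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_counts(pairs):
    """Reducer: sum all counts per key, whether or not they were combined."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Each inner list stands for the input split of one map task.
splits = [["a b a", "a c"], ["b b a"]]
map_outputs = [[p for line in split for p in map_words(line)] for split in splits]

without_combiner = [p for out in map_outputs for p in out]
with_combiner = [p for out in map_outputs for p in combine(out)]

print(len(without_combiner), "records shuffled without a combiner")
print(len(with_combiner), "records shuffled with a combiner")
```

The final counts are identical in both cases because addition is associative and commutative; a combiner is only safe for operations with that property.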

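4. Optimize the comparator
A more efficient comparator speeds up the sorting of the data, for example one that compares the serialized bytes directly instead of deserializing the objects first.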

5. Reduce data skew
When one key in the map output has a very large number of values, the reduce task that processes this key takes far longer than the other reduce tasks.
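
The article does not prescribe a remedy here. One common mitigation, shown below as a Python sketch under stated assumptions, is to detect hot keys and "salt" them with a random bucket suffix so that several reduce tasks share the load; a second pass then strips the suffix and merges the partial results. The function names, the 50% threshold, the bucket count, and the "#" separator (assumed not to occur in real keys) are all hypothetical.

```python
import random
from collections import Counter


def find_hot_keys(keys, share=0.5):
    """Return the keys that account for more than `share` of all records."""
    counts = Counter(keys)
    total = sum(counts.values())
    return {key for key, n in counts.items() if n / total > share}


def salt(key, hot_keys, buckets, rng):
    """Append a random bucket number to hot keys so several reducers share them."""
    if key in hot_keys:
        return "%s#%d" % (key, rng.randrange(buckets))
    return key


def unsalt(key):
    """Strip the bucket suffix again for the second aggregation pass."""
    return key.split("#", 1)[0]


# Hypothetical map output keys in which one key dominates.
keys = ["hot"] * 90 + ["x"] * 5 + ["y"] * 5
hot = find_hot_keys(keys)
rng = random.Random(0)
salted = [salt(k, hot, buckets=4, rng=rng) for k in keys]

print(sorted(Counter(salted).items()))
```

This only works when the reduce operation can be applied in two stages, such as sums or counts.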

6. Adjust the number of reduce tasks
Set the number of reduce tasks slightly below the total number of reduce slots in the cluster. This makes full use of the cluster while leaving room to tolerate failures on some machines.
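
As a rough sketch of this rule, the reducer count can be derived from the cluster size with a little headroom. The 5% headroom and the example cluster below are assumptions for illustration; the article only says "slightly less than" the total.

```python
def suggested_reducers(nodes, reduce_slots_per_node, headroom_percent=5):
    """Pick a reducer count slightly below the cluster's total reduce slots.

    headroom_percent is the share of slots left free for failed or re-run
    tasks; 5 is an illustrative assumption.
    """
    total_slots = nodes * reduce_slots_per_node
    # Integer arithmetic avoids floating-point rounding surprises.
    return max(1, total_slots * (100 - headroom_percent) // 100)


# Hypothetical cluster: 10 worker nodes with 4 reduce slots each.
print(suggested_reducers(10, 4))
```

The result would then be passed to job.setNumReduceTasks in the job driver.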

IV. Improper Hadoop configuration
Understand the Hadoop configuration files thoroughly and choose the configuration that best suits your cluster.
Reference: http://blog.csdn.net/jediael_lu/article/details/38680013

V. Java tuning
Reference:??


This article is an English version of an article originally published in Chinese on aliyun.com.
