This section looks at Hadoop tuning from the administrator's perspective. The administrator is responsible for providing an efficient running environment for user jobs, and does so by adjusting key parameters globally to improve system throughput and performance. In general, this work covers four areas: hardware selection, operating system parameter optimization, JVM parameter optimization, and Hadoop parameter optimization.

1. Hardware Selection

The basic characteristics of Hadoop's architecture determine how its hardware should be chosen. Hadoop uses a master/slave architecture in which the master (JobTracker or NameNode) maintains the global metadata and is far more important than any slave (TaskTracker or DataNode). In earlier versions of Hadoop the master is a single point of failure (SPOF), so the master node should be given much better hardware than each slave. For details, refer to Eric Sammer's book Hadoop Operations.

2. Operating System Parameter Optimization

Because of some of its characteristics, Hadoop is only suitable for production use on Linux. In practice, administrators can tune Linux kernel parameters to improve job efficiency. Some useful adjustments follow.

(1) Increase the limits on simultaneously open file descriptors and network connections

A Hadoop cluster runs a very large number of jobs and tasks, and the operating system kernel limits the number of file descriptors and network connections on each node, so heavy file I/O and many network connections can cause jobs to fail.
Therefore, when starting the Hadoop cluster, the administrator should use the ulimit command to raise the maximum number of simultaneously open file descriptors to a suitable value, and set the kernel parameter net.core.somaxconn to a sufficiently large value. In addition, Hadoop RPC uses epoll for high-concurrency I/O. If the Linux kernel version is later than 2.6.28, the epoll file descriptor limit should also be adjusted as appropriate.

(2) Disable the swap partition

In Linux, when a process runs short of memory, the kernel temporarily writes part of its in-memory data to disk and loads it back into memory when it is needed again. This behavior usually degrades process performance significantly. In a MapReduce environment, swapping can be avoided entirely by controlling the amount of data each job processes and the sizes of the buffers each task uses. To do this, set the vm.swappiness parameter in /etc/sysctl.conf:

    vm.swappiness=0

Run sysctl vm.swappiness, or read /proc/sys/vm/swappiness, to check the current setting.

(3) Set a reasonable read-ahead buffer size

Disk I/O performance has lagged far behind CPU and memory performance and has become a major bottleneck in modern computer systems. Read-ahead (pre-reading) reduces the number of disk seeks and the time applications spend waiting on I/O, and it is one of the important ways to improve disk read performance. The administrator can use the Linux blockdev command to set the read-ahead buffer size and so improve the performance of large sequential file reads in Hadoop. Alternatively, the read-ahead buffer can be enlarged only for Hadoop itself.

(4) File system selection and configuration

The I/O performance of Hadoop depends largely on the read/write performance of the local Linux file system.
Linux offers a variety of file systems, such as ext3 and ext4, and their performance may differ. If the company has developed a more efficient file system of its own, using it is also encouraged. In a Linux file system, when the noatime attribute is not enabled, every file read triggers an additional write to record the file's last access time. This extra write can be avoided by adding noatime to the mount options. To do so, mount the file system with the following command:

    mount -o noatime -o nodiratime /dev/sda1 /data1

Alternatively, edit /etc/fstab:

    /dev/sda1 /data1 ext3 defaults,noatime,nodiratime 0 0
Then remount the file system:

    mount -o remount /data1

Use the stat command to check whether the setting has taken effect, that is, whether a file's access time still changes when the file is read.

(5) I/O scheduler selection

Mainstream Linux distributions ship with several I/O schedulers. In data-intensive applications the performance of different I/O schedulers varies greatly, so administrators can enable the scheduler that best suits their workload. For more information, see the AMD white paper Hadoop Performance Tuning Guide.

Beyond the common Linux kernel tuning methods above, administrators can make other adjustments as needed.

3. JVM Parameter Optimization

Since each Hadoop service and task runs in its own JVM, some important JVM parameters also affect Hadoop performance. Administrators can tune JVM flags and the JVM garbage collection mechanism to improve Hadoop performance. For details, see the AMD white paper Hadoop Performance Tuning Guide.

4. Hadoop Parameter Optimization

1) Plan resources sensibly

(1) Set a reasonable number of slots

In Hadoop, computing resources are represented by slots. There are two types: Map slots and Reduce slots. Each slot represents a certain amount of resources, and slots of the same type (for example, Map slots) are homogeneous, that is, they all represent the same amount of resources. The administrator configures a number of Map slots and Reduce slots for each TaskTracker as needed, which limits the number of Map tasks and Reduce tasks that run concurrently on that TaskTracker. The number of slots is configured in mapred-site.xml on each TaskTracker, as shown in Table 9-1.

Table 9-1 Setting the number of slots
Hadoop version | Configuration parameters | Default value
0.20.X (including 1.X), CDH3 | mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum | 2 (both parameters)
0.21.X, 0.22.X | mapreduce.tasktracker.map.tasks.maximum, mapreduce.tasktracker.reduce.tasks.maximum | 2 (both parameters)
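As an illustration of the slot settings, the following minimal sketch generates a mapred-site.xml fragment with the 0.20.X / 1.X parameter names from Table 9-1. The function name slot_configuration and the slot counts (8 and 4) are hypothetical; suitable values depend on each node's CPU and memory.

```python
import xml.etree.ElementTree as ET


def slot_configuration(map_slots: int, reduce_slots: int) -> str:
    """Return a mapred-site.xml fragment that sets the slot counts.

    Uses the 0.20.X / 1.X parameter names listed in Table 9-1.
    """
    if map_slots < 1 or reduce_slots < 1:
        raise ValueError("each TaskTracker needs at least one slot of each type")

    settings = {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }
    configuration = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical values: 8 Map slots and 4 Reduce slots per TaskTracker.
fragment = slot_configuration(map_slots=8, reduce_slots=4)
print(fragment)
```

For 0.21.X and 0.22.X, the same structure applies with the mapreduce.tasktracker.* names instead.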
(2) Write a health monitoring script

Hadoop allows administrators to configure a node health monitoring script for each TaskTracker. A dedicated thread in the TaskTracker periodically runs the script and reports its result to the JobTracker through the heartbeat mechanism. Once the JobTracker finds that a TaskTracker is "unhealthy" (for example, its memory or CPU usage is too high), it blacklists that TaskTracker and assigns it no new tasks (tasks already running continue normally) until the script reports "healthy" again. Note that this mechanism is available only in versions later than Hadoop 0.20.2.

2) Adjust the heartbeat configuration

(1) Adjust the heartbeat interval

The heartbeat interval between TaskTracker and JobTracker should be moderate. If it is too short, the JobTracker has to process a high volume of concurrent heartbeats, which puts it under heavy load. If it is too long, the JobTracker is not told about idle resources in time (and so cannot assign new tasks to them promptly). For small and medium-sized Hadoop clusters (fewer than 300 nodes), shortening the heartbeat interval between TaskTracker and JobTracker can significantly improve system throughput.

In Hadoop 1.0 and earlier versions, the heartbeat interval is fixed at 3 seconds when the cluster has fewer than 300 nodes (it cannot be modified). This means that with 10 nodes the JobTracker handles only about 3.3 heartbeat requests per second on average (10/3 ≈ 3.3), and with 100 nodes only about 33 per second. For an ordinary server this load is very low and leaves server resources underused. In short, for small and medium-sized Hadoop clusters a 3-second heartbeat interval is too long, and the administrator can reduce it as needed. The specific configuration is shown in Table 9-2.

Table 9-2 Setting the heartbeat interval
Hadoop version | Configuration parameters | Default value
0.20.X, 0.21.X, 0.22.X | Not configurable | 3 seconds when the cluster has fewer than 300 nodes; each additional 100 nodes adds 1 second
1.X, CDH3 | mapreduce.jobtracker.heartbeat.interval.min, mapred.heartbeats.in.second, mapreduce.jobtracker.heartbeats.scaling.factor | 300 milliseconds when the cluster has fewer than 300 nodes (for details, see Section 6.3.2)
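The load arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and heartbeat_interval_seconds encodes only one reading of the non-configurable rule in Table 9-2 (at least 3 seconds, plus 1 second per further 100 nodes); the exact behavior at the boundaries is an assumption.

```python
import math


def heartbeat_interval_seconds(nodes: int) -> int:
    # Assumed reading of Table 9-2: never below 3 s, and 1 s per
    # started block of 100 nodes once the cluster grows past 300.
    return max(3, math.ceil(nodes / 100))


def heartbeats_per_second(nodes: int, interval_seconds: float) -> float:
    # Every node sends one heartbeat per interval.
    return nodes / interval_seconds


for nodes in (10, 100, 300):
    interval = heartbeat_interval_seconds(nodes)
    rate = heartbeats_per_second(nodes, interval)
    print(f"{nodes} nodes, {interval} s interval: {rate:.1f} heartbeats/s")
```

Passing a shorter interval to heartbeats_per_second shows how much extra load the JobTracker would take on: 100 nodes at 0.3 seconds produce ten times the request rate of the 3-second case.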
(2) Enable out-of-band heartbeat

Normally, a heartbeat is sent to the JobTracker at a fixed interval and carries node resource usage, task running status, and other information. The heartbeat mechanism is a typical pull-based model: the TaskTracker periodically reports to the JobTracker through heartbeats and, in the same exchange, receives newly assigned tasks. This model introduces considerable delay into task assignment. When a TaskTracker has idle resources, it cannot notify the JobTracker immediately and must wait for the next heartbeat (the interval depends on cluster size; for a cluster of 1,000 nodes, for example, it is 10 seconds). To reduce this latency, Hadoop introduced the out-of-band heartbeat. Unlike a regular heartbeat, it is triggered when a task completes or fails, so the JobTracker learns about idle resources immediately and can quickly assign new tasks to them. Table 9-3 shows how to configure the out-of-band heartbeat.

Table 9-3 Configuring the out-of-band heartbeat
Hadoop version | Configuration parameters | Description | Default value
0.20.2 | Mechanism not introduced | - | -
0.20.X (except 0.20.2), 0.21.X, 0.22.X, CDH3 | mapreduce.tasktracker.outofband.heartbeat | Whether to enable the out-of-band heartbeat | false
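To make the latency difference concrete, here is a toy model rather than Hadoop code. It assumes that regular heartbeats fire at exact multiples of the interval and that an out-of-band heartbeat arrives with no network delay. The function name and the completion times are made up for illustration.

```python
def notification_delay(finish_time: float, interval: float, out_of_band: bool) -> float:
    """Seconds between a task finishing and the JobTracker hearing about the free slot."""
    if out_of_band:
        # A heartbeat is sent as soon as the task ends (network latency ignored).
        return 0.0
    elapsed = finish_time % interval
    # Otherwise the free slot is only reported at the next scheduled heartbeat.
    return 0.0 if elapsed == 0 else interval - elapsed


interval = 3.0
finish_times = [0.5, 1.0, 1.5, 2.0, 2.5]  # hypothetical completion times in seconds
regular = [notification_delay(t, interval, False) for t in finish_times]
oob = [notification_delay(t, interval, True) for t in finish_times]
print("regular heartbeat, mean delay:", sum(regular) / len(regular))
print("out-of-band heartbeat, mean delay:", sum(oob) / len(oob))
```

Under these assumptions a slot freed at a random moment waits about half an interval on average with regular heartbeats, which is why the gain grows with the heartbeat interval.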
3) Disk configuration

The intermediate results of Map tasks must be written to local disk. For I/O-intensive jobs this data puts heavy pressure on the local disk, which the administrator can relieve by configuring multiple disks. When several disks are available, Hadoop writes the intermediate results of different Map tasks to them in round-robin fashion to spread the load, as shown in Table 9-4.

Table 9-4 Configuring multiple disks
Hadoop version | Configuration parameters | Default value
0.20.X (including 1.X), CDH3 | mapred.local.dir | /tmp/hadoop-${user.name}/mapred/local
0.21.X, 0.22.X | mapreduce.cluster.local.dir | /tmp/hadoop-${user.name}/mapred/local
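The round-robin idea can be sketched as follows. This is a simplified illustration, not Hadoop's allocation code, which is not shown in this text and may also weigh factors such as free space. The mount points, task identifiers, and the function name assign_round_robin are hypothetical; the parameter value is a comma-separated list of directories.

```python
from itertools import cycle


def assign_round_robin(local_dirs, task_ids):
    """Map each task id to a local directory, rotating through the directories in order."""
    if not local_dirs:
        raise ValueError("at least one local directory is required")
    rotation = cycle(local_dirs)
    return {task_id: next(rotation) for task_id in task_ids}


# Hypothetical mount points, written as a comma-separated mapred.local.dir value.
local_dir_setting = "/data1/mapred/local,/data2/mapred/local,/data3/mapred/local"
local_dirs = [d.strip() for d in local_dir_setting.split(",")]
tasks = [f"map_{i}" for i in range(5)]  # made-up task identifiers
placement = assign_round_robin(local_dirs, tasks)
for task_id, directory in placement.items():
    print(task_id, "->", directory)
```

Placing each directory on a separate physical disk is what spreads the write load; several directories on one disk would rotate without relieving it.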
4) Set a reasonable number of RPC handlers and HTTP threads

(1) Configure the number of RPC handlers

The JobTracker must process RPC requests from many TaskTrackers concurrently. The administrator can adjust the number of RPC handlers according to the cluster size and the server's concurrency capacity to improve the JobTracker's service capability. The configuration is shown in Table 9-5.

Table 9-5 Configuring the number of RPC handlers
Hadoop version | Configuration parameters | Default value
0.20.X (including 1.X), CDH3 | mapred.job.tracker.handler.count | 10
0.21.X, 0.22.X | mapreduce.jobtracker.handler.count | 10
(2) Configure the number of HTTP threads

In the Shuffle stage, Reduce tasks fetch the intermediate results of Map tasks from each TaskTracker through HTTP requests, and each TaskTracker serves these requests with a Jetty server. The administrator can adjust the number of Jetty worker threads to improve its concurrent processing capability, as shown in Table 9-6.

Table 9-6 Configuring the number of HTTP threads
Hadoop version | Configuration parameters | Default value
0.20.X (including 1.X), CDH3 | tasktracker.http.threads | 40
0.21.X, 0.22.X | mapreduce.tasktracker.http.threads | 40
5) Use the blacklist mechanism with caution

When a job finishes, it counts the number of its tasks that failed on each TaskTracker. If the number of failures on a TaskTracker exceeds a threshold, the job adds that TaskTracker to its own blacklist. If a TaskTracker is blacklisted by a certain number of jobs, the JobTracker adds it to the system blacklist and assigns it no new tasks until it has gone a certain period without failed tasks. In a small Hadoop cluster, having nodes frequently added to the system blacklist greatly reduces cluster throughput and computing capacity, so disabling this function is recommended. For detailed configuration methods, see Section 6.5.2.

6) Enable batch task scheduling

The scheduler is one of Hadoop's core components. It is responsible for allocating idle resources in the system to tasks. Hadoop currently provides several schedulers, including the default FIFO Scheduler, the Fair Scheduler, and the Capacity Scheduler. The efficiency of the scheduler directly determines system throughput. To hand out idle resources as quickly as possible, Hadoop schedulers support batch task scheduling, that is, assigning tasks to all idle resources in one pass instead of assigning only one task at a time. The specific configuration is shown in Table 9-7 (the FIFO scheduler is itself a batch scheduler).

Table 9-7 Configuring batch task scheduling
Scheduler name | Hadoop version | Configuration parameters | Parameter description | Default value
Capacity Scheduler | 0.20.2, 0.21.X, 0.22.X | - | - | Batch scheduling is not supported; one task is assigned at a time
Capacity Scheduler | 0.20.X (including 1.X), CDH3 | mapred.capacity-scheduler.maximum-tasks-per-heartbeat | Maximum number of tasks assigned per heartbeat | 32767
Fair Scheduler | Before 0.20.205 | - | - | Batch scheduling is not supported; one task is assigned at a time
Fair Scheduler | 0.21.X, 0.22.X | mapred.fairscheduler.assignmultiple, mapred.fairscheduler.assignmultiple.maps, mapred.fairscheduler.assignmultiple.reduces | Whether to enable batch scheduling and, if enabled, the maximum number of Map tasks and Reduce tasks that can be assigned at a time | Batch scheduling enabled; no limit on the number of Map tasks and Reduce tasks assigned at a time
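The effect of batch scheduling on slot fill time can be estimated with a small calculation. This sketch assumes that enough pending tasks exist and that tasks are only handed out in heartbeat responses. The function name, the 8 idle slots, and the 3-second interval are illustrative values, not measurements.

```python
import math


def heartbeats_to_fill(free_slots: int, tasks_per_heartbeat: int) -> int:
    """Number of heartbeats needed before every free slot on one TaskTracker has a task."""
    if tasks_per_heartbeat < 1:
        raise ValueError("at least one task must be assigned per heartbeat")
    return math.ceil(free_slots / tasks_per_heartbeat)


free_slots = 8  # hypothetical idle slots on one TaskTracker
interval = 3    # heartbeat interval in seconds
for label, per_heartbeat in (("one task per heartbeat", 1), ("batch scheduling", free_slots)):
    beats = heartbeats_to_fill(free_slots, per_heartbeat)
    print(f"{label}: {beats} heartbeat(s), about {beats * interval} s to fill all slots")
```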
7) Select an appropriate compression algorithm

Hadoop is usually used for I/O-intensive applications. In such applications, Map tasks output a large amount of intermediate data, and reading and writing this data is transparent to users. Compressing the intermediate data before it is stored can therefore significantly improve the system's I/O performance.

When selecting a compression algorithm, both compression ratio and compression speed must be considered. Some algorithms achieve a good compression ratio but compress and decompress slowly; others are fast but achieve a poor ratio. A good compression algorithm needs to balance the two. Several compression formats are currently available, such as gzip, zip, bzip2, LZO, and Snappy. Of these, LZO and Snappy perform best overall when both compression ratio and speed are considered. Snappy is Google's open-source compression library, and its codec is built into Hadoop 1.0 and later versions. LZO is different: it is GPL-licensed and cannot be distributed through Apache, so its Hadoop codec must be downloaded separately.

The following uses Snappy as an example to show how to make Hadoop compress the intermediate output of Map tasks (configured in mapred-site.xml):

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>

Here "mapred.compress.map.output" indicates whether to compress the intermediate output of Map tasks, and "mapred.map.output.compression.codec" specifies the codec to use. Table 9-8 shows which Hadoop versions have the Snappy compression algorithm built in.

Table 9-8 Snappy compression algorithm support
Hadoop version | Built-in Snappy?
0.20.X (excluding 1.X), 0.21.X, 0.22.X | No
1.X, CDH3 | Yes
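The trade-off between compression ratio and speed can be observed with a minimal sketch. Snappy and LZO are not part of the Python standard library, so zlib and bz2 stand in for them here. The sample data is synthetic, and the measure helper is a hypothetical name; timings will vary by machine and say nothing about Snappy or LZO themselves.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, data):
    """Return (name, compression ratio, seconds spent compressing) after a round-trip check."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    if decompress(packed) != data:
        raise RuntimeError(f"{name} round trip failed")
    return name, len(data) / len(packed), elapsed


# Synthetic stand-in for Map output: repetitive key/value text lines.
sample = b"".join(b"user%05d\tclick\t%d\n" % (i % 500, i) for i in range(50_000))

results = [
    measure("zlib level 1", lambda d: zlib.compress(d, 1), zlib.decompress, sample),
    measure("zlib level 9", lambda d: zlib.compress(d, 9), zlib.decompress, sample),
    measure("bz2", bz2.compress, bz2.decompress, sample),
]
for name, ratio, seconds in results:
    print(f"{name}: ratio {ratio:.1f}x, {seconds * 1000:.1f} ms")
```

Running the same comparison on a sample of real intermediate data is a more reliable guide than any general ranking, since both ratio and speed depend heavily on the data.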
8) Enable the read-ahead mechanism

As mentioned above, read-ahead (pre-reading) can effectively improve disk read performance. Because Hadoop is a typical sequential-read system, read-ahead can significantly improve HDFS read performance and MapReduce job execution efficiency. The administrator can enable read-ahead for MapReduce shuffle data copying and for IFile reads, as shown in Table 9-9.

Table 9-9 Read-ahead configuration
Hadoop version | Configuration parameters | Description | Default value
Apache versions and CDH3 before u3 | Mechanism not yet introduced | - | -
CDH3u3 and later versions | mapred.tasktracker.shuffle.fadvise | Whether to enable the Shuffle read-ahead mechanism | true
CDH3u3 and later versions | mapred.tasktracker.shuffle.readahead.bytes | Shuffle read-ahead buffer size | 4 MB
CDH3u3 and later versions | mapreduce.ifile.readahead | Whether to enable the IFile read-ahead mechanism | true
CDH3u3 and later versions | mapreduce.ifile.readahead.bytes | IFile read-ahead buffer size | 4 MB
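At the operating system level, a sequential-read hint of this kind can be given with the posix_fadvise system call. The sketch below shows that general idea in Python on a temporary file. It is not Hadoop code, and the assumption that the "fadvise" parameter above relies on the same call is based only on its name. The function name is hypothetical, and the 4 MB chunk size simply mirrors the default buffer size in Table 9-9.

```python
import os
import tempfile


def read_sequentially(path, chunk_size=4 * 1024 * 1024):
    """Read a whole file in order, first hinting to the kernel that access will be sequential."""
    total = 0
    with open(path, "rb") as handle:
        if hasattr(os, "posix_fadvise"):
            try:
                # Offset 0 with length 0 covers the file from start to end.
                os.posix_fadvise(handle.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass  # the hint is advisory, so a failure is safe to ignore
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Create a throwaway 1 MB file to read back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1_000_000)
    sample_path = tmp.name

bytes_read = read_sequentially(sample_path)
os.remove(sample_path)
print("bytes read:", bytes_read)
```

On platforms without posix_fadvise the function still reads the file, only without the hint.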
|