Chapter 9 - Build a Hadoop Cluster

System log files

1. We recommend changing the default location of the Hadoop system log files so that it is independent of the Hadoop installation directory.

By default, the system log files generated by Hadoop are stored in the $HADOOP_INSTALL/logs directory; this location can be changed via the HADOOP_LOG_DIR setting in the hadoop-env.sh file.

2. Why is it recommended to store the Hadoop system log files outside the Hadoop installation directory?

Changing the default configuration makes the log location independent of the installation directory. That way, even if the installation path changes after a Hadoop upgrade, the location of the log files is unaffected.

Log files are commonly stored in the /var/log/hadoop directory.

Implementation: add the following line to hadoop-env.sh: export HADOOP_LOG_DIR=/var/log/hadoop
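As a minimal sketch, the steps above might look like this (the `hadoop` user and group ownership are assumptions; adjust them for your deployment):

```shell
# Create the log directory and hand it to the account that runs the
# daemons (the user/group name 'hadoop' here is an assumption).
sudo mkdir -p /var/log/hadoop
sudo chown hadoop:hadoop /var/log/hadoop

# Then, in conf/hadoop-env.sh:
export HADOOP_LOG_DIR=/var/log/hadoop
```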

3. Each Hadoop daemon generates two types of log files.

1) A log recorded through log4j, with the .log suffix.

Because most application log messages are written to this file, it is the first place to look when troubleshooting a problem.

The standard Hadoop log4j configuration uses the daily rolling file appender to name log files.

The system does not automatically delete expired log files; they are retained so that the administrator can delete or archive them regularly to save local disk space.
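Since expired logs are left in place for the administrator, a small cleanup helper run from cron is one common approach. A minimal sketch, assuming the /var/log/hadoop location from above and a retention period chosen by the caller:

```shell
# Remove rotated log4j files (e.g. *.log.2010-01-01) older than a given
# number of days. The directory and retention period are the caller's choice.
clean_hadoop_logs() {
  dir=$1; days=$2
  find "$dir" -name '*.log.*' -type f -mtime +"$days" -exec rm -f {} +
}
```

A daily cron entry could then call, for example, `clean_hadoop_logs /var/log/hadoop 30` to archive nothing and simply drop logs older than 30 days.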

2) A log recording standard output and standard error, with the .out suffix.

Because Hadoop uses log4j for logging, this file usually contains only a few entries, or is even empty.

When a daemon is restarted, the system creates a new file to record these logs. Only the latest five log files are kept; older files are given a numeric suffix ranging from 1 to 5, where 5 marks the oldest file.

The user name portion of the log file name corresponds to the HADOOP_IDENT_STRING setting in the hadoop-env.sh file.
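Daemon log names follow the pattern hadoop-&lt;ident&gt;-&lt;daemon&gt;-&lt;hostname&gt;.log. A small sketch of how the pieces combine (the daemon and host values below are illustrative assumptions):

```shell
# Assemble a daemon log file name the way Hadoop's scripts do.
# The daemon and host values here are illustrative assumptions.
HADOOP_IDENT_STRING=hadoop          # set in hadoop-env.sh; defaults to $USER
daemon=namenode
host=master1
logfile="hadoop-${HADOOP_IDENT_STRING}-${daemon}-${host}.log"
echo "$logfile"
```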

SSH settings

1. With rsync, the Hadoop control scripts can distribute configuration files to all nodes in the cluster.

By default, this function is not enabled.

To enable this feature, you need to define the HADOOP_MASTER setting in the hadoop-env.sh file.

After a worker node's daemon starts, it synchronizes the directory tree rooted at HADOOP_MASTER with the local HADOOP_INSTALL directory.
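A minimal hadoop-env.sh fragment enabling this might look as follows (the host name and path are assumptions):

```shell
# In conf/hadoop-env.sh on the worker nodes: the tree under this path on
# the master is rsynced to the local $HADOOP_INSTALL at daemon startup.
# The host name and path below are assumptions; use your own.
export HADOOP_MASTER=master1:/home/hadoop/hadoop-install
```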

2. How do I ensure that HADOOP_MASTER is set in the hadoop-env.sh file on every worker node?

It is easy to solve for small clusters:

Write a script file that copies the hadoop-env.sh file of the master node to all worker nodes.
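Such a script might look like the following sketch, which prints the scp command it would run for each host listed in a slaves file (the paths are assumptions; remove the `echo` to actually perform the copy):

```shell
# Print the scp command for each worker listed in a slaves file
# (one host name per line). Remove 'echo' to actually copy; the
# configuration path passed in is an assumption about your layout.
distribute_env() {
  conf=$1; slaves=$2
  while read -r host; do
    [ -n "$host" ] && echo scp "$conf/hadoop-env.sh" "$host:$conf/hadoop-env.sh"
  done < "$slaves"
}
```

For example, `distribute_env /usr/local/hadoop/conf /usr/local/hadoop/conf/slaves` would emit one copy command per worker.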

For large clusters:

You can use a tool similar to dsh to copy files in parallel.

In addition, you can write a hadoop-env.sh file containing the settings above and use it as part of an automated installation script.

3. In a large cluster, all worker nodes send rsync requests to the master node at the same time, which can overwhelm it. How can this problem be avoided?

To avoid this, set HADOOP_SLAVE_SLEEP to a short interval, for example 0.1 (meaning 0.1 seconds).

This way, the master node sleeps for that interval between issuing commands to successive worker nodes, spreading the rsync load over time.
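The corresponding hadoop-env.sh fragment is a one-liner:

```shell
# In conf/hadoop-env.sh: pause 0.1 s between ssh invocations to workers
# so they do not all rsync from the master at the same moment.
export HADOOP_SLAVE_SLEEP=0.1
```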






