Hadoop Pseudo-Distributed Mode Configuration and Deployment

II. Hadoop pseudo-distributed mode configuration

This experiment continues from the stand-alone mode deployment in the previous section.

1. Configure core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml

1) Modify core-site.xml:

$ sudo gvim /usr/local/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>

Common configuration item description:

fs.default.name: a URI (protocol, host name, and port number) that identifies the NameNode of the cluster. Every machine in the cluster needs to know the NameNode's address: DataNodes first register with the NameNode so that their data can be used, and client programs use this URI to interact with HDFS and obtain the block lists of files.

hadoop.tmp.dir: the base directory that the Hadoop file system relies on; many other paths are derived from it. If the NameNode and DataNode storage locations are not configured in hdfs-site.xml, they are placed under this directory, which defaults to /tmp/hadoop-${user.name}.
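Once the file is saved, the values that Hadoop actually resolves can be checked with the hdfs getconf command. This is only a quick sanity check and assumes the hadoop binaries from the stand-alone deployment are on the PATH (fs.default.name is the older name of fs.defaultFS, so querying the old key may print a deprecation notice):

$ hdfs getconf -confKey fs.defaultFS       # should print hdfs://localhost:9000
$ hdfs getconf -confKey hadoop.tmp.dir     # should print /home/hadoop/tmp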

For more information, refer to core-default.xml, which contains descriptions and default values for all configuration items in this file.

2) Modify hdfs-site.xml:

$ sudo gvim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Common configuration item description:

dfs.replication: the number of replicas kept for each file block in the system. In a real deployment it should usually be set to 3 (there is no upper limit, but extra replicas bring little benefit and consume more space), while fewer than three replicas can reduce data reliability, since a failure may then cause data loss.

dfs.data.dir: the local file system path where a DataNode stores its data. The path does not have to be identical on every DataNode, because each machine's environment may differ, but configuring the same path on every machine makes administration easier. The default value is file://${hadoop.tmp.dir}/dfs/data, which is suitable only for testing because data placed there may be lost, so the value is best overridden.

dfs.name.dir: the local file system path where the NameNode stores the file system metadata. This value matters only to the NameNode; DataNodes do not use it. The same warning about /tmp-style locations applies here, and in a real deployment this value should be overridden as well.
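The effective values can be inspected the same way. The property names below are the Hadoop 2.x equivalents of dfs.name.dir and dfs.data.dir; this check is optional and not required for the experiment:

$ hdfs getconf -confKey dfs.replication           # should print 1
$ hdfs getconf -confKey dfs.namenode.name.dir     # defaults to file://${hadoop.tmp.dir}/dfs/name
$ hdfs getconf -confKey dfs.datanode.data.dir     # defaults to file://${hadoop.tmp.dir}/dfs/data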

For more information, refer to hdfs-default.xml, which contains descriptions and default values for all configuration items in this file.

3) Modify mapred-site.xml:

$ sudo gvim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Common configuration item description:

mapred.job.tracker: the JobTracker host (or IP) and port. It applies to the classic MapReduce framework and is not used when mapreduce.framework.name is set to yarn, as above.

For more information, refer to mapred-default.xml, which contains descriptions and default values for all configuration items in this file.

4) Modify yarn-site.xml:
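As with the previous files, yarn-site.xml can be opened for editing; the path below assumes the same installation layout as the earlier commands:

$ sudo gvim /usr/local/hadoop/etc/hadoop/yarn-site.xml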

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

Common configuration item description:

yarn.nodemanager.aux-services: the auxiliary services that the NodeManager runs alongside containers. Setting it to mapreduce_shuffle registers the shuffle service that MapReduce jobs need, and the companion *.class property names the class that implements it.

For more information, refer to yarn-default.xml, which contains descriptions and default values for all configuration items in this file.

This completes the pseudo-distributed mode configuration.

III. Formatting the HDFS file system

Before using Hadoop, a completely new HDFS installation must be formatted. Formatting creates the initial version of the storage directories and the NameNode's persistent data structures, producing an empty file system. Because the NameNode manages all of the file system's metadata, and DataNodes can join or leave the cluster dynamically, the formatting process does not involve the DataNodes. For the same reason there is no need to decide the size of the file system up front: the number of DataNodes in the cluster determines its capacity, and DataNodes can be added on demand long after the file system has been formatted.

1. First switch to the hadoop account, entering the account password when prompted

$ su hadoop
2. Format the HDFS file system
$ sudo hadoop  namenode  -format

Output like the following indicates that HDFS was formatted successfully:

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = [your hostname]/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.1
...
INFO util.ExitUtil: Exiting with status 0
INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at [your hostname]/127.0.0.1
************************************************************/
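To see what the format step created, the NameNode metadata directory can be listed. This is a quick optional check and assumes the hadoop.tmp.dir configured above together with the default dfs.name.dir layout:

$ ls /home/hadoop/tmp/dfs/name/current
# expect a VERSION file, a seen_txid file, and an initial fsimage_* checkpoint
# if the format was run with sudo, this directory may be owned by root and need a chown to the hadoop user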
IV. Hadoop cluster startup

1. Start the HDFS daemons, starting the NameNode and the DataNode separately:
$ hadoop-daemon.sh start namenode
$ hadoop-daemon.sh start datanode

or start them all with one command:

$ start-dfs.sh

The output is as follows (you can see that the NameNode, DataNode, and SecondaryNameNode are started in turn; because we have not configured an address for the SecondaryNameNode, it defaults to 0.0.0.0):

Starting namenodes on []
hadoop@localhost's password:
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-g470.out
hadoop@localhost's password:
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-g470.out
localhost: OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.
localhost: The VM will try to fix the stack guard now.
localhost: It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-g470.out
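A simple way to verify that the HDFS daemons are really running is the JDK's jps tool, which lists the Java processes of the current user. The process IDs below are only an example:

$ jps
3218 NameNode
3346 DataNode
3519 SecondaryNameNode
3611 Jps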
2. Start YARN, using the following commands to start the ResourceManager and the NodeManager:
$ yarn-daemon.sh start resourcemanager
$ yarn-daemon.sh start nodemanager

or start both with one command:

$ start-yarn.sh
3. Check whether the startup succeeded

Open a browser and visit http://localhost:8088 to view the ResourceManager web interface.
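The command line can also confirm that the daemons are up and that the NodeManager has registered with the ResourceManager; a quick check (node name and port will vary):

$ jps | grep -E 'ResourceManager|NodeManager'
$ yarn node -list        # expect Total Nodes:1 with the node in the RUNNING state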
