Hadoop 2.4.1 cluster configuration on Ubuntu 14.04


The main references of this article: 1. http://blog.csdn.net/ab198604/article/details/8250461 (I like this author's accessible writing style, so a large part of this article reproduces his content)
2. http://os.51cto.com/art/201309/411793_all.htm
3. http://gxl-ct001.iteye.com/blog/1982910
4. http://www.cnblogs.com/tippoint/archive/2012/10/23/2735532.html
5. http://www.cnblogs.com/lanxuezaipiao/p/3525554.html
6. http://blog.csdn.net/skywalker_only/article/details/37905463
7. http://chj738871937.iteye.com/blog/2088735
8. http://blog.chinaunix.net/uid-20682147-id-4229024.html#_Toc807
9. http://ca.xcl0ud.net/wp-content/uploads/2014/05/Hadoop-2.pdf

Directory:
1. Introduction
2. Preparations
3. Configure the hosts file
4. Create a hadoop running account
5. Configure SSH password-free login
6. Download and decompress the hadoop installation package
7. Configure the namenode and modify the site files
8. Configure the hadoop-env.sh file
9. Configure the slaves file
10. Copy hadoop to each node
11. Format the namenode
12. Start HDFS
13. Start YARN
14. View cluster information through the web UI

1. Introduction
Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without understanding the details of the underlying distributed layer, making full use of the power of a cluster for high-speed computing and storage. Hadoop releases can be downloaded from: http://apache.communilink.net/hadoop/common/

Hadoop is an open-source distributed computing platform under the Apache Software Foundation. With the Hadoop Distributed File System (HDFS) and MapReduce (an open-source implementation of Google's MapReduce) at its core, Hadoop provides users with a distributed infrastructure whose low-level details are transparent.
A Hadoop cluster can be divided into Master and Slave roles. An HDFS cluster consists of one NameNode and several DataNodes. The NameNode acts as the master server, managing the file system namespace and client access to the file system; the DataNodes manage the data stored on their nodes. The MapReduce framework consists of a single JobTracker running on the master node and a TaskTracker running on each slave node. The master node schedules all tasks of a job, which are distributed across the slave nodes; it monitors their execution and re-runs failed tasks, while the slave nodes only execute the tasks assigned to them. When a job is submitted, the JobTracker receives the job and its configuration information, distributes the configuration to the slave nodes, schedules the tasks, and monitors the TaskTrackers' execution.
As this introduction shows, HDFS and MapReduce together form the core of the Hadoop distributed architecture: HDFS provides a distributed file system on the cluster, while MapReduce provides distributed computing and task processing. HDFS supports file storage and access during MapReduce task processing, and MapReduce distributes, tracks, and executes tasks on top of HDFS and collects the results; interacting with each other, they accomplish the main work of a Hadoop distributed cluster.

To learn hadoop data analysis technology thoroughly, the first task is to build a hadoop cluster environment. You can simply think of hadoop as a piece of software: install it on every physical node, run it, and you have a hadoop distributed cluster.
That sounds simple, but what exactly do we do? No hurry; the goal of this article is to let a new user follow it step by step. Because my equipment is limited, I can only use virtual machines to simulate the cluster environment. Although it is a virtual machine simulation, the process of building a hadoop cluster on virtual machines can be applied to real physical nodes in exactly the same way; the idea is identical.
Some people may wonder what computer configuration is required to install a hadoop cluster. This only applies to the virtual machine environment; here is my setup:
CPU: Intel i5-3230M 2.6 GHz
Memory: 6 GB
Hard Disk: 320 GB
System: Win7

After talking about the hardware configuration of the computer, let's talk about the prerequisites for installing hadoop.
Note: hadoop requires the same deployment directory structure on all machines (because the other nodes are started from the same directory path as the master node), and the same user account on every machine. According to the various documents, every machine should have a hadoop user, and this account is used for the password-free SSH authentication. For convenience, the same user is created on each of the three machines. Note that most distributed programs have this requirement, like the MPI described in my previous blog.

2. Preparations
2.1 Install VMware Workstation software
Some may ask why this software needs to be installed. It is the virtualization platform provided by VMware, and the Linux operating systems will be installed on top of it in the next step.

2.2 Install Linux on a virtual machine
The Linux operating system is installed on top of the previous step, because hadoop generally runs on the Linux platform. Although there are also Windows versions, the Linux implementation is relatively stable and less error-prone; if you install a hadoop cluster on Windows, the problems you run into during installation will probably drive you crazy. In fact, I have never installed the cluster on Windows.
The Linux distribution installed on the virtual machines is Ubuntu 14.04. Why this version? Very simple: because it is currently the latest release. In fact, any Linux distribution will do; for example, CentOS, Red Hat, or Fedora are all fine. The process of installing Linux on a virtual machine is also skipped here; for details, see [install VMware + Ubuntu and create multiple cluster virtual machines].

2.3 Prepare three virtual machine nodes
This step is actually very simple. If you have completed step 2.2, you have already prepared the first virtual machine node. How do you prepare the second and third? You may have guessed: you could repeat step 2.2 two more times to install the second and third nodes, but that process would probably drive you crazy. There is a much simpler way: cloning. That's right, simply copy the entire system directory of the first virtual machine you just installed to create the second and third nodes. Easy!
Many people may ask what the other two nodes are for. The principle is simple. According to the basic requirements of a hadoop cluster, one node is the master node, mainly used to run the namenode, secondarynamenode, and jobtracker tasks of the hadoop programs. The other two nodes are slave nodes, one of which exists for redundancy; without redundancy it could not be called hadoop. So a hadoop cluster must have at least three nodes; if your computer has a very high configuration, you can consider adding more. The slave nodes run the datanode and tasktracker tasks of the hadoop programs.
Therefore, after preparing the three nodes, you need to rename the host names of the Linux systems (because the previous operation was a copy, all three nodes currently have the same host name). For how to rename the host name, see my earlier post [install VMware + Ubuntu and create multiple cluster VMs].
Note: The host name must not contain underscores; otherwise the SecondaryNameNode will report an error at startup.

The host names of my three Ubuntu nodes are: jacobxu-ubuntuNode001, jacobxu-ubuntuNode002, jacobxu-ubuntuNode003. A brief renaming sketch follows.
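As a rough sketch only (the blog post referenced above covers the full procedure), renaming a node on Ubuntu 14.04 can look like the following; the host names are the ones used in this article, and the sed pattern assumes the default Ubuntu 127.0.1.1 entry:
echo "jacobxu-ubuntuNode001" | sudo tee /etc/hostname            # use ...Node002/003 on the other nodes
sudo hostname jacobxu-ubuntuNode001                              # apply the new name without rebooting
sudo sed -i 's/^127\.0\.1\.1.*/127.0.1.1 jacobxu-ubuntuNode001/' /etc/hosts   # keep the loopback entry consistent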

2.4 Configure the Java environment
The JDK must be installed on all machines. Install the JDK on the master node first, and then repeat the same steps on the other nodes. Install the JDK and configure the environment variables as the "root" user.
2.4.1 Install the JDK
Download: http://www.oracle.com/technetwork/java/javase/index.html
JDK package used: jdk-7u65-linux-x64.gz
First, log on to "jacobxu-ubuntuNode001" as root and create a "java" folder under "/usr", then copy "jdk-7u65-linux-x64.gz" into the "/usr/java" folder and extract it there. Then go on to the next step, "Configure environment variables".
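A minimal sketch of the steps just described (run as root; it assumes the downloaded package is a gzipped tarball that unpacks to jdk1.7.0_65):
mkdir -p /usr/java                        # create the target folder under /usr
cp jdk-7u65-linux-x64.gz /usr/java/       # copy the JDK package into it
cd /usr/java
tar -xzf jdk-7u65-linux-x64.gz            # extract; produces /usr/java/jdk1.7.0_65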
2.4.2 Configure environment variables
(1) Edit the "/etc/profile" file
Edit the "/etc/profile" file and append the Java "JAVA_HOME", "CLASSPATH", and "PATH" settings to it as follows:
# Set java environment
export JAVA_HOME=/usr/java/jdk1.7.0_65/
export JRE_HOME=/usr/java/jdk1.7.0_65/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

Or:
# Set java environment
export JAVA_HOME=/usr/java/jdk1.7.0_65/
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

The two forms above are equivalent, so we use the first one here.

(2) Make the configuration take effect
Save and exit. Execute the following command to make the configuration take effect immediately.
source /etc/profile   (or:  . /etc/profile)

2.4.3 Verify that the installation is successful
After the configuration is complete and has taken effect, run the following command to check whether it succeeded:
java -version

2.4.4 Install the remaining machines
Repeat the process above on each of the other nodes.

2.5 Service port conventions:
Port    Property / Function
9000    fs.defaultFS, e.g. hdfs://172.25.40.171:9000
9001    dfs.namenode.rpc-address; DataNodes connect to this port
50070   dfs.namenode.http-address
50470   dfs.namenode.https-address
50100   dfs.namenode.backup.address
50105   dfs.namenode.backup.http-address
50090   dfs.namenode.secondary.http-address, e.g. 172.25.39.166:50090
50091   dfs.namenode.secondary.https-address, e.g. 172.25.39.166:50091
50020   dfs.datanode.ipc.address
50075   dfs.datanode.http.address
50475   dfs.datanode.https.address
50010   dfs.datanode.address, the data transfer port of the DataNode
8480    dfs.journalnode.rpc-address
8481    dfs.journalnode.https-address
8032    yarn.resourcemanager.address
8088    yarn.resourcemanager.webapp.address, the HTTP port of YARN
8090    yarn.resourcemanager.webapp.https.address
8030    yarn.resourcemanager.scheduler.address
8031    yarn.resourcemanager.resource-tracker.address
8033    yarn.resourcemanager.admin.address
8042    yarn.nodemanager.webapp.address
8040    yarn.nodemanager.localizer.address
8188    yarn.timeline-service.webapp.address
10020   mapreduce.jobhistory.address
19888   mapreduce.jobhistory.webapp.address
2888    ZooKeeper; used by the Leader to listen for Follower connections
3888    ZooKeeper; used for Leader election
2181    ZooKeeper; used to listen for client connections
60010   hbase.master.info.port, the HTTP port of the HMaster
60000   hbase.master.port, the RPC port of the HMaster
60030   hbase.regionserver.info.port, the HTTP port of the HRegionServer
60020   hbase.regionserver.port, the RPC port of the HRegionServer
8080    hbase.rest.port, HBase REST server port
10000   hive.server2.thrift.port
9083    hive.metastore.uris

Now that the basic prerequisites are in place, the rest is a matter of patience: just follow along and install the hadoop cluster one step at a time.

3. Configure the hosts file (required on all three VMs). For details, see section 7.1 of [install Distributed Parallel library MPICH2]; a brief example follows.
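A rough example of the /etc/hosts entries, using the IP addresses and host names that appear later in this article (the referenced MPICH2 article has the full procedure); run it on all three nodes:
sudo tee -a /etc/hosts > /dev/null <<'EOF'
192.168.111.128 jacobxu-ubuntuNode001
192.168.111.129 jacobxu-ubuntuNode002
192.168.111.130 jacobxu-ubuntuNode003
EOF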

4. Create a hadoop running account (required on all three VMs) (you can skip this step for now; we can keep using the current user group and user)
Specifically, set up a user group and user for the hadoop cluster. This part is relatively simple; for example:
sudo groupadd hadoop                                                        # create the hadoop user group
sudo useradd -s /bin/bash -d /home/jacobxu -m jacobxu -g hadoop -G admin    # add user jacobxu, in group hadoop, with admin privileges
sudo passwd jacobxu                                                         # set the login password for jacobxu
su jacobxu                                                                  # switch to the jacobxu user

All three VM nodes need to perform the steps above to create the hadoop running account.

5. Configure SSH password-free login (required on all three VMs). For details, see section 7.3 of [install Distributed Parallel library MPICH2]; a brief sketch follows.
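A rough sketch only (the referenced MPICH2 article covers it in full); run as the hadoop-running user on the master node, assuming ssh-copy-id is available:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa        # generate a key pair with an empty passphrase
ssh-copy-id jacobxu@jacobxu-ubuntuNode001       # authorize login to the master itself (start-dfs.sh needs this)
ssh-copy-id jacobxu@jacobxu-ubuntuNode002       # authorize login to each slave
ssh-copy-id jacobxu@jacobxu-ubuntuNode003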

6. Download and decompress the hadoop installation package (required on all three VMs)
There is not much to say about downloading the installation package, except that the version I use is hadoop-2.4.1.
This version is almost the latest, but not necessarily the most stable; try it first, and switch to another version once you are proficient, there is no hurry. (Note: the Hadoop authoritative guide book is also based on the hadoop-0.20.2 release.) Put the compressed package in the /home/jacobxu/hadoop/ directory and decompress it:
jacobxu@jacobxu-ubuntuNode001:~/hadoop$ tar -xf hadoop-2.4.1.tar.gz
Note: after unpacking, the hadoop software directory is /home/jacobxu/hadoop/hadoop-2.4.1.

7. Configure the namenode and modify the site files
Next, configure the hadoop path to make the subsequent operations easier. This is done by modifying the /etc/profile file; add the following two lines to it:
export HADOOP_HOME=/home/jacobxu/hadoop/hadoop-2.4.1
export PATH=$PATH:$HADOOP_HOME/bin

Then run: source /etc/profile
to make the configuration take effect immediately. The above configuration must be repeated on every node.

So far, the preparation is complete. Next, modify the hadoop configuration files, that is, the various site files. The configuration files live in the $HADOOP_HOME/etc/hadoop directory. For Hadoop 2.3.0 and Hadoop 2.4.0, the core-site.xml, yarn-site.xml, hdfs-site.xml, and mapred-site.xml in this directory are empty; if they are not configured and you simply start up, for example by running start-dfs.sh, you may hit all kinds of errors.
You can copy a template from the $HADOOP_HOME/share directory into the etc/hadoop directory and then modify it (the commands below can be copied and run directly; note that the default xml file paths differ between 2.3.0 and 2.4.0):
jacobxu@jacobxu-ubuntuNode001:~/hadoop/hadoop-2.4.1/etc/hadoop$ cp ../../share/doc/hadoop-project-dist/hadoop-common/core-default.xml ./core-site.xml
jacobxu@jacobxu-ubuntuNode001:~/hadoop/hadoop-2.4.1/etc/hadoop$ cp ../../share/doc/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml ./hdfs-site.xml
jacobxu@jacobxu-ubuntuNode001:~/hadoop/hadoop-2.4.1/etc/hadoop$ cp ../../share/doc/hadoop-yarn-common/yarn-default.xml ./yarn-site.xml
jacobxu@jacobxu-ubuntuNode001:~/hadoop/hadoop-2.4.1/etc/hadoop$ cp ../../share/doc/hadoop-mapreduce-client-core/mapred-default.xml ./mapred-site.xml

Next, you need to make the appropriate changes to the default core-site.xml, yarn-site.xml, hdfs-site.xml, and mapred-site.xml, otherwise the startup will still fail.

The core-site.xml configuration is as follows:
Property          Value                              Scope        Remarks
fs.defaultFS      hdfs://192.168.111.128:9000        all nodes    the equivalent parameter in Hadoop 1.x is fs.default.name
hadoop.tmp.dir    /home/jacobxu/hadoop/tmp-jacobxu   all nodes

Note that the configured directories must be created before starting, for example the /home/jacobxu/hadoop/tmp-jacobxu directory.
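If you would rather write a minimal site file from scratch than edit the copied core-default.xml, a sketch like the following is equivalent (Hadoop falls back to its built-in defaults for every property you do not set); the IP and directory are the values from the table above, written from $HADOOP_HOME/etc/hadoop:
cat > core-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.111.128:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/jacobxu/hadoop/tmp-jacobxu</value>
  </property>
</configuration>
EOF
mkdir -p /home/jacobxu/hadoop/tmp-jacobxu    # create the directory before starting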

The hdfs-site.xml configuration is as follows:
Property                               Value                               Scope
dfs.namenode.rpc-address               192.168.111.128:9001                all nodes
dfs.namenode.secondary.http-address    192.168.111.129:50090               NameNode, SecondaryNameNode (since there are not that many nodes, we do not configure it here)
dfs.namenode.name.dir                  /home/jacobxu/hadoop/name-jacobxu   NameNode, SecondaryNameNode
dfs.datanode.data.dir                  /home/jacobxu/hadoop/data-jacobxu   all DataNodes
dfs.replication                        1
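A corresponding minimal hdfs-site.xml sketch with the values from the table above (the secondary http-address can be added the same way if you decide to configure it):
cat > hdfs-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>192.168.111.128:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/jacobxu/hadoop/name-jacobxu</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/jacobxu/hadoop/data-jacobxu</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF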

The mapred-site.xml configuration is as follows:
Property                    Value    Scope
mapreduce.framework.name    yarn
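A minimal mapred-site.xml sketch for the single property above:
cat > mapred-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF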

The yarn-site.xml configuration is as follows:
Property                         Value              Scope
yarn.resourcemanager.hostname    192.168.111.128    ResourceManager, NodeManager
yarn.nodemanager.hostname        0.0.0.0            all NodeManagers

If yarn.nodemanager.hostname is configured as a specific IP address, such as 10.12.154.79, then the configuration is different on each NodeManager.
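A minimal yarn-site.xml sketch with the two properties from the table above:
cat > yarn-site.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.111.128</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>
</configuration>
EOF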

8. Configure the hadoop-env.sh File
Modify the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file on all nodes and add, near the top of the file: export JAVA_HOME=/usr/java/jdk1.7.0_65
Note: although JAVA_HOME has already been added to /etc/profile, you still have to modify hadoop-env.sh on all nodes; otherwise an error is reported at startup.

Also add the following content:
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

9. Configure the slaves file
Modify the $HADOOP_HOME/etc/hadoop/slaves file on the NameNode and SecondaryNameNode, adding the IP addresses (or corresponding host names) of the slave nodes, one per line, as shown below:
> cat slaves
192.168.111.129
192.168.111.130

10. Copy hadoop to each node
Use scp to copy the contents of the hadoop folder to each node:
jacobxu@jacobxu-ubuntuNode001:~/hadoop/hadoop-2.4.1/etc/hadoop$ scp * jacobxu@192.168.111.129:/home/jacobxu/hadoop/hadoop-2.4.1/etc/hadoop/

Note: remember to create the corresponding folders on the other nodes as well, for example as sketched below.
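A rough sketch of what that means here: create the tmp/name/data directories configured above on each slave node (the user, IPs, and paths are the ones used in this article):
for node in 192.168.111.129 192.168.111.130; do
    ssh jacobxu@$node "mkdir -p /home/jacobxu/hadoop/tmp-jacobxu /home/jacobxu/hadoop/name-jacobxu /home/jacobxu/hadoop/data-jacobxu"
done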

11. Format the namenode (this step is performed on the master node)
Before starting Hadoop, the Namenode must be formatted first.
1) Go to the $HADOOP_HOME/bin directory.
2) Format: ./hdfs namenode -format
If "INFO util.ExitUtil: Exiting with status 0" is printed, the formatting succeeded, similar to the following:
14/07/21 13:32:34 INFO util.ExitUtil: Exiting with status 0
14/07/21 13:32:34 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at jacobxu-ubuntuNode001/192.168.111.128
************************************************************/

During formatting, if the mapping between the host name and IP (e.g. "172.25.40.171 VM-40-171-sles10-64") has not been added to the /etc/hosts file, the following error is reported:
14/04/17 03:44:09 WARN net.DNS: Unable to determine local hostname -falling back to "localhost"
java.net.UnknownHostException: VM-40-171-sles10-64: VM-40-171-sles10-64: unknown error
        at java.net.InetAddress.getLocalHost(InetAddress.java:1484)
        at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:264)
        at org.apache.hadoop.net.DNS.<clinit>(DNS.java:57)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:945)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:573)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:144)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:845)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1370)
Caused by: java.net.UnknownHostException: VM-40-171-sles10-64: unknown error
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
        at java.net.InetAddress.getLocalHost(InetAddress.java:1479)
        ... 8 more

12. Start HDFS (this step is performed on the master node)
12.1 Start HDFS
1) Go to the $HADOOP_HOME/sbin directory.
2) Start HDFS: ./start-dfs.sh

If the following error occurs during startup, it means the NameNode cannot SSH to itself without a password. If password-free login by IP address already works, the cause is usually that password-free login by host name has not been set up; SSH to the host name once, for example ssh hadoop@VM_40_171_sles10_64, and then start again.
Starting namenodes on [VM_40_171_sles10_64]
VM_40_171_sles10_64: Host key not found from database.
VM_40_171_sles10_64: Key fingerprint:
VM_40_171_sles10_64: xofiz-zilip-tokar-rupyb-tufer-tahyc-sibah-kyvuf-palik-hazyt-duxux
VM_40_171_sles10_64: You can get a public key's fingerprint by running
VM_40_171_sles10_64: % ssh-keygen -F publickey.pub
VM_40_171_sles10_64: on the keyfile.
VM_40_171_sles10_64: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument

Some warnings appear here; ignore them for now (note that we will see them again later):
14/07/21 15:42:45 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/07/21 15:42:45 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/07/21 15:42:46 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/07/21 15:42:46 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/07/21 15:42:46 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/07/21 15:42:46 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/07/21 15:42:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/21 15:42:46 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/07/21 15:42:46 WARN conf.Configuration: mapred-site.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
Found 1 items

12.2 Use jps to check whether the background processes started successfully
1) Run the jps command provided by the JDK to check whether the corresponding processes have started.
2) Check the .log and .out files in the $HADOOP_HOME/logs directory for any exception information.
12.2.1. DataNode
Run the jps command to view the DataNode process:
$ jps
18669 DataNode
24542 Jps
12.2.2. NameNode
Run the jps command to view the NameNode process:
$ jps
18669 NameNode
24542 Jps
12.2.3. SecondaryNameNode
Run the jps command. You can see:
$ jps
24542 Jps
3839 SecondaryNameNode

12.3 Run HDFS commands
Run a few HDFS commands to check whether the installation succeeded and the configuration is complete. For the usage of HDFS commands, simply run hdfs or hdfs dfs to see the related usage instructions.
12.3.1. hdfs dfs -ls
"hdfs dfs -ls" takes a path parameter. If the path is prefixed with "hdfs://URI", it accesses HDFS; otherwise it behaves like a local ls. The URI is the IP or host name of the NameNode and may include the port number, i.e. the value specified by dfs.namenode.rpc-address in the hdfs-site.xml.
"hdfs dfs -ls" assumes the default port 8020; since it is set to 9001 here, the port number must be specified. Otherwise the port could be omitted, much like accessing a URL in a browser. Example:
> hdfs dfs -ls hdfs://192.168.111.128:9001/

The slash "/" after 9001 is required; otherwise the argument is treated as a file name. If port 9001 were not specified, the default 8020 would be used; "192.168.111.128:9001" is the value of "dfs.namenode.rpc-address" in the hdfs-site.xml.
It is easy to see that "hdfs dfs -ls" can operate on different HDFS clusters; you only need to specify different URIs.
After a file is uploaded, it is stored in the DataNode's data directory (specified by the property "dfs.datanode.data.dir" in the DataNode's hdfs-site.xml), for example:
/home/jacobxu/hadoop/data-jacobxu/current/BP-1086845186-192.168.111.128-1405920752636/current/finalized/blk_1073741826
The "blk" in the file name means block. By default, blk_1073741825 here is a complete block of the file, stored by Hadoop without further processing.
12.3.2. hdfs dfs -put
Command to upload a file, for example:
> hdfs dfs -put ./data.txt hdfs://192.168.111.128:9001/
12.3.3. hdfs dfs -rm
Command to delete a file, for example:
> hdfs dfs -rm hdfs://192.168.111.128:9001/data.txt
Deleted hdfs://192.168.111.128:9001/SuSE-release

12.3.4. hadoop fs -ls can also be used directly, without the hdfs:// prefix.
For basic operations on HDFS, see [http://supercharles888.blog.51cto.com/609344/876099].

13. Start YARN
(Do not start YARN for the moment)
14. View cluster information through the web UI
Open http://192.168.111.128:50070/dfshealth.html#tab-datanode to view the data distribution and storage usage of the Datanode nodes, as shown below:
Datanode Information
In operation:
Node                                          Last contact  Admin State  Capacity  Used    Non DFS Used  Remaining  Blocks  Block pool used  Failed Volumes  Version
jacobxu-ubuntuNode003 (192.168.111.130:50010) 0             In Service   17.59 GB  24 KB   6.12 GB       11.47 GB   0       24 KB (0%)       0               2.4.1
jacobxu-ubuntuNode002 (192.168.111.129:50010) 0             In Service   17.59 GB  352 KB  6.12 GB       11.47 GB   1       352 KB (0%)      0               2.4.1

Original article address: Hadoop 2.4.1 cluster configuration on Ubuntu 14.04; thanks to the original author for sharing.
