Brief introduction to building Hadoop 2.2.0 in pseudo-distributed mode


This article briefly describes the process of building Hadoop in pseudo-distributed mode, for easy reference later. Environment: VMware 10 + Red Hat 6.3 + Hadoop 2.2.0 + JDK 1.7.
Hadoop run modes:
- Local mode: runs only a single map and a single reduce; used for debugging.
- Pseudo-distributed mode: the distributed daemons all run on a single machine; used for learning and for verifying that the logic is correct.
- Cluster mode: the working (production) mode, running on hundreds or thousands of machines.
Preparing the Linux environment:
- Shut down the firewall. If a machine provides services to the external network, you absolutely must not shut down its firewall. Hadoop, however, is generally used inside a company, where multiple nodes need to communicate with each other; if the firewall blocks those ports, the nodes cannot communicate, so the convenient choice is to shut the firewall down.
- Modify the IP. Here I fix the IP as 192.168.8.88, to avoid the machine getting a different IP every time it starts.
- Set the hostname. Here I name the machine hadoop01. Giving each machine a name makes it easy to locate the machine when a problem occurs; the hostname and IP are mapped in /etc/hosts.
- Install the JDK, install Hadoop, and test the installation.
- Set up passwordless SSH login (SSH is the Secure Shell).
Detailed steps:

Step One: Modify the virtual machine configuration
1. Configure the virtual machine IP
For example, open the Virtual Network Editor and choose host-only mode (used here) or bridged mode.
The network adapter corresponding to host-only mode is VMnet1. Configure its network segment as 192.168.8.0, then go into the Windows host system and change the IP of the VMnet1 adapter to a fixed 192.168.8.100. Note that what is configured here is the VMnet1 adapter on the Windows side, not the IP of the virtual machine.
Then configure the IP of the Linux system inside the virtual machine: 192.168.8.88, gateway: 192.168.8.1. If you want to access the Internet, you must also configure DNS: 8.8.8.8 or 8.8.4.4. In addition, for ease of operation it is best to install VMware Tools in the virtual machine; detailed installation steps are easy to find on CSDN and are not repeated here.
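A minimal sketch of the static IP setup on Red Hat 6.3 (not spelled out in the article), assuming the network interface is eth0; adjust the device name and values to your own environment:

  # /etc/sysconfig/network-scripts/ifcfg-eth0  (eth0 is an assumption)
  DEVICE=eth0
  ONBOOT=yes
  BOOTPROTO=static
  IPADDR=192.168.8.88
  NETMASK=255.255.255.0
  GATEWAY=192.168.8.1
  DNS1=8.8.8.8

  # Apply the change
  service network restart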
2. Change the hostname: vim /etc/sysconfig/network
3. Firewall settings: the firewall is itself a service and its service name is iptables, so running service iptables prints the available firewall operations.
service iptables status: view the firewall status.
service iptables stop: shuts down the firewall, but it will start again after the machine is rebooted. Therefore, to shut down the firewall permanently, first check in which runlevels it is enabled by default with chkconfig iptables --list, which shows whether it is on or off in each runlevel.
You can check the default Linux boot runlevel with vim /etc/inittab; the default is runlevel 5, the graphical interface. Then run chkconfig iptables off to disable the firewall in all runlevels on every boot.
Restart the machine: reboot
After the reboot, check the hostname with hostname and the IP with ifconfig.
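A minimal sketch of the hostname, hosts mapping, and firewall steps described above, using the article's hostname hadoop01 and IP 192.168.8.88:

  # /etc/sysconfig/network  (permanent hostname)
  NETWORKING=yes
  HOSTNAME=hadoop01

  # /etc/hosts  (map the hostname to the fixed IP)
  192.168.8.88   hadoop01

  # Firewall: check, stop now, disable for all runlevels, then confirm
  service iptables status
  service iptables stop
  chkconfig iptables off
  chkconfig iptables --list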
Step Two: Install the JDK. Use version 1.6 or above, preferably not 8. (Note: folders shared with the host via VMware Tools appear under the /mnt/hgfs/ directory.)
1. Create a directory /usr/java and unzip the downloaded JDK into it.
2. Add the following to /etc/profile (note that the delimiter between path entries in Linux is the colon ":"):
export JAVA_HOME=<JDK installation directory> (here: /usr/java/jdk1.7.0_79)
export PATH=$PATH:$JAVA_HOME/bin
Save and exit, then run source /etc/profile to make the configuration take effect.
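A minimal sketch of the JDK installation, assuming the downloaded tarball is jdk-7u79-linux-x64.tar.gz in the shared folder (the tarball name and shared-folder path are assumptions):

  mkdir -p /usr/java
  # Tarball name and path are assumed; use the file you actually downloaded
  tar -zxvf /mnt/hgfs/share/jdk-7u79-linux-x64.tar.gz -C /usr/java

  # Append to /etc/profile
  export JAVA_HOME=/usr/java/jdk1.7.0_79
  export PATH=$PATH:$JAVA_HOME/bin

  source /etc/profile
  java -version    # verify the JDK is found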
Step Three: Install Hadoop
1. Create a /hadoop directory under the root directory and extract the downloaded Hadoop package into it:
tar -zxvf hadoop-2.2.0.tar.gz -C /hadoop
The directory structure after extraction is as follows:
bin: executable scripts, the commonly used ones being hadoop, yarn, and hdfs
include: header files, similar to the C standard library
sbin: startup and shutdown scripts such as start-yarn.sh and stop-yarn.sh
etc: the Hadoop configuration files
share: the Hadoop-related jar packages

2. Configure Hadoop by modifying the following configuration files (located under etc/hadoop in the installation directory):
1) vim hadoop-env.sh and set JAVA_HOME in the file:
export JAVA_HOME=/usr/java/jdk1.7.0_79
2) Modify the core-site.xml file:
<configuration>
  <!-- specify the address of the NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.8.88:9000</value>
  </property>
  <!-- specify the storage path of files generated at Hadoop runtime -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadoop-2.2.0/tmp</value>
  </property>
</configuration>
3) Modify the hdfs-site.xml file:
<configuration>
  <!-- number of copies HDFS keeps of the data, including the original; set to 1 here
       because this is pseudo-distributed. A real multi-machine deployment keeps three copies by default. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
4) mapred-site.xml does not exist initially; there is only a mapred-site.xml.template file, so rename it to mapred-site.xml:
mv mapred-site.xml.template mapred-site.xml
Then add the following, which tells Hadoop that MapReduce (MR) runs on YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5) Modify the yarn-site.xml file:
<configuration>
  <!-- tell NodeManager how it gets data: via shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- specify the address of the ResourceManager in YARN -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
</configuration>
6) Add Hadoop to the environment variables in /etc/profile, so the Hadoop commands can be used conveniently later. Add:
export HADOOP_HOME=/hadoop/hadoop-2.2.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
To refresh the configuration:
source /etc/profile
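A quick sanity check after the environment variables are in place (not from the original article):

  source /etc/profile
  hadoop version    # should report Hadoop 2.2.0
  which hdfs        # confirms $HADOOP_HOME/bin is on the PATH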

Step Four: Initialize HDFS (format the file system)
1) In the past the command hadoop namenode -format was used; it is no longer needed. The commonly used command now is:
hdfs namenode -format
A message in the output shows that the formatting succeeded.
After formatting, a tmp directory appears under the Hadoop directory.
2 "Start HDFs and yarn run sbin/start-all.shSince it is currently a pseudo-distributed installation on a single machine, start all services at once, including HDFs and yarn, enter the "yes" and the root password several times after the successful startup, use sbin/stop-all.sh Note: in hadoop2.xRecommended First Call start-dfs.shCalled after start-yarn.sh, to start the task
3 "Input: JPS view shows the current Hadoop process information: If shown below indicates that the previous configuration was successful: Note JPS is a Java command (use which JPS to view the directory where the JPS script is located)If Hadoop opens normally after the above instruction is executed, the process will be displayed. NameNode: The boss of the HDFs department who manages Datanodes in HDFs DataNode: The node in HDFs that is responsible for data storage, the younger brother in the HDFs department. Secondarynamenode: Equivalent to Namenode Assistant, help Namenode to complete data synchronization work ResourceManager: Resource manager, the boss in Yarn department. NodeManager: The little brother in Yarn department is team leader, he's got his little brother working. 4 "Of course, you can also use browser login authentication: Note: I install Hadoop for Linux; 192.168.8.88, the host name is HADOOP01. Therefore, according to their own circumstances modify http://192.168.8.88:50070 (HDFs management interface) http://192.168.8.88:8088 (Mr Management interface)
Note: after opening http://192.168.8.88:50070 you can see the HDFS overview page.
Click Live Nodes to enter the page for the currently active nodes.
However, when you click Browse the filesystem, the address bar shows:
http://hadoop01:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/&nnaddr=192.168.8.88:9000
that is, the host appears by hostname (hadoop01) rather than by IP address, so you need to add the hostname hadoop01 to the C:/Windows/System32/drivers/etc/hosts file on the Windows system. After refreshing, the page shows an empty directory, meaning there is currently nothing in HDFS.

Step Five: Verify HDFS
Upload a file with: hadoop fs -put <file on the Linux file system> <path on the HDFS file system>. For example, to upload /root/test.txt to hdfs://hadoop01:9000/hdfstest:
hadoop fs -put /root/test.txt hdfs://hadoop01:9000/hdfstest
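A small command-line check of the upload (not in the original article); the -mkdir step is an assumption in case the /hdfstest directory does not exist yet:

  hadoop fs -mkdir hdfs://hadoop01:9000/hdfstest    # run before the -put above if the directory is missing
  hadoop fs -ls hdfs://hadoop01:9000/hdfstest       # should list test.txt after the upload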
After the command succeeds, refresh the browser page and the uploaded file is displayed.
Of course, files uploaded to HDFS can also be downloaded through the browser, or from the command line:
hadoop fs -get hdfs://hadoop01:9000/hdfstest /home/hdfstest

Step Six: Test MapReduce and YARN
In the share directory of the Hadoop installation there is a MapReduce jar package containing many test cases. Using the wordcount example provided in that jar, you pass in an input file and the result is written to an output path:
hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /in.txt /out.txt
/in.txt: the input file to be tested, on HDFS
/out.txt: the path on HDFS where the output data is stored

The contents of the in.txt file are, for reference:
hello tom
hello jok
hello jony
hello tom
After the computation, the number of occurrences of each word is counted and the results are saved under /out.txt.
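A minimal end-to-end sketch of the wordcount test. The examples jar is assumed to live under share/hadoop/mapreduce in the installation directory, and /root/in.txt is a hypothetical local copy of the input file:

  # Put the input file on HDFS
  hadoop fs -put /root/in.txt /in.txt

  # Run the example; the output path /out.txt must not already exist on HDFS
  cd /hadoop/hadoop-2.2.0/share/hadoop/mapreduce
  hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /in.txt /out.txt

  # View the word counts (part-r-00000 is the usual reducer output file)
  hadoop fs -cat /out.txt/part-r-00000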
Step Seven: Passwordless SSH login
As seen above, it is cumbersome to enter "yes" and the root password multiple times before starting Hadoop, and this is only one machine; with hundreds of machines it would be a huge workload. So below we set up passwordless SSH login.
1. The current virtual machine (192.168.8.88) sends a command to 192.168.8.99 to create a directory under its root directory: ssh 192.168.8.99 mkdir /test. You will be asked to enter the password of 192.168.8.99. As you can see, for security, connecting to another machine over the SSH protocol requires the other system's password, even when you use SSH to connect to your own machine.
2. Passwordless login configuration
1) Go into the ~ directory, where there is a .ssh directory: cd ~/.ssh. It contains only a known_hosts file.
2) Generate a key pair: run ssh-keygen -t rsa and press Enter four times. Two files are then generated in the ~/.ssh directory: a public key (id_rsa.pub) and a private key (id_rsa), each of which is a long string. To log in to a system without entering a password, your public key must be added to that system's authorized keys (that is, appended to authorized_keys). To enable passwordless login to yourself:
cp id_rsa.pub ~/.ssh/authorized_keys
The authorized_keys file is created under ~/.ssh, and logging in to your own system over SSH no longer asks for a password: ssh hadoop01. At this point, starting Hadoop with sbin/start-all.sh no longer requires a password either and starts directly.
3) To copy your public key to a machine you want to log in to without a password:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
or ssh-copy-id hadoop01 (send my current public key to hadoop01, because I want passwordless login to hadoop01). For example, for 192.168.8.88 to log in to 192.168.8.99 without a password, copy the public key of 192.168.8.88 to 192.168.8.99; pay attention to whose public key is copied to whom. ssh-copy-id 192.168.8.99 gives your own public key to the other machine; the first time you must enter the other machine's password, and afterwards you can log in to the other system without one.
Summary: 1) Generate a public key and a private key on your own machine: ssh-keygen -t rsa. 2) Send your public key to the machine you want to log in to without a password.
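A minimal sketch of the two summary steps, using the article's hosts hadoop01 (192.168.8.88) and 192.168.8.99:

  # 1) Generate an RSA key pair on this machine (press Enter at each prompt)
  ssh-keygen -t rsa

  # 2) Send the public key to every machine you want to reach without a password
  ssh-copy-id hadoop01          # passwordless login to this machine itself
  ssh-copy-id 192.168.8.99      # passwordless login to the other node

  # Verify: these should no longer prompt for a password
  ssh hadoop01 hostname
  ssh 192.168.8.99 hostname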