Deployment of Hadoop's three operating modes on Ubuntu


A Hadoop cluster supports three modes of operation: standalone mode, pseudo-distributed mode, and fully distributed mode. Below is how to deploy each of them on Ubuntu.

(1) Standalone mode. By default, Hadoop is configured to run as a single Java process in non-distributed mode, which is suitable for debugging when getting started. Development in Eclipse uses standalone mode and does not use HDFS. If the JDK is not installed yet, the installation steps are as follows: first download the Linux version of the JDK from the official website, then extract the downloaded archive into an appropriate directory; the JDK is then installed. Next, configure the environment variables in /etc/profile.

Add the following code
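A minimal sketch of the lines typically added to /etc/profile, assuming the JDK was extracted to /usr/lib/jvm/jdk1.7.0 (use the path on your own machine):

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0   # assumed path: point this at your own JDK directory
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH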

The JAVA_HOME path is the JDK path on your own machine. After saving, you need to log out of the current user and log back in.
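Alternatively (an extra shortcut, not part of the steps above), the profile can be reloaded in the current shell without logging out:

source /etc/profile   # reloads /etc/profile in the current shell session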

Then open a console and enter java -version; if the Java version and related information are displayed, the configuration succeeded. Unzip the downloaded Hadoop and rename the folder to hadoop (this is just for convenience later). Go into the conf folder and edit hadoop-env.sh: around the ninth line there is a line of the form #export JAVA_HOME=*******. Remove the leading # (here # marks the line as a comment) and change the value of JAVA_HOME to the JDK path on your machine, the same value as in /etc/profile. You can now run a Hadoop program in standalone mode; make sure the current directory is the Hadoop folder:
bin/hadoop jar hadoop-examples-*.jar wordcount conf output
conf is the input folder and output is the output folder, so make sure the conf folder exists and contains files.

(2) Pseudo-distributed mode. Pseudo-distributed mode is a single-node mode of operation: all processes (NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker) run on the one and only node, and HDFS is used. First configure three XML files, located in the conf folder under the Hadoop directory. core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/****/hadoop/logs</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

Next you need to install SSH. Enter the following directly in the console under Ubuntu:

sudo apt-get install openssh-server

If you are prompted that the package cannot be found, update the package sources first:

sudo apt-get update

After installing SSH, you need to set up a login key. In the console, enter:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh-keygen generates the key; -t (note that options are case-sensitive) specifies the type of key to generate, where dsa means DSA key authentication, i.e. the key type; -P supplies the passphrase; -f specifies the file the key is written to. cat appends id_dsa.pub (the public key) to authorized_keys. After this, the console can run ssh localhost without a password. Here are the run steps (paths are relative to the Hadoop directory): (1) Format the NameNode and start the Hadoop processes:
bin/hadoop namenode -format
bin/start-all.sh
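After start-up you can verify the daemons with jps, as described next; when everything is running, the output looks roughly like the following (the numeric process IDs are illustrative only):

3120 NameNode
3245 DataNode
3367 SecondaryNameNode
3489 JobTracker
3612 TaskTracker
3700 Jps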
Enter jps to check that the five relevant processes shown above have started. (2) Create an input folder on HDFS and upload the data:
bin/hadoop fs -mkdir input
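A sketch of the upload step that goes with it, assuming the local data file is named date as the explanation below describes:

bin/hadoop fs -put date input   # copy the local file "date" into the input folder on HDFS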
input is the name of the folder created, and date is the local data file (whose default path is the Hadoop directory); other commands can be looked up with bin/hadoop fs -help. (3) Run WordCount and view the results:
bin/hadoop jar hadoop-examples-*.jar wordcount input output
bin/hadoop fs -cat output/* >> result.txt
If you exported your own jar, you can run it directly with bin/hadoop jar own.jar input output. The results are placed in the result.txt file (in the local Hadoop directory); without the redirection, the results are printed to the console. Job progress can be viewed in a browser at http://<machine name>:50030. (4) Stop the Hadoop processes:
bin/stop-all.sh  
(3) Fully distributed mode. In Hadoop, different subsystems partition the nodes differently. From HDFS's point of view, nodes are divided into a NameNode and DataNodes (there can be multiple DataNodes); from MapReduce's point of view, nodes are divided into a JobTracker and TaskTrackers (there can be multiple TaskTrackers). A fully distributed deployment is almost the same as a pseudo-distributed one. You need two or more machines with the same user name (this is required), IP addresses in the same network segment, and the machines must be able to ping each other:
192.168.6.30 Master
192.168.6.31 Node1

(1) Generate an SSH key on the master host and distribute it:

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub <user name>@192.168.6.31

If the distribution is unsuccessful, you can use the following command
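A sketch of that manual fallback, assuming the public key is copied to the slave's home directory as mas_key (the name the commands below expect):

scp ~/.ssh/id_rsa.pub <user name>@192.168.6.31:~/mas_key   # copy the public key to the slave as ~/mas_key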

Then, on the remote machine:

mkdir ~/.ssh
chmod 700 ~/.ssh
mv ~/mas_key ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Using ssh-copy-id not only adds the public key to authorized_keys but also sets the correct permissions (700 for the .ssh folder, 600 for authorized_keys).
Reference article: http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/
For the principle behind SSH password-free login, see: http://www.ruanyifeng.com/blog/2011/12/ssh_remote_login.html
This way, you should not need to enter a password when running ssh 192.168.6.31 from the master host.
If you run into the error "Agent admitted failure to sign using the key", the solution is to add the private key with the ssh-add command:

ssh-add ~/.ssh/id_rsa

(2) Configure the hosts file

 

sudo gedit /etc/hosts

127.0.0.1    localhost
#127.0.0.1   machine name
192.168.6.38 Master
192.168.6.31 Node1

* The second line (the one mapping the machine name) must be commented out. The hosts file is then distributed to the slave via scp and moved into place on the slave machine:

scp /etc/hosts <user name>@<target machine ip>:~/hosts
sudo mv ~/hosts /etc/   # run this on the slave machine
(3) Modify the Hadoop configuration files. A total of 5 files need to be configured. Three of them are the same as in pseudo-distributed mode, except that in two of those files the localhost field needs to be changed to the master machine name (here Master; an IP address also works; an example is shown after the slaves listing below). The other two files are masters and slaves.

masters:

Master

slaves:

Master
Node1
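For the two XML files that contain a host name, the only change from the pseudo-distributed setup is localhost. A sketch based on the earlier core-site.xml, with Master substituted for localhost (mapred.job.tracker in mapred-site.xml changes to Master:9001 in the same way):

    <property>
        <name>fs.default.name</name>
        <value>hdfs://Master:9000</value>
    </property>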
There are two data nodes in this Hadoop cluster: the master and the slave. * The 5 configuration files also need to be distributed to the slave (that is, the Hadoop configuration on the master and slave machines in the cluster is identical). With this, the Hadoop configuration is complete, and the command ssh Node1 now logs in directly. The subsequent steps for running WordCount are the same as in pseudo-distributed mode. Note: empty the logs directory on the cluster machines before each run.

  
