A Hadoop cluster supports three modes of operation: standalone mode, pseudo-distributed mode, and fully distributed mode. Below is an introduction to deploying each of them on Ubuntu.
(1) Standalone mode. By default Hadoop is configured to run as a single Java process in non-distributed mode, which is convenient for debugging at the start. Development in Eclipse uses standalone mode and does not involve HDFS. If the JDK is not installed yet, the installation steps are as follows: first download the Linux version of the JDK from the official website, then unpack the download into a suitable directory; with that the JDK is installed. Next, configure the environment variables.
Add the following code to /etc/profile.
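A minimal sketch of the lines to append, assuming the JDK was unpacked to /usr/lib/jvm/jdk1.7.0 (this path is only an example):

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0    # example path, replace with your own JDK directory
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH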
The JAVA_HOME path is the JDK path on your own machine. After saving, you need to log out of the current user and log back in for the change to take effect.
Then open a console and enter java -version; if the Java version and related information is displayed, the configuration was successful. Unzip the downloaded Hadoop archive and rename the directory to hadoop (purely for convenience later). Go into the conf folder and edit hadoop-env.sh: around line 9 there is a line of the form #export JAVA_HOME=*******. Remove the leading # (which marks the line as a comment) and set the value of JAVA_HOME to the JDK path on your machine, the same value as in /etc/profile. You can now run a Hadoop program in standalone mode; make sure the current path is the hadoop folder:
bin/hadoop jar hadoop-examples-*.jar wordcount conf output
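Since standalone mode writes its output to the local filesystem, the word counts can be checked directly once the job finishes (a quick optional check):

cat output/*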
conf is the input folder and output is the output folder, so make sure the conf folder exists and contains files. (2) Pseudo-distributed mode. Pseudo-distributed mode is a single-node mode of operation: all processes (NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker) run on the one and only node, and HDFS is required. First configure three XML files; they are located in the conf folder under the Hadoop directory.
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/****/hadoop/logs</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Next you need to install SSH. Enter the following directly in the console under Ubuntu:
sudo apt-get install openssh-server
If you are prompted that the package source cannot be found, update the software sources by entering:
sudo apt-get update
After installing SSH, you need to set up a login key. In the console, enter:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh-keygen generates the key; -t (note that options are case sensitive) specifies the type of key to generate, and dsa means DSA key authentication, i.e. the key type; -P supplies the passphrase (empty here); -f specifies the file the key is written to. cat appends id_dsa.pub (the public key) to authorized_keys. After this, you can enter ssh localhost in the console without a password. The run steps are as follows (the path is the Hadoop path): (1) Format the NameNode and start the Hadoop processes:
bin/hadoop namenode -format
bin/start-all.sh
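After bin/start-all.sh finishes, a quick way to verify that the daemons are up is jps (the five process names are the ones listed above; jps also lists itself):

jps    # expect NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker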
If all 5 related processes appear, Hadoop has started successfully. (2) Create an input folder on HDFS and upload the data:
bin/hadoop fs -mkdir input
bin/hadoop fs -put data input
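To confirm that the upload worked, the directory can be listed (an optional check):

bin/hadoop fs -ls input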
input is the name of the folder being created, and data is the local data file (the default path is under the Hadoop directory); the available commands can be viewed with bin/hadoop fs -help. (3) Run WordCount and view the results:
bin/hadoop jar hadoop-examples-*.jar wordcount input output
bin/hadoop fs -cat output/* >> result.txt
If the jar package was exported by yourself, you can run bin/hadoop jar own.jar input output directly. The results are placed in the result.txt file (in the local Hadoop directory); if you leave out >> result.txt, the results are printed in the console instead. The running job can be viewed in a browser at http://machine-name:50030. (4) Stop the Hadoop processes:
bin/stop-all.sh
(3) Fully distributed mode. In Hadoop, different subsystems partition nodes differently: from the point of view of HDFS, nodes are divided into a NameNode and DataNodes, of which there can be several DataNodes; from the point of view of MapReduce, nodes are divided into a JobTracker and TaskTrackers, of which there can be several TaskTrackers. Fully distributed deployment is almost the same as pseudo-distributed deployment. You need two or more machines with the same user name (this is required), IP addresses in the same network segment, and the ability to ping each other.
192.168.6.30 master
192.168.6.31 node1
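A quick way to confirm the machines can reach each other, using the addresses above (run from the master, for example):

ping -c 3 192.168.6.31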
(1) The master generates an SSH key and distributes it:
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub username@192.168.6.31
If the distribution fails, you can copy the public key manually instead. From the master:

scp ~/.ssh/id_rsa.pub username@192.168.6.31:~/mas_key

And then on the remote machine:

mkdir ~/.ssh
chmod 700 ~/.ssh
mv ~/mas_key ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Using ssh-copy-id not only appends the public key to authorized_keys, but also sets the correct permissions (700 for the .ssh folder, 600 for authorized_keys).
Reference article: http://www.thegeekstuff.com/2008/11/3-steps-to-perform-ssh-login-without-password-using-ssh-keygen-ssh-copy-id/
For the principle behind password-free SSH login, see: http://www.ruanyifeng.com/blog/2011/12/ssh_remote_login.html
After this, you should not need to enter a password when running ssh 192.168.6.31 on the master host.
If you run into the error Agent admitted failure to sign using the key, the solution is to use the ssh-add command to add the private key:
ssh-add ~/.ssh/id_rsa
(2) Configure the hosts file:
sudo gedit /etc/hosts

127.0.0.1 localhost
#127.0.0.1 machine-name
192.168.6.38 master
192.168.6.31 node1
* The second line (127.0.0.1 machine name) must be commented out. The hosts file is then distributed to the slave via scp and moved into place on the slave machine:
scp /etc/hosts username@target-machine-ip:~/hosts
sudo mv ~/hosts /etc/
(3) Modify the Hadoop configuration files. A total of 5 files need to be configured. Three of them are the same as in pseudo-distributed mode, except that the localhost fields must be changed to the master machine's name (here master; an IP address also works). The other two files are masters and slaves:
masters (file contents):
master

slaves (file contents):
master
node1
There are two data nodes in this Hadoop cluster: master and node1 (the slave). * The 5 configuration files also need to be distributed to the slave (that is, the Hadoop configuration on the master and slave machines in the cluster is identical). With that, the Hadoop configuration is complete, and the ssh node1 command lets you log in directly. The next steps for running WordCount are the same as in pseudo-distributed mode. Note: empty the logs directory on the cluster machines before each run.
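For reference, the localhost-to-master change described in step (3) amounts to values like the following in core-site.xml and mapred-site.xml (a sketch; substitute your own master name or IP):

<property> <name>fs.default.name</name> <value>hdfs://master:9000</value> </property>
<property> <name>mapred.job.tracker</name> <value>master:9001</value> </property>

One way to push the configuration files to the slave, assuming Hadoop is installed at ~/hadoop on both machines (adjust the path as needed):

scp conf/* username@node1:~/hadoop/conf/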