Hadoop, ZooKeeper, and HBase cluster installation and configuration, with frequently asked questions (I): preparatory work

Source: Internet
Author: User
Introduction

For a recent research project I built a Hadoop cluster from scratch, including standalone ZooKeeper and HBase installations.

I had little background in Linux, Hadoop, and related topics, so this series is aimed at beginners of all kinds who want to try out a Hadoop cluster.

Along the way I describe some of the problems I ran into while building the cluster, together with their solutions.

The focus is on building a real cluster, that is, a fully distributed one.

This article mainly covers the preparatory work and common problems, and summarizes what I worked out over the past few days.

I. Setting up the environment

I prefer the latest software; it can be unstable, but I like trying new things. The configuration is roughly as follows: 3 nodes, one master and two slaves.

System environment: Ubuntu 16.04 64-bit, Chinese version. Download link: http://www.ubuntu.org.cn/download/desktop
Hadoop version: 2.7.3. Download link: http://hadoop.apache.org/releases.html
ZooKeeper version: 3.4.9. Download link: http://www-us.apache.org/dist/zookeeper/
HBase version: 1.3.0. Download link: http://www-us.apache.org/dist/hbase/

Note: Not all versions are compatible with each other, so be sure to check the version support information in the official Hadoop and HBase documentation when choosing versions.

II. Set up VNC remote desktop access for Linux

In most cases the distributed nodes we use are virtual machines; physical hosts are rare, and the servers are often not under our own control, so remote access is useful. It also lets you manage every node from a Windows machine. The only drawback is that after a reboot or a user switch VNC can no longer connect directly; you have to log in again on the host or server node. I have not found a way around this.

How to set up remote access:

1. Press Ctrl+Alt+T to open a terminal.

If the node can connect to the Internet, enter:

sudo apt-get install xrdp

Code explanation: sudo lets a non-root user run a command with root privileges; apt-get install softwarename installs software from the command line on Linux.

xrdp can be understood as software that lets a Linux machine accept remote desktop connections.

You can install the JDK the same way:

sudo apt-get install openjdk-7-jre openjdk-7-jdk


2. After installation, click the search icon in the upper-left corner of the Linux desktop, search for "Desktop Sharing", and configure it as follows:

The main thing is to check the first option.

3. Then go to the VNC website and download the client: https://www.realvnc.com/download/vnc/

Multiple operating systems are supported.

After starting the client, enter the IP of the remote machine and click OK. From then on, double-clicking the icon opens the connection.

The machines must be on the same local area network, that is, able to reach each other.

Set up the other nodes the same way; you will appreciate the convenience later.


This step caused no problems, except that a user must be logged in on the node before VNC remote access works. After a reboot or a user switch, VNC cannot connect again; I have not found a solution.

III. Network setup

After installing Linux, change the IP address of each node. The nodes do not have to be on the same network segment as long as they can reach each other, but network speed affects usability, so for experiments it is best to keep them on local machines, on a LAN, in the same segment.

IP mapping for the three nodes:
172.18.5.4 Master
172.18.5.5 Slave1
172.18.5.6 Slave2
These names are chosen to clearly distinguish the master node from the slave nodes.
With gedit on Linux you can edit documents much as you would on Windows, but if you want to become more proficient you can install vim, a legendary tool that goes against all of your usual operating habits. Here is a joke. Q: How do you efficiently generate a random number? A: Let someone who does not understand programming use vim.

sudo apt-get install vim


After installing this editor you can edit any document with it, though gedit also works if you prefer not to install it.
During Linux installation the machine may not have been named Master or Slave. That is fine, since the hostname only helps you remember and operate on the nodes. To change the hostname, run:
sudo vim /etc/hostname
Code explanation: this opens the /etc/hostname file in vim with root privileges. Do not start typing right away; press i to enter insert mode, then make the change. After confirming the change is correct, press Esc to leave insert mode and type :wq to save and quit. If you made no changes, type :q to quit directly.
The IP mappings of the three nodes above go into the /etc/hosts file.
sudo vim /etc/hosts

In insert mode, put each IP and its hostname on one line, separated by a tab; from then on the hostname can be used in place of the IP. Leave insert mode, then save.
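The manual edit above can also be scripted. The following is a minimal sketch, not the author's method: the function name add_host_entry is hypothetical, and it assumes a POSIX shell with awk available. It appends an "IP, tab, hostname" line only when the hostname is not already mapped, so running it twice does not create duplicates.

```shell
# Append "IP<TAB>hostname" to a hosts file unless the hostname is already mapped.
# Usage: add_host_entry IP NAME [FILE]   (FILE defaults to /etc/hosts)
# add_host_entry is a hypothetical helper name, not a standard tool.
add_host_entry() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    # Look for NAME as a whole field (after the IP) on any non-comment line.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        echo "$name already present in $file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# Example against a scratch file (writing to /etc/hosts itself needs root):
# add_host_entry 172.18.5.4 Master ./hosts.test
# add_host_entry 172.18.5.5 Slave1 ./hosts.test
# add_host_entry 172.18.5.6 Slave2 ./hosts.test
```

Trying it on a scratch file first is a cheap way to confirm the result before touching the real /etc/hosts on each node.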
Changing the hostname requires a reboot. After the reboot VNC cannot connect directly; you must log in on the host before continuing.
Configure each node separately in this way.
After configuration, check from a terminal that it worked:
ping Master -c X

This uses the ping command to see whether the node can be reached. The trailing X stands for the number of pings; without this parameter ping keeps running until interrupted. You can stop it with Ctrl+C (Ctrl+Z only suspends the process), but it is better to set a count.
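To run the same check against all three nodes at once, a small loop helps. This is a sketch under assumptions: check_nodes and the PING_CMD override are hypothetical names introduced here so the loop can be tried without a network; the hostnames are the ones from the IP mapping above.

```shell
# Ping each node a fixed number of times and report which ones answer.
# PING_CMD is a hypothetical override (defaults to the real ping command).
PING_CMD=${PING_CMD:-ping}

# Usage: check_nodes COUNT HOST...
# Returns the number of hosts that did not answer (0 means all reachable).
check_nodes() {
    count=$1
    shift
    failed=0
    for node in "$@"; do
        if $PING_CMD -c "$count" "$node" >/dev/null 2>&1; then
            echo "$node: reachable"
        else
            echo "$node: NOT reachable"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example: check_nodes 3 Master Slave1 Slave2
```

Passing the count explicitly keeps every ping bounded, which is the same reason the text recommends the -c parameter.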
If ping prints reply lines that report the number of bytes received, the configuration succeeded.

IV. Configuring passwordless SSH login to nodes

SSH is a great tool on Linux. By default every ssh command (logging in to a remote host) asks for the remote host's password. Once or twice is no problem, but typing it many times becomes tedious, so this step sets up passwordless SSH.
Because many configuration steps are identical across nodes while building the cluster, try to do the work once on master and then copy the result to the other nodes for node-specific changes. Passwordless SSH is therefore mainly needed from master to the slaves.
1. First, generate the public key on the local machine
cd ~/.ssh               # If this directory does not exist, run ssh localhost first
rm ./id_rsa*            # Remove previously generated keys (if any)
ssh-keygen -t rsa       # Press Enter at every prompt
Code explanation:
cd full-directory-path: change to that directory
rm full-file-path: delete the file
rm -r full-path: delete the file or folder
ssh-keygen: generate a private key and a public key; the parameter -t rsa selects the RSA algorithm. After it runs, id_rsa (private key) and id_rsa.pub (public key) can be found in the .ssh directory under the current user's home directory.
This operation generates the public key of the master node.
cat ./id_rsa.pub >> ./authorized_keys

This lets the master node SSH into itself without a password.
You can run
ssh Master
to verify that it works. If it does not, repeat the steps and track down the cause.
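The key generation and authorization steps above can be made repeatable. The sketch below is one possible variant, not the author's exact procedure: ensure_keypair and authorize_key are hypothetical helper names, the existing key is kept rather than deleted and regenerated, and the chmod calls are an added precaution that the original steps do not include. The ssh-keygen options used (-q quiet, -t key type, -N passphrase, -f output file) are standard OpenSSH options.

```shell
# Create an RSA key pair in DIR without prompts, unless one already exists.
# Unlike the manual steps, this keeps an existing key instead of removing it.
ensure_keypair() {
    dir=${1:-$HOME/.ssh}
    mkdir -p "$dir" && chmod 700 "$dir"
    # -N "" sets an empty passphrase; -f names the private key file.
    [ -f "$dir/id_rsa" ] || ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa"
}

# Append the public key file PUB to AUTH only if that exact line is missing.
authorize_key() {
    pub=$1
    auth=$2
    touch "$auth" && chmod 600 "$auth"
    # -x matches whole lines, -F treats the key as a fixed string.
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}

# Example:
# ensure_keypair ~/.ssh && authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Because authorize_key checks for the line first, rerunning it does not fill authorized_keys with duplicate entries the way a repeated cat with >> would.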

2. Transfer the master node's public key to each slave node with scp
scp ~/.ssh/id_rsa.pub hadoop@slave1:/home/hadoop/

Code explanation: scp is the command for transferring files between hosts. To transfer a folder, add -r after scp. The command above copies the public key generated on master (id_rsa.pub) to the /home/hadoop/ folder of the hadoop user on the slave1 node.
This step asks for the password of hadoop@slave1; after you enter it, the result of the transfer is displayed.
Then, on the slave node, under the hadoop account, append this public key to the authorized keys:
mkdir ~/.ssh       # Create the folder if it does not exist; skip this if it already exists
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub    # The copy can be deleted once it has been appended
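With more than one slave, the copy-and-append steps above have to be repeated per node. As a hedged alternative, OpenSSH ships ssh-copy-id, which appends a public key to the remote authorized_keys file in one command. The wrapper below is only a sketch: distribute_key and the DRY_RUN switch are hypothetical names, and hadoop@slave2 is assumed by analogy with the hadoop@slave1 account used above.

```shell
# Print (DRY_RUN=1) or run the command that installs PUB on each user@host.
# ssh-copy-id asks for the remote password once per host.
# distribute_key and DRY_RUN are hypothetical names used for illustration.
distribute_key() {
    pub=$1
    shift
    for target in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh-copy-id -i $pub $target"
        else
            ssh-copy-id -i "$pub" "$target"
        fi
    done
}

# Example (dry run first, then for real):
# DRY_RUN=1
# distribute_key ~/.ssh/id_rsa.pub hadoop@slave1 hadoop@slave2
```

The dry run only prints the commands, which makes it easy to check the target list before any password prompt appears.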

As on master, run ssh Slave1 on the master node to verify that it works.
Once this succeeds, SSH between the two nodes no longer asks for a password, which is very convenient.
3. Problems encountered in this step
For reasons I do not understand, some time after a successful verification SSH asked for a password again; I did not find the cause. The error reported is:
sign_and_send_pubkey: signing failed: agent refused operation

Repeating steps 1 and 2 every time is a hassle. Searching the web turned up this solution:
eval "$(ssh-agent -s)"
ssh-add

After this, passwordless login works again. I really do not know why; I hope an expert who reads this can explain.
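The cause was not identified here. One cause commonly reported for this message is overly permissive file modes on the private key or the .ssh directory, which can make the agent refuse to use the key. That is an assumption, not something confirmed for this cluster. Under that assumption, the sketch below tightens the permissions; fix_ssh_perms is a hypothetical helper name.

```shell
# Tighten permissions on an SSH directory (assumed cause, not confirmed):
# 700 for the directory, 600 for the private key and authorized_keys,
# 644 for the public key. Files that do not exist are skipped.
fix_ssh_perms() {
    dir=${1:-$HOME/.ssh}
    chmod 700 "$dir"
    for f in id_rsa authorized_keys; do
        if [ -f "$dir/$f" ]; then
            chmod 600 "$dir/$f"
        fi
    done
    if [ -f "$dir/id_rsa.pub" ]; then
        chmod 644 "$dir/id_rsa.pub"
    fi
    return 0
}

# Example: fix_ssh_perms ~/.ssh
```

If the error persists after this, the ssh-agent and ssh-add commands above remain the workaround that is known to have worked in this setup.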
