2017/6/21 Update: after installation, create a logs folder under the /usr/local/hadoop/hadoop-2.7.3 path and change its permissions to 777.
9-26 Important update: all the commands in this article were copied from a real machine, but unknown errors may have crept in during copying and pasting, so please type the commands in by hand. Thank you.
I recently listened to a big-data expert share his experience, and at the end of the talk he gave us a demo he had written on big data and geographic applications. The demo requires a Hadoop platform built on Linux, so this time I will share my experience building a Hadoop platform on a Linux virtual machine, the problems I ran into, and how I solved them.
First of all, the environment we are building this time is Hadoop. Hadoop implements a distributed file system that can be deployed on inexpensive hardware and provides high-throughput access to application data, making it ideal for applications with large datasets. Most importantly, Hadoop is open source.
This time we will install our Hadoop lab environment on a single computer (a virtual machine). If you have not yet installed a virtual machine, see the VMware Workstation Pro 12 installation tutorial. If you have not installed a Linux operating system in the virtual machine, see the tutorial on installing Ubuntu or CentOS under VMware.
We will install two modes: stand-alone mode and pseudo-distributed mode. Stand-alone mode is the most streamlined mode, selected by default after Hadoop is unpacked; in it the configuration in core-site.xml, hdfs-site.xml, and hadoop-env.sh is empty by default. Pseudo-distributed mode runs Hadoop as a single-node cluster; compared with stand-alone mode it is better suited to code debugging, enables the HDFS features, and lets you interact with the various daemons.
This article installs Ubuntu 16.04 LTS + Java 1.8.0_101 + Hadoop 2.7.3.
First, install the Java environment on Linux
Before installing Hadoop on Linux, the first thing to know is that Hadoop is a program written in Java. So before we install Hadoop, we need to make sure a Java environment exists on the Linux machine. Here is how to install Java 1.8.0_101 on Linux.
Before installing Java, check whether Java is already installed on the system with the java -version command. If another version of Java is installed, uninstall it first, then install Java 1.8.0.
First download the JDK installation package from the Oracle web site. The JDK (Java Development Kit) contains the runtime environment that Java needs. You can download the Linux version of the Java 1.8.0_101 installation package from the page below; when downloading, be sure to pick the package that matches your operating system (that is, the Linux version installed in your virtual machine — you can check your current version with the uname -a command).
www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
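To decide which package you need, a quick check of the machine architecture is enough (a minimal sketch):

```shell
# x86_64 means you need the 64-bit (x64) .tar.gz package;
# i686/i386 means you need the 32-bit one
uname -m
```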
After the download completes we will have a compressed package with the suffix .tar.gz. Unpack it into the /usr/java/ directory (create that directory before extracting):
tar -zxvf jdk-8u101-linux-x64.tar.gz -C /usr/java/
After the decompression, we can configure our environment variables.
vim ~/.bashrc
# write the environment variables
export JAVA_HOME=/usr/java/jdk1.8.0_101
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
After writing the environment variables, run
source ~/.bashrc
to make them take effect.
Once the configuration is done, use
java -version
to check whether the Java installation is complete.
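As an extra sanity check, a minimal sketch (assuming the JDK was unpacked to /usr/java/jdk1.8.0_101 as above; the helper name check_java_home is my own, not a standard tool) to verify that JAVA_HOME actually points at a usable JDK:

```shell
JAVA_HOME=/usr/java/jdk1.8.0_101   # path assumed from this article

check_java_home() {
  # valid only when the directory exists and contains an executable bin/java
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if check_java_home "$JAVA_HOME"; then
  echo "JAVA_HOME looks valid"
else
  echo "JAVA_HOME is missing or has no bin/java" >&2
fi
```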
Second, install an SSH server and set up password-free login
Because Hadoop communicates over SSH, we need to install SSH on the operating system. Before installing, check whether SSH is already installed and whether the SSH service is running:
# check the ssh packages
dpkg -l | grep ssh
# check whether the ssh service is running
ps -e | grep ssh
If the system has no SSH service, you can use
sudo apt-get install openssh-server
to install it. After installation, use
sudo /etc/init.d/ssh start
to start the service.
Then use
ps -e | grep ssh
to check that the service has started.
SSH is a secure communication protocol, so it naturally asks for a password when connecting; but since we are using pseudo-distributed mode, we will set up password-free login.
# generate a key pair
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# append the public key to authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# test password-free login to localhost
ssh localhost
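The append step above can be made idempotent so that re-running it never adds duplicate entries. A minimal sketch (the helper name append_key_once is my own, not part of OpenSSH or Hadoop):

```shell
# Append a public key to authorized_keys only if it is not already present.
append_key_once() {
  pub="$1"; auth="$2"
  touch "$auth"
  # grep -qxF: quiet, whole-line, fixed-string match
  if ! grep -qxF "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"   # sshd ignores authorized_keys with loose permissions
}
```

Usage: append_key_once ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys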
At the end, shut down the machine's firewall:
sudo ufw disable
Third, install Hadoop
With the preliminary work done, we can start installing Hadoop itself.
Download Hadoop; the download page is:
http://hadoop.apache.org/releases.html
Download the binary package.
After downloading, extract the file:
tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/hadoop/
Create the /usr/local/hadoop/ directory before extracting.
Next configure three files: core-site.xml, hdfs-site.xml, and hadoop-env.sh. All three are under /usr/local/hadoop/hadoop-2.7.3/etc/hadoop/. In the first two files, the settings below are written between <configuration> and </configuration>.
First file: core-site.xml

<!-- specify the HDFS (NameNode) communication address -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/windghoul/tmp</value>
</property>
Note: replace /home/windghoul/tmp with a tmp folder under your own user's home directory, and create it in advance.
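Creating that directory can be sketched as follows (shown with $HOME/tmp so it works for any user; the article's own example path is /home/windghoul/tmp):

```shell
TMP_DIR="$HOME/tmp"   # substitute your own path here
mkdir -p "$TMP_DIR"   # -p: no error if it already exists
chmod a+w "$TMP_DIR"  # the user running Hadoop must be able to write here
```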
Second file: hdfs-site.xml

<!-- set the number of HDFS replicas -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
In the third file, hadoop-env.sh, find the following line and then write the content after it:
# The Java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_101
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.3
export PATH=$PATH:/usr/local/hadoop/hadoop-2.7.3/bin
Next, write Hadoop's environment variables into the system environment variables:
vim /etc/environment
# append inside the quotes at the end of the PATH line
:/usr/local/hadoop/hadoop-2.7.3/bin
:/usr/local/hadoop/hadoop-2.7.3/sbin
Reboot the system, then verify that the Hadoop stand-alone installation is complete:
hadoop version
If the screen shows Hadoop's version number, stand-alone mode is configured.
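As a sketch, you can also check whether the two Hadoop directories actually made it onto PATH, without relying on the hadoop command itself (the helper name on_path is my own, not a standard utility):

```shell
# Return success when the given directory appears as a PATH component.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

for dir in /usr/local/hadoop/hadoop-2.7.3/bin /usr/local/hadoop/hadoop-2.7.3/sbin; do
  if on_path "$dir"; then
    echo "$dir is on PATH"
  else
    echo "$dir is NOT on PATH" >&2
  fi
done
```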
The next step is to start HDFS using pseudo-distributed mode.
First, format the NameNode:
hadoop namenode -format
Output like the following indicates formatting succeeded:
...
...
16/09/24 23:39:53 INFO common.Storage: Storage directory /home/windghoul/tmp/dfs/name has been successfully formatted.
...
...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Start HDFS:
sbin/start-all.sh
Show the running processes:
jps
Seeing the Hadoop daemons (such as NameNode, DataNode, and SecondaryNameNode) in the jps output means HDFS started successfully.
Stop HDFS:
sbin/stop-all.sh
The commands above must be run from the Hadoop installation directory; if you run them from /home/username, enter the full path instead.
With that, our Hadoop environment is basically built. Later I will write up a few simple Hadoop applications to share.
Problem solving
Q: After configuring the files I entered hadoop version on the command line, but the Hadoop version number was not shown.
A: Check the environment-variable configuration, especially whether Hadoop's environment variables were written; check /etc/environment and reboot your computer.
Q: Formatting does not succeed when I format the NameNode.
A: As in question 1, check that Hadoop stand-alone mode is properly installed and configured before formatting, and check that the core-site.xml file is configured correctly.
Q: When I finally start HDFS, it keeps asking me for the localhost password.
A: If it asks for a password, the permissions on the tmp folder may be wrong; running chmod -R a+w /home/windghoul/tmp may resolve it.
Finally, thanks to several earlier installation tutorials on the web:
http://www.aboutyun.com/thread-7684-1-1.html
http://www.aboutyun.com/thread-6487-1-1.html
http://blog.csdn.net/uq_jin/article/details/51451995
http://blog.csdn.net/hitwengqi/article/details/8008203