I recently started studying Hadoop and want to write down what I have learned here as a learning record; if it also helps someone who needs it, all the better. This article is largely based on information found online, organized according to my own understanding and hands-on experience. Most of the tutorials on the internet do not explain the reasons behind each step, so I have tried to pull together the scattered pieces of knowledge as I understand them. Because of limited time and my own ability, this installation guide still has many shortcomings, and I will keep learning and improving it. My journey on Blog Park starts with this Hadoop installation. Since this is my first post and time is short, no screenshots are included yet, but the article covers the whole installation process.
My operating system is Ubuntu Server 16.04.1, running as a 64-bit virtual machine.
1 Installing the JDK (jdk-8u101-linux-x64.tar.gz)

(1) Download the archive from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

(2) Place the archive in the /usr/lib/jvm directory.

(3) Extract it:

sudo tar -zxvf jdk-8u101-linux-x64.tar.gz

(4) Configure the environment variables. Open the shell environment variable profile of the current user:

vim ~/.bashrc

Add the following lines at the end of the file:

# begin copy
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_101
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
# end copy

Notes:
(1) The JAVA_HOME value is the path of the JDK folder you extracted and can be set according to your own situation. For example, if I extract jdk-8u101-linux-x64.tar.gz under /usr/local instead, the line becomes export JAVA_HOME=/usr/local/jdk1.8.0_101.
(2) The configuration above applies only to the current user. If you want all users to be able to use the JDK, configure the /etc/bashrc file instead; for the relationship between /etc/profile, /etc/bashrc, ~/.bashrc and ~/.bash_profile, see http://blog.csdn.net/chenchong08/article/details/7833242

(5) Make the configured environment variables take effect (changes to ~/.bashrc take effect immediately after this, without restarting the system):

source ~/.bashrc

(6) Check whether the installation succeeded; if it did, the JDK version information will be printed:

java -version
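As a quick sanity check, this is roughly what the two commands should print once the variables are in place (a sketch; the exact build strings depend on the JDK release you downloaded):

$ echo $JAVA_HOME
/usr/lib/jvm/jdk1.8.0_101
$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)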
2 Installing SSH for password-free login

(1) Because Hadoop communicates over SSH, we need SSH installed on the operating system. There is no need to check first whether it is already there: just run the install command, and apt will tell you if it is already installed.

sudo apt-get install openssh-server

(2) Install the SSH client tools:

sudo apt-get install ssh

(3) Generate the key pair. After SSH is installed successfully there will be a hidden .ssh folder in the user's home directory; if it is not there, run ssh localhost once first.

# enter the .ssh directory
cd ~/.ssh
# generate the key pair with an empty passphrase; just press Enter at the prompts
# (ssh-keygen prints a randomart image for the key, which you can ignore)
ssh-keygen -t rsa -P ''
# two files are generated under /home/yw0/.ssh: id_rsa (the private key) and id_rsa.pub (the public key)
# now append the public key to authorized_keys (authorized_keys holds all public keys that are
# allowed to log in to this account over SSH)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# we can now log in over SSH and confirm that no password is asked for;
# my host name is ubuntu0, and ssh localhost works as well
ssh ubuntu0
# disable the firewall; this is permanent, i.e. the firewall stays off after a reboot
# until you run sudo ufw enable again
sudo ufw disable
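One extra note from troubleshooting (not a required step above): sshd will ignore the key if the .ssh directory or the authorized_keys file is writable by other users, so if ssh localhost still asks for a password, tighten the permissions:

# ~/.ssh must be accessible only by the owner
chmod 700 ~/.ssh
# authorized_keys must not be writable by group or others
chmod 600 ~/.ssh/authorized_keys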
3 Installing Hadoop

(1) Unpack the Hadoop installation package. Hadoop can be installed anywhere you have read and write permission; here we install it under the current user's home directory (/home/yw0).

# place the archive in the home directory, then extract it
tar -zxvf hadoop-2.7.3.tar.gz
# after extraction there is a new hadoop-2.7.3 folder
# rename hadoop-2.7.3 to hadoop; renaming is not required, it just keeps the paths short,
# and you can name the folder however you like
mv hadoop-2.7.3 hadoop

(2) Standalone mode environment configuration

# open the .bashrc file in the home directory
vim ~/.bashrc
# add the following lines after the JAVA_HOME configuration from earlier
export HADOOP_INSTALL=/home/yw0/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
# make the configuration take effect
source ~/.bashrc
# check whether the installation succeeded by executing the command: hdfs
# if the help message appears, standalone mode is installed successfully

(3) Pseudo-distributed mode environment configuration

Before configuring, open the /etc/hosts file and map the host name to the static IP we set; here my IP is 192.168.56.109.

Configure hadoop-env.sh (under etc/hadoop): if Hadoop complains that it cannot find the JDK when it runs, you can put the JDK path directly into hadoop-env.sh, on the line below "# The java implementation to use.":

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_101

1. Configure the core-site.xml file. First create a new tmp folder under /home/yw0/hadoop (matching hadoop.tmp.dir below), then copy the following properties inside the <configuration> tag:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu0:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/yw0/hadoop/tmp</value>
</property>

# fs.defaultFS specifies the address of the NameNode
# hadoop.tmp.dir specifies the directory where files generated at Hadoop runtime are stored

2. Configure the hdfs-site.xml file. First create a new hdfs folder under /home/yw0/hadoop, then create three folders named namenode, datanode and ckp inside it. Copy the following properties inside the <configuration> tag:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/yw0/hadoop/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/yw0/hadoop/hdfs/datanode</value>
</property>
<property>
    <name>fs.checkpoint.dir</name>
    <value>/home/yw0/hadoop/hdfs/ckp</value>
</property>
<property>
    <name>fs.checkpoint.edits.dir</name>
    <value>/home/yw0/hadoop/hdfs/ckp</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

# dfs.replication specifies how many copies of each block HDFS keeps (set it yourself); the default is 3, and here it is 1 because there is only one node
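All of the local directories referenced in core-site.xml and hdfs-site.xml can be created in one go before formatting; a minimal sketch assuming the same /home/yw0/hadoop layout used in this article:

# matches hadoop.tmp.dir
mkdir -p /home/yw0/hadoop/tmp
# match dfs.namenode.name.dir, dfs.datanode.data.dir and the two checkpoint directories
mkdir -p /home/yw0/hadoop/hdfs/namenode
mkdir -p /home/yw0/hadoop/hdfs/datanode
mkdir -p /home/yw0/hadoop/hdfs/ckp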
3. Configure the mapred-site.xml file. By default there is no etc/hadoop/mapred-site.xml, only an etc/hadoop/mapred-site.xml.template file; copy it, rename the copy to mapred-site.xml, and add:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

4. Configure yarn-site.xml:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ubuntu0</value>
</property>
<!-- how reducers get their data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

# yarn.nodemanager.aux-services tells the Hadoop NodeManager that data is fetched via the shuffle mechanism
# yarn.resourcemanager.hostname specifies the address of YARN's master (the ResourceManager);
# it can be an IP address, a domain name or a host name

5. Format HDFS:

hdfs namenode -format
# if the output says the NameNode has been successfully formatted, the installation succeeded

6. Start HDFS and view the processes:

sbin/start-all.sh
jps

7. Stop HDFS:

sbin/stop-all.sh
# or
sbin/stop-dfs.sh
sbin/stop-yarn.sh
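Two quick checks to convince yourself that the pseudo-distributed setup really works after start-all.sh (a sketch: the process IDs below are made up, and /user/yw0 simply mirrors the yw0 user used throughout this article):

# all five daemons should show up in jps
$ jps
4327 NameNode
4458 DataNode
4627 SecondaryNameNode
4790 ResourceManager
4917 NodeManager
5210 Jps

# a tiny HDFS smoke test: create a home directory, upload a file, list it
hdfs dfs -mkdir -p /user/yw0
hdfs dfs -put ~/hadoop/etc/hadoop/core-site.xml /user/yw0
hdfs dfs -ls /user/yw0

You can also open the NameNode web UI at http://ubuntu0:50070 and the ResourceManager UI at http://ubuntu0:8088 in a browser.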
Reference articles:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
http://www.tuicool.com/articles/zyqbzv -- this blog has detailed configuration instructions for pseudo-distributed mode
http://blog.csdn.net/tomato__/article/details/48547953
http://www.cnblogs.com/maybe2030/p/4591195.html
http://blog.csdn.net/gnahznib/article/details/52488675
http://blog.csdn.net/joe_007/article/details/8298814 -- ssh localhost password-free login