Description
Task: Build a Hadoop pseudo-distributed environment.
Objective: To quickly build a learning environment, skip over the environment details, get into a working state fast, and use some Hadoop components for real tasks.
Hadoop 2.7 was not chosen because it has more bugs and is less stable.
Pseudo-distributed mode was chosen because it is simple and fast to set up.
Environment:
Win 7, 8 GB RAM, 4 cores
VM 12, one virtual machine with 3 GB memory
Ubuntu x86-64 (kernel 4.4.0)
Hadoop 2.6.4
JDK 1.7.0_80
1. Virtual Machine Linux Preparation
Install the virtual machine (you can clone an existing one); choose NAT for the network.
Create the user hadoop; configure the sudo command and related file settings (to be refined: search Baidu).
All subsequent operations are performed as the hadoop user; use sudo where permissions are lacking.
1.1 Network/IP configuration (lazy here: using the default allocation; with multiple nodes this must be set explicitly, to be refined)
hadoop@ssmaster:~$ ifconfig
ens33   Link encap:Ethernet  HWaddr xx:0c:...
        inet addr:192.168.249.144  Bcast:192.168.249.255  Mask:255.255.255.0
        inet6 addr: fe80::...:dd35:2b5d:4dba/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:145870 errors:0 dropped:0 overruns:0 frame:0
        TX packets:12833 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:209812987 (209.8 MB)  TX bytes:1827590 (1.8 MB)
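If the DHCP-assigned address ever needs to be pinned down (see the open issue in the summary), a static IP can be set instead. A minimal sketch for the classic ifupdown networking used by this Ubuntu release, in /etc/network/interfaces; the interface name ens33 matches the ifconfig output above, but the gateway and DNS addresses (.2 is a common VMware NAT default) are assumptions to verify against your NAT settings:

```text
# /etc/network/interfaces -- hypothetical static-IP fragment
auto ens33
iface ens33 inet static
    address 192.168.249.144
    netmask 255.255.255.0
    gateway 192.168.249.2
    dns-nameservers 192.168.249.2
```

After editing, restart networking (or reboot the VM) and re-check with ifconfig.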
1.2 Host name Settings
Modify the following three places:
A
sudo vi /etc/hostname
hadoop@ssmaster:~$ more /etc/hostname
ssmaster
B
hadoop@ubuntu:~$ hostname
ubuntu
hadoop@ubuntu:~$ sudo hostname ssmaster
hadoop@ubuntu:~$ hostname
ssmaster
C
sudo vi /etc/hosts
After modification:
127.0.0.1 localhost
#127.0.1.1 Ubuntu
192.168.249.144 Ssmaster
2. Installing the JDK
Configuring Environment variables
vi /etc/profile, add the following at the end and save:
export JAVA_HOME=/home/szb/hadoop/jdk1.7.0_80
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=./:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
Execute the command to make it take effect: source /etc/profile
The following output indicates a successful installation:
hadoop@ssmaster:~$ java -version
java version "1.7.0_80"
... (build 24.80-b11, mixed mode)
3. SSH Settings
First test: ssh ssmaster (the current hostname, set earlier).
If a password is required, passwordless login has not been set up yet.
Execute the following commands, pressing Enter at each prompt:
hadoop@ssmaster:~$ cd ~
hadoop@ssmaster:~$ ssh-keygen -t rsa
hadoop@ssmaster:~/.ssh$ cp id_rsa.pub authorized_keys
hadoop@ssmaster:~/.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
hadoop@ssmaster:~/.ssh$ more authorized_keys
ssh-rsa AAAAB3NzaC1yc2E... (public key contents) ...
Test; login should now succeed without a password:
ssh ssmaster
exit
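The same check can be done non-interactively. A sketch (check_ssh is a helper name made up here; BatchMode makes ssh fail instead of prompting for a password):

```shell
# Report whether passwordless SSH to the given host works.
# $1 = host to test (e.g. ssmaster).
check_ssh() {
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" exit 2>/dev/null; then
    echo "passwordless SSH to $1: OK"
  else
    echo "passwordless SSH to $1: NOT configured"
  fi
}
# Usage: check_ssh ssmaster
```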
3. Preparing the Hadoop installation package
Download to any directory
Extract
tar -zxvf hadoop-2.6.4.tar.gz
Move the unpacked package:
sudo mv hadoop-2.6.4 /opt/
4. Configure Hadoop
4.1 Adding Hadoop paths to environment variables
sudo vi /etc/profile, modified as follows:
export HADOOP_HOME=/opt/hadoop-2.6.4
export JAVA_HOME=/home/szb/hadoop/jdk1.7.0_80
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=./:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
source /etc/profile to take effect
4.2 Creating the HDFS data storage directories
Create dfs/name and dfs/data in the Hadoop installation directory:
hadoop@ssmaster:/opt/hadoop-2.6.4$ pwd
/opt/hadoop-2.6.4
hadoop@ssmaster:/opt/hadoop-2.6.4$ mkdir dfs
hadoop@ssmaster:/opt/hadoop-2.6.4$ ls
bin  dfs  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  tmp
hadoop@ssmaster:/opt/hadoop-2.6.4$ cd dfs
hadoop@ssmaster:/opt/hadoop-2.6.4/dfs$ mkdir name data
hadoop@ssmaster:/opt/hadoop-2.6.4/dfs$ ls
data  name
4.3 Adding the JDK path to the Hadoop *-env.sh script files
Location: /opt/hadoop-2.6.4/etc/hadoop
Add the following line in each of these files:
export JAVA_HOME=/home/szb/hadoop/jdk1.7.0_80
hadoop-env.sh
yarn-env.sh
mapred-env.sh
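The three edits above can also be scripted in one pass. A sketch (the helper name add_java_home is made up here, and the JDK path is the one used in this article):

```shell
# Append the JAVA_HOME export to each of the three env scripts.
# $1 = the Hadoop conf directory (e.g. /opt/hadoop-2.6.4/etc/hadoop).
add_java_home() {
  for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
    echo 'export JAVA_HOME=/home/szb/hadoop/jdk1.7.0_80' >> "$1/$f"
  done
}
# Usage: add_java_home /opt/hadoop-2.6.4/etc/hadoop
```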
4.4 Modify Slaves File
Location: /opt/hadoop-2.6.4/etc/hadoop
Change the contents of the slaves file to the hostname; after modification:
hadoop@ssmaster:/opt/hadoop-2.6.4/etc/hadoop$ more slaves
ssmaster
4.5 Configuring the XML files
4.5.1 core-site.xml
Content after modification:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ssmaster:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-2.6.4/tmp</value>
</property>
</configuration>
Note:
fs.defaultFS: the NameNode URI
hadoop.tmp.dir: storage directory for intermediate temporary results
The above is a minimal core-site.xml configuration; each core-site.xml option is documented at: http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml
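To double-check what was actually saved without starting anything, a property value can be pulled out of a *-site.xml file with a small helper. A sketch (get_prop is a name made up here; it assumes <name> and <value> sit on adjacent lines, as in the snippet above):

```shell
# Print the <value> on the line following a given <name> in a Hadoop config file.
# $1 = property name, $2 = path to the *-site.xml file.
get_prop() {
  grep -A1 "<name>$1</name>" "$2" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
# Usage: get_prop fs.defaultFS /opt/hadoop-2.6.4/etc/hadoop/core-site.xml
```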
4.5.2 hdfs-site.xml
Content after modification:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop-2.6.4/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop-2.6.4/dfs/data</value>
</property>
</configuration>
Note:
dfs.replication: number of replicas; 1 for pseudo-distributed, generally 3 for fully distributed
dfs.namenode.name.dir: NameNode data directory
dfs.datanode.data.dir: DataNode data directory
The above is a minimal hdfs-site.xml configuration; each hdfs-site.xml option is documented at: http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
4.5.3 mapred-site.xml
First copy mapred-site.xml.template to mapred-site.xml.
Add Content:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Note:
mapreduce.framework.name: the framework MapReduce jobs run on; values other than yarn also exist
The above is a minimal mapred-site.xml configuration; each mapred-site.xml option is documented at: http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
4.5.4 yarn-site.xml
Content after modification:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>ssmaster</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Note:
yarn.resourcemanager.hostname: the ResourceManager node. (Guess: in a fully distributed setup this can be a different node from the NameNode; to be verified)
yarn.nodemanager.aux-services: meaning not entirely clear yet; to be understood later (the mapreduce_shuffle value enables the shuffle service that NodeManagers run for MapReduce jobs)
The above is a minimal yarn-site.xml configuration; each yarn-site.xml option is documented at: http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
5. Starting Hadoop
5.1 Formatting HDFS
hadoop@ssmaster:/opt/hadoop-2.6.4$ bin/hdfs namenode -format
The last lines of the log should contain the following, indicating success:
16/10/22 19:40:40 INFO common.Storage: Storage directory /opt/hadoop-2.6.4/dfs/name has been successfully formatted.
5.2 Starting HDFs
hadoop@ssmaster:/opt/hadoop-2.6.4$ sbin/start-dfs.sh
Starting namenodes on [ssmaster]
ssmaster: starting namenode, logging to /opt/hadoop-2.6.4/logs/hadoop-hadoop-namenode-ssmaster.out
ssmaster: starting datanode, logging to /opt/hadoop-2.6.4/logs/hadoop-hadoop-datanode-ssmaster.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:adblljhq7xybjrfqpw9t5oya7+q7yo50s+ok7lianuk.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.6.4/logs/hadoop-hadoop-secondarynamenode-ssmaster.out
hadoop@ssmaster:/opt/hadoop-2.6.4$ jps
11151 DataNode
11042 NameNode
11349 SecondaryNameNode
11465 Jps
http://192.168.249.144:50070/
Note:
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
The Secondary NameNode IP shows as 0.0.0.0; at the yes/no prompt that follows, answer yes.
Not sure yet how to configure this properly; to revisit later [minor open issue]
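For that open question: the 0.0.0.0 comes from the default of dfs.namenode.secondary.http-address, which is 0.0.0.0:50090 in Hadoop 2.6 per hdfs-default.xml. Untested here, but setting it explicitly in hdfs-site.xml should make the startup script address the Secondary NameNode by hostname, e.g.:

```text
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>ssmaster:50090</value>
</property>
```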
5.3 Starting YARN
hadoop@ssmaster:/opt/hadoop-2.6.4$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.6.4/logs/yarn-hadoop-resourcemanager-ssmaster.out
ssmaster: starting nodemanager, logging to /opt/hadoop-2.6.4/logs/yarn-hadoop-nodemanager-ssmaster.out
hadoop@ssmaster:/opt/hadoop-2.6.4$ jps
11151 DataNode
11042 NameNode
11714 Jps
11349 SecondaryNameNode
11675 NodeManager
11540 ResourceManager
http://192.168.249.144:8042/
http://192.168.249.144:8088/
Summary of the ports for the Hadoop web console pages:
50070: HDFS file management
8088: ResourceManager
8042: NodeManager
If jps shows every daemon started and the various web pages open, the installation is a success.
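That web-page check can be scripted too. A sketch (check_ports is a helper name made up here; it assumes curl is installed and only tests whether each UI answers over HTTP):

```shell
# Probe each web UI port on a host and report whether it responds.
# $1 = host; remaining arguments = ports.
check_ports() {
  host=$1; shift
  for port in "$@"; do
    if curl -s --max-time 2 -o /dev/null "http://$host:$port/"; then
      echo "port $port: OK"
    else
      echo "port $port: not responding"
    fi
  done
}
# Usage: check_ports ssmaster 50070 8088 8042
```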
6. Save the virtual machine image
Z Summary: the Hadoop pseudo-distributed build is initially successful.
z.1 Open issues: [to research later]
- The network configuration was not set deliberately; the IP is auto-assigned by the virtual machine, so it may change later
- Hostname setup: worth writing up properly what each file and command involved actually does
- On HDFS startup the Secondary NameNode IP displays as 0.0.0.0; there should be somewhere to configure this
z.2 Follow-up:
- Focus on using Hadoop: install Eclipse, common operations, jar invocation
- Build a Spark environment, common operations
- Study a fully distributed build when time allows
- Study the meaning of each parameter in the Hadoop configuration when time allows
Q Other:
Copying files between different Linux systems:
scp hadoop-2.6.4.tar.gz <user>@<host>:~/
Packaging and uploading the various configuration files:
The files involved after this Hadoop installation (.rar). Task: upload it somewhere and link it here [to be completed]
C References:
Ref 1 (the main reference for this tutorial):
Building a pseudo-distributed environment with Hadoop 2.6.0
http://blog.csdn.net/stark_summer/article/details/43484545