Big data has been getting hotter in recent years. Driven by job needs and personal interest, I recently started learning big-data technologies, and I hope to consolidate the lessons learned along the way through this blog, both to share and discuss them with readers and to keep as a personal memo.
First up is building a Hadoop 2.6.0 pseudo-distributed environment in a virtual machine under Win7.
1. Required Software
Use VMware 11.0 to build the virtual machine and install the Ubuntu 14.04.2 system.
JDK 1.7.0_80
Hadoop 2.6.0
2. Installing VMware and Ubuntu
Omitted.
3. Installing the JDK in Ubuntu
Unzip the JDK to the directory /home/vm/tools/jdk.
Configure the environment variables in ~/.bash_profile and make them take effect with source ~/.bash_profile:
#java
export JAVA_HOME=/home/vm/tools/jdk
export JRE_HOME=/home/vm/tools/jdk/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
Verify that the JDK installation is successful.
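For example, with java -version; the banner below is what JDK 1.7.0_80 typically prints, though the exact build strings may differ on your machine:

java -version
# java version "1.7.0_80"
# Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
# Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)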
4. Configure SSH trust relationship for password-free login
4.1 Installing SSH
Ubuntu has an SSH client installed by default, but not an SSH server, so the server has to be installed via apt-get.
Install the SSH server: sudo apt-get install openssh-server
If the SSH client is missing, it can be installed through apt-get as well.
Install the SSH client: sudo apt-get install openssh-client
Start the SSH server: sudo service ssh start
After it starts, check whether the SSH server is running with ps aux | grep sshd.
4.2 Configuring SSH Trust relationships
Generate a public/private key pair on machine A: ssh-keygen -t rsa, pressing Enter at each prompt. This generates the public key id_rsa.pub and the private key id_rsa in the ~/.ssh directory.
Append machine A's id_rsa.pub to machine B's authentication file:
cat id_rsa.pub >> ~/.ssh/authorized_keys
At this point the trust relationship from machine A to machine B is established, and machine A can ssh into machine B without being asked for a password.
In this pseudo-distributed setup, machines A and B are the same machine, so the SSH trust relationship can be verified with ssh localhost or ssh <machine IP>; see the sketch below.
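Putting the steps together for the single-machine case (the chmod line is my own addition; sshd ignores authorized_keys if its permissions are too open):

ssh-keygen -t rsa                                # accept the defaults at each prompt
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # machines A and B are the same host here
chmod 600 ~/.ssh/authorized_keys                 # sshd rejects keys when this file is group/world-writable
ssh localhost                                    # should log in without asking for a password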
5. Installing Hadoop 2.6.0
5.1 Unpacking Hadoop 2.6.0
Download hadoop-2.6.0.tar.gz from the official website, unzip it to the directory /home/vm/tools/hadoop, and configure the environment variables in ~/.bash_profile, making them take effect with source ~/.bash_profile:
#hadoop
export HADOOP_HOME=/home/vm/tools/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
5.2 Modifying the configuration files
Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh and yarn-env.sh to configure the JAVA_HOME path:
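A minimal version, using the JDK path from section 3:

# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh (and likewise in yarn-env.sh)
export JAVA_HOME=/home/vm/tools/jdk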
Modify $HADOOP_HOME/etc/hadoop/slaves to add the machine's own IP address:
echo "192.168.62.129" >> slaves
Modify several important *-site.xml files under $HADOOP_HOME/etc/hadoop/:
core-site.xml (192.168.62.129 is the IP address of my virtual machine):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.62.129:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/vm/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/vm/app/hadoop/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/vm/app/hadoop/dfs/dn</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Permission checking is turned off.</description>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://192.168.62.129:9001</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
5.3 Formatting the file system
Under $HADOOP_HOME, execute bin/hdfs namenode -format to format the file system.
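If formatting succeeds, the log should end with a line like the following (the directory matches dfs.namenode.name.dir from hdfs-site.xml; the surrounding INFO lines are omitted):

bin/hdfs namenode -format
# ...
# INFO common.Storage: Storage directory /home/vm/app/hadoop/dfs/nn has been successfully formatted.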
5.4 Starting and stopping the cluster
Execute sbin/start-dfs.sh and sbin/start-yarn.sh under $HADOOP_HOME to start the Hadoop cluster; execute sbin/stop-dfs.sh and sbin/stop-yarn.sh to stop it.
(Screenshots of the startup and shutdown output are omitted.)
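As a quick sanity check after starting (my own addition rather than the original screenshots), jps should list the five Hadoop daemons; the PIDs below are illustrative:

jps
# 3472 NameNode
# 3601 DataNode
# 3768 SecondaryNameNode
# 3920 ResourceManager
# 4035 NodeManager
# 4310 Jps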
6. Querying cluster information
Port 8088 shows information about all applications: http://192.168.62.129:8088/
Port 50070 shows HDFS information: http://192.168.62.129:50070/
7. Verify that the Hadoop environment is built successfully
7.1 Verifying that HDFS is healthy
You can test with various HDFS commands. For example:
hdfs dfs -ls ./
hdfs dfs -put file1 ./
hdfs dfs -get ./file1
hdfs dfs -rm -f ./file1
hdfs dfs -cat ./file1
hdfs dfs -df -h
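For instance, a minimal round trip (file1 here is just a scratch file created for this test):

echo "hello hadoop" > file1   # create a small local file
hdfs dfs -put file1 ./        # upload it to the HDFS home directory
hdfs dfs -cat ./file1         # should print: hello hadoop
hdfs dfs -rm -f ./file1       # clean up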
7.2 Verifying that the MapReduce framework works
Under the $HADOOP_HOME directory, execute: bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/
The ./count_in/ directory must be created in the HDFS cluster ahead of time; the job counts the words in all files under that directory and writes the result to the ./count_out/ directory.
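A minimal way to prepare the input and inspect the result (sample.txt stands in for any local text file; part-r-00000 is the standard output file of a single-reduce job, and ./count_out/ must not exist before the run):

hdfs dfs -mkdir -p ./count_in           # create the input directory in HDFS
hdfs dfs -put sample.txt ./count_in/    # upload one or more text files
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/
hdfs dfs -cat ./count_out/part-r-00000  # one "word<TAB>count" pair per line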
(Screenshots of the job execution and its final result are omitted.)
At this point, the Hadoop 2.6.0 pseudo-distributed environment is complete.