Build a Hadoop 2.6.0 Pseudo-Distributed Environment in a Virtual Machine on Windows 7
In recent years, big data has become increasingly popular. Due to work needs and personal interest, I recently started learning big data technologies. I plan to record the lessons learned along the way in blog posts, both as a personal memo and as something to share and discuss with other readers.
Article 1: Build a Hadoop 2.6.0 pseudo-distributed environment in a Windows 7 virtual machine.
1. Required Software
Use VMware 11.0 to build a virtual machine and install Ubuntu 14.04.2.
JDK 1.7.0_80
Hadoop 2.6.0
2. Install VMware and Ubuntu
Install Ubuntu 14.04 on a VMware Workstation 10 Virtual Machine in Windows 7
3. Install JDK in Ubuntu
Decompress the JDK to the directory /home/vm/tools/jdk.
Configure the environment variables in ~/.bash_profile, then run source ~/.bash_profile to make them take effect.
# java
export JAVA_HOME=/home/vm/tools/jdk
export JRE_HOME=/home/vm/tools/jdk/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
Check whether the JDK is installed successfully.
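As a quick check (a minimal sketch; the exact version string depends on the JDK build you installed):

java -version       # should report java version 1.7.0_80
echo $JAVA_HOME     # should print /home/vm/tools/jdk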
4. Configure the ssh trust relationship for password-less login
4.1 Install ssh
The ssh client is installed in Ubuntu by default, but the ssh server is not installed, so you can install it through apt-get.
Install ssh-server: sudo apt-get install openssh-server
If you do not have an ssh client, you can also install it through apt-get.
Install ssh-client: sudo apt-get install openssh-client
Start ssh-server: sudo service ssh start
After startup, run ps aux | grep sshd to check whether the ssh server is running.
4.2 Configure the ssh trust relationship
Generate a public/private key pair on machine A: run ssh-keygen -t rsa and press Enter through the prompts. The public key id_rsa.pub and the private key id_rsa are generated in the ~/.ssh directory.
Copy id_rsa.pub of machine A to the authentication file of machine B:
cat id_rsa.pub >> ~/.ssh/authorized_keys
The trust relationship from machine A to machine B is now established: machine A can log on to machine B over ssh without a password.
In this example, machine A and machine B are the same machine. After configuring the ssh trust relationship, you can verify it with ssh localhost or ssh <the machine's IP address>.
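Putting the steps together, a minimal sketch for the single-machine case (the chmod step is my own addition; some sshd configurations reject an authorized_keys file with looser permissions):

ssh-keygen -t rsa                                   # accept the defaults, empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     # trust our own public key
chmod 600 ~/.ssh/authorized_keys                    # tighten permissions so sshd accepts the file
ssh localhost                                       # should log in without asking for a password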
5. Install Hadoop 2.6.0
5.1 Decompress Hadoop 2.6.0
Download hadoop-2.6.0.tar.gz from the official site, decompress it to the /home/vm/tools/hadoop directory, and configure the environment variables in ~/.bash_profile. Run source ~/.bash_profile to make them take effect.
# hadoop
export HADOOP_HOME=/home/vm/tools/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
5.2 Modify the configuration files
Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh and yarn-env.sh to configure the JAVA_HOME path:
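A minimal sketch of the change, assuming the JDK path used in section 3:

# in hadoop-env.sh and yarn-env.sh
export JAVA_HOME=/home/vm/tools/jdk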
Modify $HADOOP_HOME/etc/hadoop/slaves and add the local IP address:
echo "192.168.62.129" > slaves
Modify several important *-site.xml files under $HADOOP_HOME/etc/hadoop:
core-site.xml (192.168.62.129 is the IP address of my VM):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.62.129:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/vm/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/vm/app/hadoop/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/vm/app/hadoop/dfs/dn</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Permission checking is turned off.</description>
  </property>
</configuration>
mapred-site.xml (if the file does not exist, copy it from mapred-site.xml.template):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://192.168.62.129:9001</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
5.3 Format the file system
Run bin/hdfs namenode -format under $HADOOP_HOME to format the file system.
5.4 Start and stop
Run sbin/start-dfs.sh and sbin/start-yarn.sh under $HADOOP_HOME to start the hadoop cluster, and run sbin/stop-dfs.sh and sbin/stop-yarn.sh to stop it.
The startup process is as follows:
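As a quick sanity check (my own addition, not the original screenshot; the process IDs below are illustrative), jps should list the five daemons of a pseudo-distributed node:

jps
# 3201 NameNode
# 3335 DataNode
# 3510 SecondaryNameNode
# 3680 ResourceManager
# 3795 NodeManager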
6. Query cluster information
Port 8088 (the YARN ResourceManager web UI) shows the All Applications information:
Port 50070 (the HDFS NameNode web UI) shows hdfs information:
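With the VM IP used above and the default ports, the two web UIs should be reachable from a browser at:

http://192.168.62.129:8088
http://192.168.62.129:50070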
7. Verify that the hadoop environment is successfully built
7.1 Verify that hdfs is working properly
You can use various hdfs commands for testing. For example:
hdfs dfs -ls ./
hdfs dfs -put file.1 ./
hdfs dfs -get ./file.1
hdfs dfs -rm -f ./file.1
hdfs dfs -cat ./file.1
hdfs dfs -df -h
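A short session might look like this (a sketch; test.txt is a hypothetical local file):

echo "hello hadoop" > test.txt    # create a small local file
hdfs dfs -put test.txt ./         # upload it to the HDFS home directory
hdfs dfs -ls ./                   # the file should appear in the listing
hdfs dfs -cat ./test.txt          # print its contents back from HDFS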
7.2 Verify that the map/reduce computing framework is working properly
In the $HADOOP_HOME directory, run: bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/
Here ./count_in/ is a directory created in the hdfs cluster in advance; the job counts the words in all files under that directory and writes the result to the ./count_out/ directory.
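A minimal end-to-end sketch (the input file name is hypothetical; part-r-00000 is the usual name of the reducer output file):

hdfs dfs -mkdir -p ./count_in/              # input directory in HDFS
hdfs dfs -put test.txt ./count_in/          # upload some text to count
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount ./count_in/ ./count_out/
hdfs dfs -cat ./count_out/part-r-00000      # view the word counts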
The execution process is as follows:
Result generated after execution:
At this point, the Hadoop 2.6.0 pseudo-distributed environment is complete.