Tutorial on standalone/pseudo-distributed installation and configuration of Hadoop 2.4.1 under Ubuntu 14.04
Environment
- System: Ubuntu 14.04 64bit
- Hadoop version: Hadoop 2.4.1 (stable)
- JDK version: OpenJDK 7
This tutorial is based on Hadoop 2.4.1, but it should also apply to other Hadoop 2.x releases.
Create a hadoop user
If you did not install Ubuntu using a user named hadoop, you need to add a hadoop user and set its password (for example, hadoop).
Create user
sudo useradd hadoop
Set the password for the hadoop user, entering it twice as prompted:
sudo passwd hadoop
Create a home directory for the hadoop user before logging in:
sudo mkdir /home/hadoop
sudo chown hadoop /home/hadoop
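As a side note, the user creation steps above can be collapsed into a single command; a minimal sketch, assuming the standard useradd shipped with Ubuntu:
sudo useradd -m -s /bin/bash hadoop   # -m creates /home/hadoop automatically, -s sets bash as the login shell
sudo passwd hadoop                    # set the password as before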
You can also consider granting the hadoop user administrator (sudo) privileges, which makes deployment easier and avoids permission problems later:
sudo adduser hadoop sudo
Finally, log out of the current user and use the hadoop user to log in.
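If you prefer not to log out, switching users in the current terminal also works; for example:
su - hadoop   # start a login shell as the hadoop user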
Install the SSH server and configure SSH login without a password
By default, the SSH client is installed in Ubuntu. You also need to install the SSH server.
sudo apt-get install openssh-server
Both cluster mode and single-node (pseudo-distributed) mode require passwordless SSH login, so first configure passwordless SSH to the local machine.
Enter the command
ssh localhost
There will be the following prompt (the first SSH login prompt), enter yes.
SSH first login prompt
Enter the password (hadoop) when prompted, and you are logged in to the local machine. However, this kind of login still requires a password, and we want to log in without one.
Log out of the SSH session and generate an SSH key:
exit                  # exit the ssh localhost session
cd ~/.ssh             # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa     # keep pressing Enter to accept the defaults
cp id_rsa.pub authorized_keys
Now run ssh localhost again and you can log in directly without a password, as shown below.
SSH password-less Login
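If ssh localhost still prompts for a password at this point, overly loose permissions on the key files are a common cause; a hedged fix, assuming the default OpenSSH setup:
chmod 700 ~/.ssh                   # sshd rejects keys when ~/.ssh is group- or world-writable
chmod 600 ~/.ssh/authorized_keys   # the same applies to authorized_keys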
Install the Java environment
Earlier tutorials recommended installing the Oracle JDK rather than OpenJDK. However, according to http://wiki.apache.org/hadoop/hadoopjavaversions, recent Hadoop releases work fine with OpenJDK 1.7, so install OpenJDK 7 with the following command:
sudo apt-get install openjdk-7-jre openjdk-7-jdk
The default installation location is /usr/lib/jvm/java-7-openjdk-amd64 (you can confirm this with dpkg -L openjdk-7-jdk). After installation, check it with java -version.
You need to configure the JAVA_HOME environment variable. It is used in many places, and here it is configured in /etc/environment:
sudo vim /etc/environment
Add a line at the end of the file:
JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
Save the file, then log out and log back in (or restart) so that JAVA_HOME is available in new terminal windows (after logging back in or restarting, open a new terminal window and run echo $JAVA_HOME to check).
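For example, a quick check in a fresh terminal (the exact version string will vary with your OpenJDK build):
echo $JAVA_HOME   # should print /usr/lib/jvm/java-7-openjdk-amd64
java -version     # should report an OpenJDK 1.7 runtime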
Install Hadoop 2.4.1
Download the hadoop-2.4.1.tar.gz package (the stable release) into ~/Download.
Decompress the package to /usr/local:
sudo tar -zxvf ~/Download/hadoop-2.4.1.tar.gz -C /usr/local   # unzip to /usr/local
sudo mv /usr/local/hadoop-2.4.1 /usr/local/hadoop             # rename the directory to hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop                 # change ownership to the hadoop user
Hadoop can be used as soon as it is decompressed. Run the following command to check that the hadoop executable is available:
/usr/local/hadoop/bin/hadoop
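The binary also accepts a version subcommand if you want to confirm which release you installed:
/usr/local/hadoop/bin/hadoop version   # should report Hadoop 2.4.1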
Hadoop standalone Configuration
By default, Hadoop runs in non-distributed mode as a single Java process, which is convenient for debugging. You can run the bundled grep example to get a feel for how Hadoop works: it takes Hadoop's configuration files as input and counts how many times words matching the regular expression dfs[a-z.]+ appear.
cd /usr/local/hadoop
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
cat ./output/*
After the job completes successfully, it prints job information as shown below; the result is that the matching word dfsadmin appears once.
Output of the Hadoop standalone grep example
Running the example again will report an error because the output directory already exists; delete ./output first:
rm -R ./output
Hadoop pseudo-distributed configuration
Hadoop can also run in a pseudo-distributed way on a single node: each Hadoop daemon runs as a separate Java process, and the node acts as both NameNode and DataNode. Two configuration files need to be modified: etc/hadoop/core-site.xml and etc/hadoop/hdfs-site.xml. Hadoop configuration files are in XML format, with each property declared by its name and value.
Modify the configuration file etc/hadoop/core-site.xml, changing
<configuration></configuration>
to the following configuration:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
Similarly, modify the configuration file etc/hadoop/hdfs-site.xml to:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
Configuration notes: strictly speaking, only fs.defaultFS and dfs.replication are needed to run, but if the hadoop.tmp.dir parameter is not set, Hadoop uses the default temporary directory /tmp/hadoop-hadoop, which may be cleaned up by the system on reboot, forcing you to re-run the format step (unverified). It is therefore best to set it in the pseudo-distributed configuration. In addition, you should explicitly specify dfs.namenode.name.dir and dfs.datanode.data.dir, otherwise errors may occur in the following steps.
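One extra note, as an assumption not covered above: if the daemons later complain that JAVA_HOME is not set, Hadoop's own environment file may need the variable set explicitly, since the startup scripts do not always inherit /etc/environment over SSH. A hedged sketch, added to etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # adjust the path to match your JDK installation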
After the configuration is complete, initialize (format) the HDFS file system:
bin/hdfs namenode -format
In the final lines of the output, Exitting with status 0 means success, while Exitting with status 1 means an error occurred. If an error occurs, try adding sudo: sudo bin/hdfs namenode -format.
Initialize HDFS File System
Start the NameNode and DataNode daemons:
sbin/start-dfs.sh
If the following SSH prompt is displayed, enter yes.
SSH prompt when Hadoop is started
You may see WARN messages here (and in the following steps), in particular one about the native-hadoop library. These warnings can be ignored and do not affect Hadoop's functionality. If you want to get rid of them, refer to the supplementary tutorials later (it is worth fixing; it is not difficult and spares you from reading so many useless messages).
Warn prompt when Hadoop is started
After startup succeeds, run the jps command; the following processes should be running: NameNode, DataNode, and SecondaryNameNode.
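For example (a Jps entry for the command itself will also be listed, and process IDs will differ):
jps
# expected processes, in any order:
#   NameNode
#   DataNode
#   SecondaryNameNode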
View the startup log to analyze the cause of startup failure
Sometimes Hadoop does not start correctly, for example the NameNode process fails to start. In that case, check the startup logs to find the cause, keeping the following points in mind:
- At startup you will see a message like "Master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.out", where Master is your machine's hostname; however, the startup log is actually written to /usr/local/hadoop/logs/hadoop-hadoop-namenode-Master.log, so that .log file is the one to check (see the sketch after this list);
- Each startup appends to the log file, so look at the end of the file and use the timestamps to find the records for the latest run;
- The error message is usually near the end, wherever Fatal, Error, or a Java Exception is recorded.
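A quick way to inspect the relevant part of the NameNode log, assuming the default log directory and file name pattern shown above:
tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop-namenode-*.log                        # last 50 lines of the NameNode log
grep -iE 'fatal|error|exception' /usr/local/hadoop/logs/hadoop-hadoop-namenode-*.log  # jump straight to the errors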
View started Hadoop processes through jps
At this point, you can open the web interface at http://localhost:50070 to view Hadoop's status information.
Hadoop Web Interface
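If you are working on a machine without a browser, a rough way to check that the interface is up (assuming curl is installed; the exact HTML returned depends on the Hadoop version):
curl -s http://localhost:50070 | head -n 5   # should print the first lines of the NameNode page's HTML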