After playing with cloud applications, let's start playing with Hadoop. This article mainly describes how to configure the Hadoop and Hive components on a single machine. Without further ado, let's begin.
I. System running environment
Operating system: CentOS 6.0 (desktop) Linux
Hadoop version: hadoop-0.20.2: http://www.apache.org/dyn/closer.cgi/hadoop/core/
Hive version: hive-0.8.0-bin: http://www.apache.org/dyn/closer.cgi/hive/
II. Configuration steps
General Environment Configuration
By default, the system does not include Java. Since Hadoop 0.20.2 only requires JDK 1.6, there is no need to download the JDK manually from the Sun website; just enter the following in a terminal: yum install java-1.6.0-openjdk java-1.6.0-openjdk-devel
After Java is installed, run java -version to verify that Java is installed correctly.
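For reference, a successful check prints something along these lines (the exact version and build strings will differ on your machine; this is just an illustration):
java -version
# java version "1.6.0_xx"
# OpenJDK Runtime Environment (IcedTea6 ...)
# OpenJDK Server VM (build ..., mixed mode)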
After installing Java, we also need to check whether the sshd and ssh client tools are installed. By default, ssh is already installed, so we only need to install rsync.
Enter the command: yum install rsync
Then confirm that you can ssh to localhost without a password.
Enter the command: ssh localhost
If you are prompted for a password, generate a passphrase-less key pair and add it to the authorized keys:
ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Note: -P is followed by two double-quote characters (an empty string), which sets the passphrase to empty.
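If ssh localhost still asks for a password after this, the permissions on the key files are a common culprit. A quick fix that usually works (this step is not part of the original Hadoop instructions, just a common troubleshooting tip):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Then try ssh localhost again; it should log you in without prompting.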
After completing the preceding preparation, we can decompress the package and configure Hadoop.
1. Decompress the Hadoop package
Enter the command: tar zxf hadoop-0.20.2.tar.gz
Then go to the decompressed Hadoop directory:
cd hadoop-0.20.2
2. Modify the Hadoop configuration files
A. Set JAVA_HOME
Edit the conf/hadoop-env.sh file and find the line: # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
Remove the comment symbol # and change the path after the equals sign to the directory where your JDK is installed. For example, if your java executable is /usr/bin/java, write (do not include the bin part):
export JAVA_HOME=/usr
Note: If you do not know where Java is installed, use whereis java to find it.
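If whereis java only shows /usr/bin/java (on CentOS this is usually a symlink), you can follow it to the real JDK location, for example:
readlink -f $(which java)
# prints something like /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre/bin/java (the exact path depends on the installed openjdk package)
JAVA_HOME should then point at the directory above bin.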
B. Configure Hadoop in single-node pseudo-distributed mode (pseudo-cluster mode)
Modify conf/core-site.xml so that it reads:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Modify conf/hdfs-site.xml so that it reads:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Modify conf/mapred-site.xml so that it reads:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
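As a quick sanity check on the three edits (optional; this is just plain grep, nothing Hadoop-specific is assumed), you can confirm the values were saved:
grep -A 1 fs.default.name conf/core-site.xml
grep -A 1 dfs.replication conf/hdfs-site.xml
grep -A 1 mapred.job.tracker conf/mapred-site.xml
Each command should print the matching <name> line followed by the <value> you just entered.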
C. Initialize the Hadoop NameNode
Run: bin/hadoop namenode -format
D. Start Hadoop
Run: bin/start-all.sh
Wait until all the daemons have finished starting before continuing.
Note: You can use the jps command to check which Java processes have started. Normally you should see the following: TaskTracker, SecondaryNameNode, JobTracker, Jps, NameNode, and DataNode.
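For reference, the jps output looks roughly like this (the process IDs will of course differ on your machine):
jps
# 12321 NameNode
# 12458 DataNode
# 12597 SecondaryNameNode
# 12702 JobTracker
# 12845 TaskTracker
# 12990 Jps
If one of the daemons is missing, check its log file under the logs/ directory.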
Hadoop installed successfully
After Hadoop is configured, we can configure Hive.
1. Create the directories Hive needs in HDFS
Enter the following commands:
bin/hadoop fs -mkdir /tmp
bin/hadoop fs -mkdir /user/hive/warehouse
bin/hadoop fs -chmod g+w /tmp
bin/hadoop fs -chmod g+w /user/hive/warehouse
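To confirm the directories were created with the right permissions (a simple listing, nothing assumed beyond the commands above):
bin/hadoop fs -ls /
bin/hadoop fs -ls /user/hive
You should see /tmp and /user/hive/warehouse listed with group write permission (drwxrwxr-x).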
2. Decompress Hive
tar zxf hive-0.8.0-bin.tar.gz
cd hive-0.8.0-bin
3. Set HADOOP_HOME
export HADOOP_HOME=/home/chin/hadoop-0.20.2
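The export above only lasts for the current shell session. If you want HADOOP_HOME set every time you log in, you can append it to your shell profile (adjust the path to wherever you unpacked Hadoop; /home/chin/hadoop-0.20.2 is simply the directory used in this article):
echo 'export HADOOP_HOME=/home/chin/hadoop-0.20.2' >> ~/.bashrc
source ~/.bashrc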
4. Run Hive
bin/hive
hive>
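As a quick smoke test that Hive really talks to Hadoop (SHOW TABLES is standard HiveQL, so it should work on a fresh install; the result will simply be an empty list):
bin/hive -e 'SHOW TABLES;'
If this returns without errors, the warehouse directories and HADOOP_HOME are set up correctly.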
Hive is now running, and the whole configuration was easy and comfortable, taking hardly any time. Note, however, that this process only applies to Hadoop 0.20.2; the conf layout has changed in the newer Hadoop 0.23. That is not a problem: when configuring, just compare against the corresponding official reference manual.