I. Hardware Environment
Hadoop build system environment: a Linux ubuntu-13.04-desktop-i386 system acting as both the NameNode and the DataNode. (The Ubuntu system runs as a virtual machine on the hardware below.)
Hadoop installation target version: Hadoop 1.2.1
JDK installation version: jdk-7u40-linux-i586
Pig installation version: pig-0.11.1
Virtual machine host hardware: IBM System x3500 M3 tower server (MT: 7380)
Eclipse installation version: eclipse-standard-kepler-SR1-linux-gtk
II. Software Environment Preparation
2.1 Hadoop
Hadoop release 1.2.1 (stable). Download link: http://mirror.nexcess.net/apache/hadoop/common/hadoop-1.2.1/; select the hadoop-1.2.1-bin.tar.gz file to download.
2.2 Java
Java JDK 1.7 (1.6 also works). Download link: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html; choose the Linux x86 jdk-7u40-linux-i586.tar.gz download, because my Linux machine is 32-bit. If your Linux machine is 64-bit, you must select the 64-bit download; different machines need the matching JDK build.
2.3 Eclipse
Eclipse: choose the Linux 32-bit download from https://www.eclipse.org/downloads/
III. Installation Steps
3.1 Adding a user specifically for Hadoop
Enter the following at the command line:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
To give the hadoop user sudo permissions:
sudo vim /etc/sudoers
Below the line root ALL=(ALL:ALL) ALL, add the line hadoop ALL=(ALL:ALL) ALL
Switch to the hadoop user: su hadoop
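To double-check that the account was created as expected, a quick look with a standard command (not part of the original steps):
id hadoop
The output should list the hadoop group; if it does not, repeat the addgroup/adduser commands above.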
3.2 Create the directory and unzip the installation package
Create a Directory
sudo mkdir -p /home/hadoop/install
sudo mkdir -p /home/hadoop/software/hadoop   # this directory stores the Hadoop program files
sudo mkdir -p /home/hadoop/software/java     # this directory stores the JDK program files
sudo mkdir -p /home/hadoop/software/eclipse  # this directory stores the Eclipse program files
Unpack the installation archives
sudo tar -xzvf /home/master/download/jdk-7u40-linux-i586.tar.gz -C /home/hadoop/software/java/
sudo tar -xzvf /home/master/download/hadoop-1.2.1-bin.tar.gz -C /home/hadoop/software/hadoop/
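To confirm the archives unpacked where expected, list the target directories (assuming the paths above were used unchanged):
ls /home/hadoop/software/java/
ls /home/hadoop/software/hadoop/
The first listing should show jdk1.7.0_40 and the second hadoop-1.2.1.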
3.3 Configuring Hadoop
Configuring the Java Environment
Add the JAVA_HOME and CLASSPATH environment variables. Use the sudo vi /etc/profile command to edit the profile file and add the following at the end of the file:
HADOOP_INSTALL=/home/hadoop/software/hadoop/hadoop-1.2.1/
JAVA_HOME=/home/hadoop/software/java/jdk1.7.0_40
PATH=$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH
CLASSPATH=$JAVA_HOME/lib
export JAVA_HOME PATH CLASSPATH HADOOP_INSTALL
Then save and exit, and run source /etc/profile to make the changes take effect immediately.
Then run the java -version command to check whether the configuration succeeded; if it did, information like the following appears:
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) Client VM (build 24.0-b56, mixed mode)
Configuring the SSH Environment
Use the following commands to set up a passwordless SSH connection:
ssh-keygen
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
ssh localhost
The last command is a test. The first run asks whether you want to continue connecting; type yes and press Enter. If you are not asked for a password, the setup succeeded.
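If ssh-keygen's interactive prompts get in the way, the same setup can be done non-interactively; this is a sketch assuming an RSA key with an empty passphrase, not the exact commands from the original:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost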
Configuring the Hadoop Environment
Run cd /home/hadoop/software/hadoop/hadoop-1.2.1/conf to enter the conf directory. There you will find hadoop-env.sh, core-site.xml, mapred-site.xml and hdfs-site.xml, the four files that need to be configured, plus the slaves and masters files, which only need to be configured in fully distributed mode.
1. Configure hadoop-env.sh: find the line containing the JAVA_HOME keyword, remove the leading # sign, and fill in the actual JAVA_HOME path:
export JAVA_HOME=/home/hadoop/software/java/jdk1.7.0_40
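A quick way to confirm the edit took effect (a simple check, assuming you are still in the conf directory):
grep JAVA_HOME hadoop-env.sh
The output should show the export line above without a leading # sign.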
2. Configure core-site.xml: open the file with vi core-site.xml and add the following inside the <configuration> tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<!-- fs.default.name: configures the NameNode and specifies the URI of the HDFS file system, through which we access the contents of the file system. localhost can be changed to this machine's IP address; in fully distributed mode you must change localhost to the actual NameNode machine's IP address. If no port is given, the default port 8020 is used. -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop_tmp</value>
</property>
<!-- hadoop.tmp.dir: Hadoop's default temporary path; it is best to configure it. If a new node is added, or a DataNode inexplicably fails to start for some other reason, delete the tmp directory configured here. Note that if this directory is deleted on the NameNode machine, the NameNode format command will need to be run again. The directory must be created manually beforehand. -->
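Since the comment above says the directory must be created beforehand, create it now; a one-line sketch matching the value configured above (keep it writable by the hadoop user):
mkdir -p /home/hadoop/tmp/hadoop_tmp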
3. Configure hdfs-site.xml: add the following inside the <configuration> tag; all directories referenced here must be created in advance (see the commands after the properties below):
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/appdata/hadoopdata</value>
</property>
<!-- Configures the HDFS data storage directory, where the DataNode stores its data blocks -->
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/appdata/hadoopname</value>
</property>
<!-- Stores the NameNode's file system metadata, including the edit log and the file system image. If you change this path, you must rerun the hadoop namenode -format command to reformat the NameNode. -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Sets the number of replicas kept by the file system. Because there is only one node, it is set to 1; the system default is 3. -->
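As noted above, these directories must exist in advance and be owned by the hadoop user (ownership here is exactly what the error notes at the end trace a failed NameNode to); a minimal sketch matching the values configured above:
mkdir -p /home/hadoop/appdata/hadoopdata
mkdir -p /home/hadoop/appdata/hadoopname
sudo chown -R hadoop:hadoop /home/hadoop/appdata/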
4. Configure mapred-site.xml: add the following inside the <configuration> tag:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<!-- Configures the JobTracker node. localhost can also be changed to this machine's IP address; in fully distributed mode, remember to change it to the actual JobTracker machine's IP address. -->
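For reference, here is roughly what the finished mapred-site.xml looks like once the fragment above is placed inside the configuration element; core-site.xml and hdfs-site.xml follow the same structure:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>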
IV. Start Hadoop
4.1 Test the Hadoop configuration for success
Run the following command; if it prints the Hadoop version, the configuration is correct.
hadoop version
4.2 Format Namenode
cd /home/hadoop/software/hadoop/hadoop-1.2.1/bin
./hadoop namenode -format
4.3 Starting the Hadoop process
cd /home/hadoop/software/hadoop/hadoop-1.2.1/bin
./start-all.sh
Use the Java jps command to check whether the processes started successfully; the startup is OK when all five processes are running: SecondaryNameNode, JobTracker, NameNode, DataNode and TaskTracker.
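For reference, a successful startup looks roughly like this in jps (the process IDs are examples and will differ):
2913 NameNode
3105 DataNode
3298 SecondaryNameNode
3391 JobTracker
3587 TaskTracker
3742 Jps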
If any process fails to start, the whole cluster will not work properly; go to the /home/hadoop/software/hadoop/hadoop-1.2.1/libexec/../logs/ directory to read the log of the process that failed.
4.4 View Hadoop information from the browser
View JobTracker information:
You can access Hadoop from a browser on this machine or another machine. Enter the following URL: http://10.1.151.168:50030/jobtracker.jsp, where 10.1.151.168 is the IP address of this machine.
View NameNode information:
http://10.1.151.168:50070/dfshealth.jsp
View TaskTracker information:
http://10.1.151.168:50060/tasktracker.jsp
V. Error Notes
Password: localhost: Permission denied, please try again
This usually means the hadoop user has not been given sudo permission, so open /etc/sudoers and add the line hadoop ALL=(ALL:ALL) ALL.
TaskTracker does not start properly
Looking up the TaskTracker error log under logs showed a WARN saying that the temp/hadoop_tmp.mapred/local/ directory under the corresponding path was not writable. The problem was resolved by changing the permissions as follows:
sudo chmod 777 /home/hadoop/temp/hadoop_tmp.mapred/local/
/etc/profile has to be sourced again after every boot, otherwise the JDK cannot be found
This problem is still unresolved because I have not found the cause. What to do? Forget it; for the moment I just run source /etc/profile again each time.
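One possible workaround, not verified in these notes: have the hadoop user's shell source the profile on every login, for example:
echo "source /etc/profile" >> ~/.bashrc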
Safe mode is ON and HDFS is unavailable, the node count shows 0, and the NameNode does not start.
Investigation showed the problem was the directory configured as the value of dfs.name.dir in hdfs-site.xml. The log says: XXX is in an inconsistent state: storage directory does not exist or is not accessible, where XXX is that directory. Endless restarting and reformatting never solved the problem, and deleting the directory did not help either. Yes, I was going crazy; can you see my crazy eyes? Finally, I suddenly remembered what chown does, so I ran the following command:
sudo chown -R hadoop:hadoop /home/hadoop/appdata/
Reformat, then run start-all.sh, and it works! In short, it was a file permissions problem.