Configure Hadoop on a single machine in Linux



1. Install Java.

First install Java. Because OpenJDK is installed in Ubuntu by default, you can uninstall it first. Enter the command in the terminal: sudo apt-get purge openjdk*

1. Download the JDK for Linux from the Sun homepage. I downloaded jdk-6u37-linux-i586.bin.

2. Switch to the root user and change to the directory where the JDK will be installed. My installation directory is /usr/lib/jvm/java. Copy the downloaded file to this directory, then make the file executable with the command: chmod a+x jdk-6u37-linux-i586.bin

3. Start the installation. Enter ./jdk-6u37-linux-i586.bin in the terminal, and the installation process runs. During installation, you will be prompted to press Enter to continue.

4. When the installation is complete, "Done." appears, indicating that the Java environment has been installed. The JDK is unpacked into the current directory (/usr/lib/jvm/java here); of course, you can choose another location.

5. After the installation, entering java directly in the terminal still produces an error; you also need to configure environment variables. If you only use the export command, the setting applies to the current shell alone: open another shell or restart, and it is gone. You can configure either ~/.bashrc or /etc/profile. The latter modifies the system-wide configuration file, which takes effect for all users.

6. Use vim to open the /etc/profile file and add the following lines at the end:

export JAVA_HOME=/usr/lib/jvm/java/jdk1.6.0_37
export JRE_HOME=/usr/lib/jvm/java/jdk1.6.0_37/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

(Note: Do not make any mistakes here; otherwise you may not be able to log in after rebooting. If the file is wrong, only a blank page appears after boot. Press Ctrl+Alt+F1 to enter the tty1 command-line interface, run sudo vim /etc/profile to check and fix the configuration, and then restart the system. Also note that there must be no spaces on either side of the = in the export lines.)

7. Save the file and restart the computer.

(Note: Some guides on the Internet suggest using source to apply the changes without rebooting: enter source /etc/profile in the terminal. In my tests, this method takes effect only in that one terminal; if you open a new terminal, the Java configuration is missing again unless you run source there as well.)
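The per-shell scope of source is easy to see with a throwaway profile file. In this sketch, DEMO_HOME and the temporary file are invented stand-ins for the real /etc/profile entries:

```shell
# Hypothetical stand-in for /etc/profile: DEMO_HOME is an invented
# variable used only to illustrate how `source` behaves.
profile=$(mktemp)
echo 'export DEMO_HOME=/opt/demo' > "$profile"
# `source` (or `.`) runs the file in the current shell, so the export
# takes effect here -- but only here; a new terminal won't see it.
source "$profile"
echo "$DEMO_HOME"   # /opt/demo
```

A subshell or a new terminal started without sourcing the file again would not have DEMO_HOME set, which is exactly the behavior the note above describes.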

8. Run the env command to view the values of the environment variables. If the content of each variable matches what you configured, the configuration succeeded. You can also run java -version; if it outputs java version "1.6.0_37", the configuration is correct.

2. Create a hadoop group and a hadoop user.

1. Create a hadoop user group: sudo addgroup hadoop

2. Create a hadoop user: sudo adduser --ingroup hadoop hadoopusr. You are asked to enter a password and user information. When entering the user information, you can simply press Enter to accept the defaults, then enter y to confirm.

3. Add sudo permissions for the newly created user hadoopusr by opening the /etc/sudoers file: run sudo gedit /etc/sudoers. Grant hadoopusr the same permissions as root by adding the following at the end of the file:

root       ALL=(ALL:ALL)  ALL
hadoopusr  ALL=(ALL:ALL)  ALL

3. Install the SSH service

SSH enables remote login and management. For details, refer to other related documents.

Run sudo apt-get install ssh openssh-server to install openssh-server.

If you have already installed SSH, proceed to the next step.

4. Set up password-less SSH login to the local machine

First, switch to the hadoop user with the command su - hadoopusr, and then enter the password.

SSH keys can be generated with either RSA or DSA. RSA is used by default.

1. Create an SSH key. We use the RSA method. Enter the command: ssh-keygen -t rsa -P ""

When prompted with "Enter file in which to save the key (/home/hadoopusr/.ssh/id_rsa):", press Enter. The following information is displayed:

Created directory '/home/hadoopusr/.ssh'.
Your identification has been saved in /home/hadoopusr/.ssh/id_rsa.
Your public key has been saved in /home/hadoopusr/.ssh/
The key fingerprint is:
d4:29:00:6e:20:f0:d9:c6:a2:9b:cd:22:60:44:af:eb hadoopusr@shan-pc
The key's randomart image is:
+--[ RSA 2048]----+
|+.. ...          |
|.o.*   . . .     |
| .+.*   o o      |
|...+   . .       |
|oo      S        |
|o=.              |
|=.o              |
|o.               |
| E               |
+-----------------+

(Note: After you press Enter, two files are generated under ~/.ssh/: id_rsa and These two files come as a pair.)

2. Enter the ~/.ssh/ directory and append to the authorized_keys authorization file. Run the following commands:

cd ~/.ssh
cat >> authorized_keys

After that, you can log in to the local machine without a password.

3. Log in to localhost. Enter the command: ssh localhost

(Note: After you log in to another machine via SSH, you are controlling the remote machine. You must run the exit command to return control to the local host.)

4. Run the exit command. Enter: exit

5. Install Hadoop.

Download hadoop from the official hadoop website. The version used this time is 1.1.0: hadoop-1.1.0.tar.gz. (Note: do this as a non-root user.)

1. Assuming hadoop-1.1.0.tar.gz is on the desktop, copy it to the installation directory /usr/local. Run the command: sudo cp hadoop-1.1.0.tar.gz /usr/local/

2. Decompress hadoop-1.1.0.tar.gz. Run the following command:

sudo tar -zxf hadoop-1.1.0.tar.gz

3. Rename the decompressed folder to hadoop. Run the command: sudo mv hadoop-1.1.0 hadoop

4. Set the owner of the hadoop folder to hadoopusr. Run: sudo chown -R hadoopusr:hadoop /usr/local/hadoop
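Steps 1 to 4 can be dry-run on a scratch tarball before touching /usr/local. The demo archive below is a stand-in for the real hadoop-1.1.0.tar.gz, and the chown step is shown only as a comment because it needs root and the hadoopusr account:

```shell
# Build a stand-in tarball so the unpack/rename steps can be rehearsed
cd "$(mktemp -d)"
mkdir hadoop-1.1.0
echo demo > hadoop-1.1.0/README
tar -zcf hadoop-1.1.0.tar.gz hadoop-1.1.0
rm -r hadoop-1.1.0
# Step 2: decompress (the real command runs with sudo in /usr/local)
tar -zxf hadoop-1.1.0.tar.gz
# Step 3: rename the extracted folder to plain "hadoop"
mv hadoop-1.1.0 hadoop
# Step 4 (real system only): sudo chown -R hadoopusr:hadoop /usr/local/hadoop
ls hadoop   # README
```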

5. Open the hadoop/conf/ file. Run the command: sudo gedit hadoop/conf/

6. Configure conf/ find the line # export JAVA_HOME=..., remove the #, and set it to the local JDK path: export JAVA_HOME=/usr/lib/jvm/java/jdk1.6.0_37

7. Open the conf/core-site.xml file. Enter the command sudo gedit hadoop/conf/core-site.xml and modify it as follows. (Note: if you copy the code from here, remove any line numbers that come with it. The same applies below.)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name></name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

8. Open the conf/mapred-site.xml file. Enter the command sudo gedit hadoop/conf/mapred-site.xml and modify it as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

9. Open the conf/hdfs-site.xml file. Enter the command sudo gedit hadoop/conf/hdfs-site.xml and modify it as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name></name>
    <value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
  </property>
  <property>
    <name></name>
    <value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

10. Open the conf/masters file and add the host name of the secondarynamenode. For a standalone environment, just enter localhost. Run: sudo gedit hadoop/conf/masters

11. Open the conf/slaves file and add one line per slave host name. For the standalone version, just enter localhost. Run: sudo gedit hadoop/conf/slaves

6. Run hadoop on a single machine

1. Go to the hadoop directory and format the HDFS file system. This operation is required the first time you run hadoop:


bin/hadoop namenode -format

2. When you see the following information, your HDFS file system has been formatted successfully.

12/11/19 14:13:14 INFO namenode.FSEditLog: Closing edit log: position=4, editlog=/usr/local/hadoop/datalog2/current/edits
12/11/19 14:13:14 INFO namenode.FSEditLog: Close success: truncate to 4, editlog=/usr/local/hadoop/datalog2/current/edits
12/11/19 14:13:14 INFO common.Storage: Storage directory /usr/local/hadoop/datalog2 has been successfully formatted.
12/11/19 14:13:14 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Shan-PC/
************************************************************/

3. Start the hadoop daemons. Enter the command: bin/

4. Check whether hadoop started successfully. Enter the command: jps

If the five processes NameNode, SecondaryNameNode, TaskTracker, DataNode, and JobTracker are all listed, your hadoop standalone environment is configured correctly.

OK, the hadoop standalone environment is now set up. Next, let's run an example to test it.

7. Test

1. Go to the hadoop directory (cd /usr/local/hadoop) and enter the startup command bin/ to start hadoop.

2. Run the following commands to execute the wordcount example:

echo "hello world" > /home/hadoopusr/file01
echo "hello hadoop" > /home/hadoopusr/file02
bin/hadoop fs -mkdir input
bin/hadoop fs -copyFromLocal /home/hadoopusr/file0* input
bin/hadoop jar hadoop-examples-1.1.0.jar wordcount input output

hadoopusr@shan-pc:/usr/local/hadoop$ echo "hello world" > /home/hadoopusr/file01
hadoopusr@shan-pc:/usr/local/hadoop$ echo "hello hadoop" > /home/hadoopusr/file02
hadoopusr@shan-pc:/usr/local/hadoop$ bin/hadoop fs -mkdir input
hadoopusr@shan-pc:/usr/local/hadoop$ bin/hadoop fs -copyFromLocal /home/hadoopusr/file0* input
hadoopusr@shan-pc:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.1.0.jar wordcount input output
12/11/19 15:34:15 INFO input.FileInputFormat: Total input paths to process : 2
12/11/19 15:34:15 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/11/19 15:34:15 WARN snappy.LoadSnappy: Snappy native library not loaded
12/11/19 15:34:15 INFO mapred.JobClient: Running job: job_201211191500_0006
12/11/19 15:34:16 INFO mapred.JobClient:  map 0% reduce 0%
12/11/19 15:34:21 INFO mapred.JobClient:  map 100% reduce 0%
12/11/19 15:34:29 INFO mapred.JobClient:  map 100% reduce 33%
12/11/19 15:34:30 INFO mapred.JobClient:  map 100% reduce 100%
12/11/19 15:34:31 INFO mapred.JobClient: Job complete: job_201211191500_0006
12/11/19 15:34:31 INFO mapred.JobClient: Counters: 29
12/11/19 15:34:31 INFO mapred.JobClient:   Job Counters
12/11/19 15:34:31 INFO mapred.JobClient:     Launched reduce tasks=1
12/11/19 15:34:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7520
12/11/19 15:34:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/11/19 15:34:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/11/19 15:34:31 INFO mapred.JobClient:     Launched map tasks=2
12/11/19 15:34:31 INFO mapred.JobClient:     Data-local map tasks=2
12/11/19 15:34:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9406
12/11/19 15:34:31 INFO mapred.JobClient:   File Output Format Counters
12/11/19 15:34:31 INFO mapred.JobClient:     Bytes Written=25
12/11/19 15:34:31 INFO mapred.JobClient:   FileSystemCounters
12/11/19 15:34:31 INFO mapred.JobClient:     FILE_BYTES_READ=55
12/11/19 15:34:31 INFO mapred.JobClient:     HDFS_BYTES_READ=253
12/11/19 15:34:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=71884
12/11/19 15:34:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
12/11/19 15:34:31 INFO mapred.JobClient:   File Input Format Counters
12/11/19 15:34:31 INFO mapred.JobClient:     Bytes Read=25
12/11/19 15:34:31 INFO mapred.JobClient:   Map-Reduce Framework
12/11/19 15:34:31 INFO mapred.JobClient:     Map output materialized bytes=61
12/11/19 15:34:31 INFO mapred.JobClient:     Map input records=2
12/11/19 15:34:31 INFO mapred.JobClient:     Reduce shuffle bytes=61
12/11/19 15:34:31 INFO mapred.JobClient:     Spilled Records=8
12/11/19 15:34:31 INFO mapred.JobClient:     Map output bytes=41
12/11/19 15:34:31 INFO mapred.JobClient:     CPU time spent (ms)=1250
12/11/19 15:34:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=336338944
12/11/19 15:34:31 INFO mapred.JobClient:     Combine input records=4
12/11/19 15:34:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=228
12/11/19 15:34:31 INFO mapred.JobClient:     Reduce input records=4
12/11/19 15:34:31 INFO mapred.JobClient:     Reduce input groups=3
12/11/19 15:34:31 INFO mapred.JobClient:     Combine output records=4
12/11/19 15:34:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=326197248
12/11/19 15:34:31 INFO mapred.JobClient:     Reduce output records=3
12/11/19 15:34:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1128873984
12/11/19 15:34:31 INFO mapred.JobClient:     Map output records=4

3. After the wordcount program finishes, run the command bin/hadoop fs -ls output to view the output. The result is as follows:

hadoopusr@shan-pc:/usr/local/hadoop$ bin/hadoop fs -ls output
Found 3 items
-rw-r--r--   2 hadoopusr supergroup          0 2012-11-19 15:34 /user/hadoopusr/output/_SUCCESS
drwxr-xr-x   - hadoopusr supergroup          0 2012-11-19 15:34 /user/hadoopusr/output/_logs
-rw-r--r--   2 hadoopusr supergroup         25 2012-11-19 15:34 /user/hadoopusr/output/part-r-00000

4. Enter the command bin/hadoop fs -cat /user/hadoopusr/output/part-r-00000 to view the final word counts. The result is as follows:

hadoopusr@shan-pc:/usr/local/hadoop$ bin/hadoop fs -cat /user/hadoopusr/output/part-r-00000
hadoop	1
hello	2
world	1
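As a sanity check, the same counts can be reproduced locally with standard shell tools. This is not how Hadoop computes them; it just confirms the expected result for these two input files:

```shell
# Split the two input lines into words, then count occurrences of each
printf 'hello world\nhello hadoop\n' \
  | tr ' ' '\n' | sort | uniq -c \
  | awk '{print $2, $1}'
# hadoop 1
# hello 2
# world 1
```

The output matches the contents of part-r-00000 above, confirming that the wordcount job ran correctly.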

