Objective
This article describes how to build a Hadoop platform on the Ubuntu Kylin operating system.
Configuration
1. Operating system: Ubuntu Kylin 14.04
2. Programming language support: JDK 1.8
3. Communication protocol support: SSH
4. Cloud computing project: Hadoop 1.2.1
Step One: Install the JDK (skip this step if it is already installed)
1. Download JDK 1.8 from the official website and extract it (installation package used here: jdk-8u25-linux-x64.gz)
2. Copy the extracted directory to /usr/lib/jvm (create the jvm directory yourself if it does not exist)
3. Open the /etc/profile file as an administrator and add the following lines at the bottom of the file:
#set Java Environment
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
4. Execute the following command to make the configuration file effective immediately:
source /etc/profile
5. Verify that the JDK is successfully installed by executing the following command:
java -version
If the JDK version information is displayed, the installation is complete.
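Editing /etc/profile by hand makes it easy to add the same lines twice. The following is a minimal sketch, not part of the original procedure, of a helper that appends the export lines to a profile file only when they are not already there. The function name append_java_env and its arguments are assumptions made for illustration; the JDK path is the one used in this article.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): append the Java
# environment settings to a profile file unless they are already present.
# $1 = profile file to edit, $2 = JDK installation directory
append_java_env() {
    profile_file=$1
    jdk_dir=$2
    marker="#set Java Environment"

    # Skip the edit if the marker line is already in the file.
    if [ -f "$profile_file" ] && grep -qxF -- "$marker" "$profile_file"; then
        return 0
    fi

    # The single-quoted strings are written literally; the variables in them
    # are expanded later, when the profile is sourced.
    {
        printf '%s\n' "$marker"
        printf 'export JAVA_HOME=%s\n' "$jdk_dir"
        printf '%s\n' 'export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"'
        printf '%s\n' 'export PATH="$JAVA_HOME/bin:$PATH"'
    } >> "$profile_file"
}
```

Called from a root shell as append_java_env /etc/profile /usr/lib/jvm/jdk1.8.0_25, it would add the lines once; calling it again leaves the file unchanged.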
Step Two: Configure passwordless SSH login
1. Enter the following command to install SSH:
sudo apt-get install ssh
2. Check whether a hidden .ssh folder exists in your home directory, and create one if it does not.
3. Execute the following commands to configure SSH login without a password (see the SSH documentation for what each line does):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
4. Execute the following command to verify that the SSH installation configuration is successful:
ssh localhost
When asked to confirm the connection, type yes. If you are then logged in without being prompted for a password, SSH is configured correctly.
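If ssh localhost still asks for a password, a common cause is file permissions: with its default StrictModes setting, sshd ignores authorized_keys when the .ssh directory or the key file is writable by other users. The sketch below is an illustration rather than part of the original steps. The function name authorize_key is an assumption; it appends a public key only once and restricts the permissions to the owner.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): add a public key to
# authorized_keys once, and tighten permissions so that sshd accepts the file.
# $1 = public key file, $2 = .ssh directory (defaults to ~/.ssh)
authorize_key() {
    pub_key_file=$1
    ssh_dir=${2:-"$HOME/.ssh"}
    auth_file="$ssh_dir/authorized_keys"

    if [ ! -f "$pub_key_file" ]; then
        echo "no such key file: $pub_key_file" >&2
        return 1
    fi

    mkdir -p "$ssh_dir"
    touch "$auth_file"

    # Append the key only if this exact line is not already authorized.
    if ! grep -qxF -f "$pub_key_file" "$auth_file"; then
        cat "$pub_key_file" >> "$auth_file"
    fi

    # Only the owner may read or write the directory and the key list.
    chmod 700 "$ssh_dir"
    chmod 600 "$auth_file"
}
```

After generating the key pair, authorize_key ~/.ssh/id_dsa.pub would take the place of the cat command in item 3.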
Step Three: Install and run Hadoop
Description: Hadoop has three modes of operation: standalone (single-machine), pseudo-distributed, and fully distributed. The first two are mainly used for testing and debugging programs. This article covers the pseudo-distributed configuration; the fully distributed configuration will be explained later.
1. Download Hadoop and extract it into the current directory (installation package used here: hadoop-1.2.1.tar.gz)
2. Go to the conf subdirectory and modify the following configuration files:
A. hadoop-env.sh
Set the Java path at the end:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25
B. core-site.xml
Configure it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
C. hdfs-site.xml
Configure it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
D. mapred-site.xml
Configure it as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
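The three site files above share the same layout and differ only in one property name and value, so they can also be generated instead of typed. The following is a rough sketch under that assumption. The function names write_site_xml and write_pseudo_distributed_conf are invented for illustration, and the values are the ones given in this article. It overwrites the target files and does not escape XML special characters.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): write a Hadoop site
# file that contains a single property. Overwrites the file if it exists.
# $1 = output file, $2 = property name, $3 = property value
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Generate the three pseudo-distributed files used in this article.
# $1 = Hadoop conf directory
write_pseudo_distributed_conf() {
    write_site_xml "$1/core-site.xml"   fs.default.name    hdfs://localhost:9000
    write_site_xml "$1/hdfs-site.xml"   dfs.replication    1
    write_site_xml "$1/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Run from the Hadoop folder, write_pseudo_distributed_conf conf would produce the same three files as the manual edits.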
3. Go to the Hadoop folder and execute the following command to format the Hadoop file system, HDFS:
bin/hadoop namenode -format
4. Execute the following command to start all Hadoop processes:
bin/start-all.sh
5. Verify that Hadoop is installed successfully
A. Open a browser and go to http://localhost:50030 to view the MapReduce web page.
B. Open a browser and go to http://localhost:50070 to view the HDFS web page.
If both pages display correctly, the Hadoop environment is set up.
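Besides the web pages, the running daemons can be checked from the terminal with jps, the process listing tool that ships with the JDK. In Hadoop 1.x, start-all.sh starts five daemons: NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker. The sketch below is an illustration only, and the function name missing_daemons is an assumption. It reads jps output and reports any daemon that is not running.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): read `jps` output on
# standard input and print every Hadoop 1.x daemon that is not listed.
# Returns 0 when all five daemons are present, 1 otherwise.
missing_daemons() {
    jps_output=$(cat)
    status=0
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # jps prints "<pid> <class name>"; compare the second field exactly so
        # that NameNode is not confused with SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            status=1
        fi
    done
    return $status
}
```

A typical call is jps | missing_daemons. No output means all five daemons are up; any name printed is a daemon whose log under the Hadoop logs directory is worth checking.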
Summary
1. The architecture and mechanisms of a pseudo-distributed setup are the same as those of a truly distributed one; the only difference is that in pseudo-distributed mode the master and the slave are the same machine.
2. Building a truly distributed environment will be covered in a future article, in which a virtual network of virtual machines will be set up to run real distributed programs.