Hadoop: single-machine pseudo-distributed installation and configuration


http://blog.csdn.net/pipisorry/article/details/51623195

Because the author's Linux system already has much of the development environment installed, some of the following steps may be skipped.

Previously, Hadoop single-machine pseudo-distributed mode was configured in Docker [Hadoop: hadoop single-machine pseudo-distributed installation and configuration]; since only the root user exists in Docker, there were no permission issues there.

This time the configuration is done directly on Linux, mainly so that Hadoop programs can be debugged with the NetBeans IDE; the user logged in at boot is pika.

This tutorial configures the environment:

Ubuntu 14.04 (Ubuntu 12.04, 32-bit or 64-bit, also works; the author runs Linux directly in a dual-boot setup)

Hadoop 2.6.4 (other Apache Hadoop 2.x releases should work too)

JDK 1.7.0_101 (any JDK 1.6 or later should work)



Basic Environment Configuration
Installing and Configuring the Java environment

Download the corresponding version of the JDK installation package from the Oracle official website onto the host machine.

$ sudo vim /etc/profile
Add ${JAVA_HOME}/bin to PATH and export PATH at the end of the file:

Path= "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"

Export java_home=/opt/jdk1.8.0_73
Export JRE_HOME=${JAVA_HOME}/JRE
Export Classpath=.:${java_home}/lib:${jre_home}/lib

Export PATH= $PATH: ${java_home}/bin

$ sudo update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_73/bin/java 300

$ sudo update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_73/bin/javac 300
$ sudo update-alternatives --install /usr/bin/javah javah /opt/jdk1.8.0_73/bin/javah 300
$ sudo update-alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_73/bin/jar 300
$ . /etc/profile
Test whether the installation was successful:
$ java -version
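If the JDK is set up correctly, the output reports the installed version, roughly like the following (the build strings here are illustrative and will differ on your machine):

java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-bXX)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-bXX, mixed mode)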

[Java Environment Configuration: Installing JDK, Eclipse]

Install SSH and configure passwordless SSH login

SSH is required for both cluster and single-node modes (ssh must be installed and sshd must be running for the Hadoop scripts that manage the remote Hadoop daemons to work). This does not mean you must be able to log in with ssh localhost: the author has verified that Hadoop programs run without an ssh localhost login, as long as sshd is installed and running. Ubuntu installs the SSH client by default, but the SSH server still needs to be installed:
pika:~$ sudo apt-get install -y openssh-server

Edit the sshd configuration file with pika:~$ sudo vim /etc/ssh/sshd_config and, at around line 88, set the UsePAM parameter to "no".
Start the sshd service: pika:~$ sudo /etc/init.d/ssh start

# sshd must be started again after every reboot, so add it to the profile so that it starts automatically

pika:~$ sudo vim /etc/profile

Add a line: /etc/init.d/ssh start


Check the SSH service status: pika:~$ ps -e | grep ssh
29856 ?        00:00:00 sshd

After installation, you can log in to this machine with: ssh localhost
On the first login, SSH shows a confirmation prompt; enter no. (If you enter yes and then the password, you will be logged in to the machine, but a password will be required every time; configuring passwordless SSH login is more convenient. If you did log in, exit that SSH session first.)
Go back to the original terminal window and use ssh-keygen to generate a key and add it to the authorized keys:
exit                                    # quit the ssh localhost session, or use Ctrl+D
cd ~/.ssh/                              # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa                       # press Enter at every prompt; if the key already exists, go straight to the next step (the author's already existed from earlier use)
cat ./id_rsa.pub >> ./authorized_keys   # append id_rsa.pub to the authorized keys

Finally, exit SSH with Ctrl+D.
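You can now verify that passwordless login works (a quick check, assuming the key was appended successfully above):

pika:~$ ssh localhost   # should log in without asking for a password
pika:~$ exit            # or Ctrl+D to leave the session again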

About creating a dedicated hadoop user (in fact, you can simply keep using your current user)

[Hadoop installation: SSH to localhost problem solving]



Installation and configuration of Hadoop

Hadoop 2 can be downloaded from http://mirror.bit.edu.cn/apache/hadoop/common/ or http://mirrors.cnnic.cn/apache/hadoop/common/. In general, download the latest stable release, i.e. the file named hadoop-2.x.y.tar.gz under "stable", which is already compiled; the download containing "src" is the Hadoop source code and must be compiled before use. The author downloaded hadoop-2.6.4.

Installing Hadoop into /usr/local/

Download Hadoop to/usr/local and unzip

pika:~$ sudo wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz -P /usr/local

pika:~$ cd /usr/local/

pika:/usr/local$ sudo tar -zxf hadoop-2.6.4.tar.gz
pika:/usr/local$ sudo ln -s /usr/local/hadoop-2.6.4 /usr/local/hadoop   # create a soft link so the two directories refer to the same contents
pika:/usr/local$ ls /usr/local/hadoop
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

sudo chown -R hadoop ./hadoop   # -R recursively changes the file owner to the hadoop user (only needed if you use a dedicated hadoop user)

Check if Hadoop is available

pika:/usr/local$ hadoop/bin/hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /usr/local/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar




Hadoop single-machine pseudo-distributed configuration

{As mentioned above, the installation and operation below does not actually require an ssh localhost login to run. If you really want to log in via SSH, doing so just before executing start-dfs.sh is sufficient.}

Hadoop can run in pseudo-distributed mode on a single node: the Hadoop daemons run as separate Java processes, the node acts as both NameNode and DataNode, and the data read is in HDFS.
Hadoop's configuration files are located in /usr/local/hadoop/etc/hadoop/. Pseudo-distributed mode requires modifying two of them, core-site.xml and hdfs-site.xml. Hadoop configuration files are in XML format, and each configuration item is declared by giving the property's name and value.

Add the Hadoop command directories to PATH

pika:~$ sudo vim /etc/profile

Path= "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
path= "$PATH: ${java_home}/bin:${spark_home}/bin:${hadoop_home}/bin:${hadoop_home}/sbin"
Export PATH
Export java_home=/opt/jdk1.8.0_73
Export JRE_HOME=${JAVA_HOME}/JRE

Export classpath=.:${java_home}/lib:${jre_home}/

Export classpath= $CLASSPATH:/usr/local/hadoop-2.6.4/etc/hadoop:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/ *:/usr/local/hadoop-2.6.4/share/hadoop/common/*:/usr/local/hadoop-2.6.4/share/hadoop/hdfs:/usr/local/ Hadoop-2.6.4/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.6.4/share/hadoop/hdfs/*:/usr/local/hadoop-2.6.4/share /hadoop/yarn/lib/*:/usr/local/hadoop-2.6.4/share/hadoop/yarn/*:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/ lib/*:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/*:/usr/local/hadoop-2.6.4/contrib/capacity-scheduler/*. Jar

Export hadoop_home=/usr/local/hadoop-2.6.4
Export Spark_home=/opt/spark

Export Pyspark_python=python3

Note: HADOOP_HOME must be set to the actual directory; the soft link cannot be used here. The author had configured Spark earlier, so Spark-related entries also appear in the file; ignore them. The essential part is the Hadoop-related lines added above.
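As a quick sanity check that the soft link points where expected (readlink is part of coreutils on Ubuntu, so this sketch assumes nothing beyond the steps above):

pika:~$ readlink -f /usr/local/hadoop   # should print /usr/local/hadoop-2.6.4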

pika:~$ . /etc/profile   # not needed again after the next reboot

pika:~$ echo $PATH

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/jdk1.8.0_73/bin:/opt/spark/bin:/usr/local/hadoop-2.6.4/bin:/usr/local/hadoop-2.6.4/sbin

Modify the configuration files core-site.xml and hdfs-site.xml

pika:~$ cd /usr/local/hadoop

pika:/usr/local/hadoop$ sudo vim etc/hadoop/core-site.xml

Change the content of <configuration></configuration> to:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Note: with the author's vim settings, pasted text is auto-indented, so you may want to first strip the leading whitespace in gedit using the regex ^\s+.

Also modify the configuration file hdfs-site.xml:

pika:/usr/local/hadoop$ sudo vim etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

Hadoop configuration file Description

How Hadoop runs is determined by its configuration files (they are read whenever Hadoop runs), so to switch back from pseudo-distributed mode to non-distributed mode, remove the added configuration items from core-site.xml.
In addition, pseudo-distributed mode only needs fs.defaultFS and dfs.replication to run, but if the hadoop.tmp.dir parameter is not configured, the default temporary directory /tmp/hadoop-hadoop is used. That directory may be cleaned up by the system on reboot, so the NameNode format would have to be run again.

Also specify dfs.namenode.name.dir and dfs.datanode.data.dir, otherwise errors may occur in the following steps.

After the configuration is complete, perform NameNode formatting

Configuring JAVA_HOME in Hadoop

pika:~$ sudo vim /usr/local/hadoop-2.6.4/libexec/hadoop-config.sh

Add export JAVA_HOME=/opt/jdk1.8.0_73 at approximately line 161.

$ cd /usr/local/hadoop

pika:/usr/local/hadoop$ sudo bin/hdfs namenode -format

If it succeeds, you will see "successfully formatted" and "Exitting with status 0" in the output; "Exitting with status 1" indicates an error.
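A rough way to confirm the format step took effect is to look at the name directory configured in hdfs-site.xml above (a sketch; the path follows the configuration used in this tutorial):

pika:/usr/local/hadoop$ ls tmp/dfs/name/current   # should contain VERSION and fsimage files after formatting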

Turn on the NameNode and DataNode daemons

Before starting, give the current user ownership of the Hadoop installation directory; otherwise start-dfs.sh may fail with an error such as: starting namenodes on [localhost] localhost: mkdir: cannot create directory '/usr/soft/hadoop-2.6.3/logs': Permission denied

pika:~$ sudo chown -R pika:pika /usr/local/hadoop-2.6.4
pika:~$ sudo chmod -R u+w /usr/local/hadoop-2.6.4

Note: the author changes the owner and group of the Hadoop directory to the current user pika and group pika, and recursively grants the current user write permission on the directory. You could also give all users write permission directly: $ sudo chmod -R a+w /usr/local/hadoop-2.6.4/, but the author considers this a potential security risk.
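To double-check the ownership change before starting the daemons, a simple listing is enough:

pika:~$ ls -ld /usr/local/hadoop-2.6.4   # owner and group should now both be pika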

pika:~$ start-dfs.sh
Note: if an SSH prompt appears, enter yes

Note: the output above does not necessarily mean Hadoop is running correctly; verify it with jps as described below.

If running start-dfs.sh fails, running it again may fail as well, usually because the related processes were not stopped first. Stop them with stop-dfs.sh before retrying, as sketched below.
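In that case, a restart sequence like the following is usually enough (a sketch; both scripts are on PATH thanks to the /etc/profile changes above):

pika:~$ stop-dfs.sh    # stop any daemons left over from the failed start
pika:~$ start-dfs.sh   # then start again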


View the started Hadoop processes with jps

After startup completes, use the jps command to check whether it succeeded. If successful, the following processes should be listed: NameNode, DataNode, and SecondaryNameNode. (If SecondaryNameNode did not start, run sbin/stop-dfs.sh to close the processes, then try starting again.) If NameNode or DataNode is missing, the configuration failed: double-check the previous steps, or examine the startup log to troubleshoot.

pika:~$ jps
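If startup succeeded, the output looks roughly like the following (the process IDs are illustrative):

12816 NameNode
12945 DataNode
13122 SecondaryNameNode
13390 Jps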


Access through the Web interface

If Hadoop is configured on Linux or in a virtual machine, after a successful startup you can open the web interface at http://localhost:50070 to view NameNode and DataNode information and browse the files in HDFS online.
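If a browser is inconvenient, a quick command-line check that the NameNode web UI is up can look like this (a sketch; it assumes the default /jmx endpoint of the Hadoop 2.x NameNode web server):

pika:~$ curl -s http://localhost:50070/jmx | head   # should print JSON with NameNode metrics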

Possible errors and solutions for starting Hadoop

Workarounds when Hadoop does not start properly
You can generally troubleshoot by viewing the startup log. Note the following points:
Startup prints a line like "dblab-xmu: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-dblab-xmu.out", where dblab-xmu corresponds to your machine name, but the startup log is actually recorded in /usr/local/hadoop/logs/hadoop-hadoop-namenode-dblab-xmu.log, so look at the file with the .log suffix.
Each startup appends to the log file, so scroll to the end and compare the timestamps to find the latest entries.
The error hint is usually near the end, typically at a FATAL, ERROR, WARN, or Java Exception entry.
You can search the Internet for the error message to see whether a relevant solution can be found.
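For example, to look at the tail of the most recent NameNode log (the exact file name depends on your user name and host name):

pika:~$ tail -n 50 /usr/local/hadoop/logs/hadoop-*-namenode-*.log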

For example:

FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop-2.6.4/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.

This error usually means hdfs-site.xml was not configured as shown above (sudo vim etc/hadoop/hdfs-site.xml), or the current user does not have write permission on the directory.

Error 1:

The following WARN may appear at startup: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. This WARN can be ignored and does not affect normal use (it can be eliminated by compiling Hadoop from source).

Error 2:

Error: starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.

even though JAVA_HOME has clearly been set, and $ echo $JAVA_HOME
successfully outputs /opt/jdk1.8.0_91.

Solution: pika:~$ sudo vim /usr/local/hadoop-2.6.4/libexec/hadoop-config.sh and add JAVA_HOME there, as described above.

Error 3:

The prompt "could not resolve hostname" appears when starting Hadoop.
If starting Hadoop produces many lines of "ssh: could not resolve hostname xxx",
this is not an SSH problem; it can be resolved by setting Hadoop environment variables. First interrupt with Ctrl+C, then add the following two lines to ~/.bashrc (the procedure is the same as for the JAVA_HOME variable, where HADOOP_HOME is the Hadoop installation directory):
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
Run source ~/.bashrc to apply the settings, then run ./sbin/start-dfs.sh again to start Hadoop.

Error 4: DataNode not started

In general, if the DataNode does not start, you can try the following (note that this deletes all data in HDFS; if the existing data is important, do not do this):
./sbin/stop-dfs.sh            # stop Hadoop
rm -r ./tmp                   # delete the tmp directory; note this removes all data in HDFS
./bin/hdfs namenode -format   # reformat the NameNode
./sbin/start-dfs.sh           # restart




Running an example in Hadoop single-machine pseudo-distributed mode

In non-distributed (single-machine) mode the grep example reads local data, whereas in pseudo-distributed mode it reads data from HDFS. Here we also write our own Java code and reference the Hadoop packages in it.

If you are using a user you created yourself and it does not yet have a user directory in HDFS, create one first:
./bin/hdfs dfs -mkdir -p /user/hadoop
Then copy the files into the distributed file system.

We are using a hadoop user and have created the corresponding user directory /user/hadoop, so a relative path such as input can be used in commands; its corresponding absolute path is /user/hadoop/input, as illustrated below.
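For example, with that user directory in place, the following two commands list the same location (a sketch, assuming the commands run as the hadoop user and the directory exists):

hdfs dfs -ls input                  # relative path, resolved against /user/hadoop
hdfs dfs -ls /user/hadoop/input     # equivalent absolute path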
Add Hadoop classpath

This step has already been taken care of in the /etc/profile environment configuration above.

pika:/media/pika/files/mine/java_workspace/bdms/src/hw2$ hadoop classpath

/usr/local/hadoop-2.6.4/etc/hadoop:/usr/local/hadoop-2.6.4/share/hadoop/common/lib/*:/usr/local/hadoop-2.6.4/share/hadoop/common/*:/usr/local/hadoop-2.6.4/share/hadoop/hdfs:/usr/local/hadoop-2.6.4/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.6.4/share/hadoop/hdfs/*:/usr/local/hadoop-2.6.4/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.6.4/share/hadoop/yarn/*:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.6.4/share/hadoop/mapreduce/*:/usr/local/hadoop-2.6.4/contrib/capacity-scheduler/*.jar

Note: add the output of hadoop classpath to CLASSPATH. Hadoop provides this convenient utility to obtain the classpath information; running "hadoop classpath" should give you what you need to set your CLASSPATH for compiling your code. Otherwise, importing org.apache.hadoop.* in Java code fails with javac errors such as: package org.apache.hadoop.conf does not exist ... [package org.apache.hadoop.fs does not exist].
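Alternatively, if CLASSPATH was not exported in /etc/profile, the output of hadoop classpath can be passed to the compiler directly, for example (a sketch; avgtime.java is the example program used below):

javac -cp "$(hadoop classpath)" avgtime.java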

# Start Hadoop and copy the input files into HDFS

pika:~$ hdfs dfs -ls /
pika:~$ hdfs dfs -mkdir -p /pika/input
pika:~$ hdfs dfs -ls /

drwxr-xr-x   - pika supergroup          0 2016-06-10 19:22 /pika

Copy the input files into the Hadoop HDFS file system for use (.java files do not need to be copied):

pika:~$ hdfs dfs -put /media/pika/files/mine/java_workspace/bdms/src/hw2/*input* /pika/input

pika:~$ hdfs dfs -ls /pika/input

-rw-r--r--   1 pika supergroup        165 2016-06-10 19:28 /pika/input/example-input.txt
drwxr-xr-x   - pika supergroup          0 2016-06-10 19:28 /pika/input/part1-input

Note: the files in the /media/pika/files/mine/java_workspace/bdms/src/hw2/ directory can be downloaded here: [avg-time Hadoop program]

# Compile and run the Java Hadoop program

pika:~$ cd /media/pika/files/mine/java_workspace/bdms/src/hw2/
pika:/media/pika/files/mine/java_workspace/bdms/src/hw2$ rm -f *.class *.jar   # remove any existing compiled Java files

pika:/media/pika/files/mine/java_workspace/bdms/src/hw2$ javac avgtime.java
pika:/media/pika/files/mine/java_workspace/bdms/src/hw2$ jar cfm avgtime.jar avgtime-manifest.txt avgtime*.class
pika:/media/pika/files/mine/java_workspace/bdms/src/hw2$ hdfs dfs -rm -f -r /pika/output   # remove any existing output directory

pika:/media/pika/files/mine/java_workspace/bdms/src/hw2$ hadoop jar ./avgtime.jar /pika/input/example-input.txt /pika/output   # run the Hadoop program
pika:/media/pika/files/mine/java_workspace/bdms/src/hw2$ hdfs dfs -cat '/pika/output/part-*'   # view the output
1.2.3.4 18811001100 2 28.500
Alpha 1.2.3.4 2 20.200
Beta Alpha 2 4.100

Note: the output directory must not exist while Hadoop is running the program, otherwise it reports the error "org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/hadoop/output already exists". To run again, first delete the output directory: ./bin/hdfs dfs -rm -r output

When running a Hadoop program, the output directory specified by the program (such as output) must not already exist; this prevents results from being overwritten. Since an error is raised otherwise, the output directory has to be deleted before each run. When developing a real application, consider adding code like the following to the program so the output directory is deleted automatically on each run, avoiding the tedious command-line step:
Configuration conf = new Configuration();
Job job = new Job(conf);
/* delete the output directory */
Path outputPath = new Path(args[1]);
outputPath.getFileSystem(conf).delete(outputPath, true);

From: http://blog.csdn.net/pipisorry/article/details/51623195

