Hadoop Installation and Considerations


One. Hadoop Installation and Considerations
1. To install the Hadoop environment, a Java environment must already be present on the system.
2. SSH must also be installed. On some systems it is installed by default; otherwise install it manually,
for example with yum (yum install -y openssh-server openssh-clients) or with rpm -ivh against the ssh rpm package.
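If start-dfs.sh will be used later to start the HDFS daemons, Hadoop also needs to ssh to localhost without a passphrase. A minimal sketch of setting that up (assuming an RSA key and the default ~/.ssh layout):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa           # generate a key with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     # authorize the key for local logins
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost                                       # should now log in without prompting for a password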

Two. Install and configure the Java environment
Hadoop needs to run in a Java environment and requires the installation of a JDK.
1. Download the JDK on the official web site: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
A. Choose the appropriate RPM package or tar package to install. I downloaded the RPM package here because it is more convenient: with the RPM package the environment variables usually do not need to be configured manually.
# rpm -ivh /usr/java/jdk1.8.0_60.rpm
B. Check that the Java environment was installed successfully with the following commands:
# java -version    # displays the installed version number

# javac            # prints javac usage information

# java             # prints java usage information

If the commands above print the expected output, the installation succeeded.

Three. Download and install Hadoop
1. Go to Hadoop's website to download the corresponding Hadoop version. Address: http://hadoop.apache.org/releases.html
A. Download the appropriate tar package


B. Unpack the tar archive:
# tar -zxvf /usr/local/hadoop/hadoop-2.7.1.tar.gz -C /usr/local/hadoop/
C. Modify the corresponding configuration file to point JAVA_HOME at the JDK:
# vi /usr/local/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest    # set this to the current JDK install directory; the RPM usually installs under /usr/java
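If you are not sure which directory the JDK RPM created, something like the following can help locate it (a sketch; the exact path depends on the JDK version):
# readlink -f $(which javac) | sed 's:/bin/javac::'   # prints the JDK root, e.g. /usr/java/jdk1.8.0_60
# ls /usr/java/                                       # the RPM install usually creates /usr/java/jdk... plus a "latest" symlink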
D. Configure environment variables for Hadoop (adding the Hadoop bin directory to PATH lets you run the Hadoop commands from anywhere):
1. Edit the /etc/profile file and append the following lines:
HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.1
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
2. Make the modified file take effect:
source /etc/profile
After this you can run the relevant Hadoop commands without first changing into the Hadoop installation directory.
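To confirm the PATH change took effect, the version command can be run from any directory; the first line of its output should report the installed release:
$ hadoop version    # the first line should read "Hadoop 2.7.1"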

Four. Execute the relevant commands
1. Run a MapReduce job on a single node:
Enter the installation directory for Hadoop: $ cd /usr/local/hadoop/hadoop-2.7.1/
One: Format the file system: $ bin/hdfs namenode -format
Two: Start the NameNode daemon and the DataNode daemon (this assumes the pseudo-distributed HDFS configuration sketched after step Eight).
$ ./sbin/start-dfs.sh
The Hadoop daemon log files are written to the logs directory under the installation directory.
Three: Browse the web interface for the NameNode:
NameNode - http://localhost:50070/
Four: Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
Five: Copy the input files into the distributed file system:
$ bin/hdfs dfs -put etc/hadoop input
Six: Run one of the provided examples:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
Seven: Examine the output files: copy the output files from the distributed file system to the local file system and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or view the output files directly on the distributed file system:
$ bin/hdfs dfs -cat output/*
Eight: Stop the daemons:
$ sbin/stop-dfs.sh
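Note that steps One and Two assume HDFS has been configured for single-node (pseudo-distributed) operation. A minimal sketch of that configuration, matching the values used in the official single-node setup guide, is:
etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>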

Five. Commands related to Hadoop
All Hadoop commands are invoked through the bin/hadoop script; running the script without any parameters prints a description of all commands.
1. Usage: hadoop [--config confdir] [--loglevel loglevel] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]; the bracketed options are optional.
A. --config confdir: Overrides the default configuration directory. The default is ${HADOOP_HOME}/conf.
B. --loglevel loglevel: Overrides the log level. Valid log levels are FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. The default is INFO.
C. GENERIC_OPTIONS: Common options supported by multiple commands.
D. COMMAND_OPTIONS: The options for the various commands are described in the documentation for the Hadoop Common sub-project; the HDFS and YARN commands are described in their own documents.
2. Generic options (common options)
A. Several of these options can be used together when invoking a Hadoop command (see the example after this list):
1. -archives <comma separated list of archives>: Specifies comma-separated archives to be unarchived on the compute machines. Applies only to jobs.
2. -conf <configuration file>: Specifies an application configuration file.
3. -D <property>=<value>: Sets a value for the given configuration property.
4. -files <comma separated list of files>: Specifies comma-separated files to be copied to the MapReduce cluster. Applies only to jobs.
5. -jt <local> or <resourcemanager:port>: Specifies a ResourceManager. Applies only to jobs.
6. -libjars <comma separated list of jars>: Specifies comma-separated jar files to include in the classpath. Applies only to jobs.
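As an illustration of combining these generic options, the command below (a sketch; /tmp/stopwords.txt is only a hypothetical file) sets a job property with -D and ships an extra file with -files while running the bundled wordcount example:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -D mapreduce.job.reduces=2 -files /tmp/stopwords.txt input output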

Six. Common commands for Hadoop
All Hadoop commands are executed via the hadoop shell script and fall into User Commands and Administration Commands.
1. User Commands: commands useful for users of a Hadoop cluster.
A. archive: Creates a Hadoop archive.
B. checknative: Usage: hadoop checknative [-a] [-h]
-a: Check that all native libraries are available.
-h: Print help information.
C. classpath: Usage: hadoop classpath [--glob | --jar <path> | -h | --help]
--glob: Expand wildcards.
--jar <path>: Write the classpath as a manifest in a jar named path.
-h, --help: Print help information.
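For example (the jar path below is only an illustrative location):
$ hadoop classpath --glob                            # print the classpath with wildcards expanded
$ hadoop classpath --jar /tmp/hadoop-classpath.jar   # write the classpath into a jar manifest at that path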
D. credential: Usage: hadoop credential <subcommand> [options]
1. create alias [-provider provider-path]:
Prompts the user for a credential to be stored as the given alias. The hadoop.security.credential.provider.path within the core-site.xml file is used unless a -provider is indicated.
2. delete alias [-provider provider-path] [-f]:
Deletes the credential with the provided alias. The hadoop.security.credential.provider.path within the core-site.xml file is used unless a -provider is indicated. The command asks for confirmation unless -f is specified.
3. list [-provider provider-path]:
Lists all of the credential aliases. The hadoop.security.credential.provider.path within the core-site.xml file is used unless a -provider is indicated.
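A short illustration using a local JCEKS keystore file as the provider (the alias and keystore path are only examples):
$ hadoop credential create my.password.alias -provider jceks://file/tmp/test.jceks   # prompts for the secret
$ hadoop credential list -provider jceks://file/tmp/test.jceks
$ hadoop credential delete my.password.alias -provider jceks://file/tmp/test.jceks -f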
E. CLASSNAME: Usage: hadoop CLASSNAME
Runs the class named CLASSNAME.
F. version: Usage: hadoop version
Prints the Hadoop version information.
G. trace: Views and modifies Hadoop tracing settings; see the official documentation for details.
H. key: Manages keys.
I. jar: Usage: hadoop jar <jar> [mainClass] args...
Runs a jar file.
Use yarn jar instead to launch YARN applications.
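For example, running the pi estimator bundled with the MapReduce examples jar:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 100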
J. fs: File system shell commands; see the official documentation.
K. distcp: Copies files or directories; see the official documentation.
2. Administration Commands: commands useful for administrators of a Hadoop cluster; use with caution.
A. daemonlog: Gets or sets the log level of a daemon. Usage:
hadoop daemonlog -getlevel <host:httpport> <classname>
hadoop daemonlog -setlevel <host:httpport> <classname> <level>
1. -getlevel host:httpport classname:
Prints the log level of the log identified by a qualified classname in the daemon running at host:httpport. This command internally connects to http://<host:httpport>/logLevel?log=<classname>.
2. -setlevel host:httpport classname level:
Sets the log level of the log identified by a qualified classname in the daemon running at host:httpport. This command internally connects to http://<host:httpport>/logLevel?log=<classname>&level=<level>.
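For example (substitute the host and HTTP port of your own NameNode; the class name below is the standard NameNode logger):
$ hadoop daemonlog -getlevel localhost:50070 org.apache.hadoop.hdfs.server.namenode.NameNode
$ hadoop daemonlog -setlevel localhost:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG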
