Prerequisite: you need a working Linux environment (see the earlier article on Linux environment preparation).
One, install Hadoop
1, upload Hadoop
The Hadoop version I use is hadoop-2.4.1.tar.gz. Upload it to the user's home directory, create an app directory under the home directory for easier management, and extract Hadoop into it.
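Assuming the tarball was uploaded to the home directory of the regular user (the paths mirror the ones used later in this article), the upload and extraction steps might look like this:

```shell
# run on the Linux server as the regular user
mkdir -p ~/app                              # directory to hold installed software
tar -zxvf ~/hadoop-2.4.1.tar.gz -C ~/app    # extract Hadoop into ~/app
ls ~/app                                    # should now show hadoop-2.4.1
```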
2, Hadoop directory layout
Enter the app directory and you will see hadoop-2.4.1; inside it:
bin: executable commands
sbin: system/daemon scripts (start and stop scripts)
etc: configuration files
lib: native libraries for the local platform (Linux in this case)
share: core jar packages and documentation
Two, modify the configuration files
The configuration files are in the etc/hadoop directory under the Hadoop folder.
1, set the JDK in hadoop-env.sh
Normally this file does not need to be changed, but if your JDK is installed per-user rather than globally, Hadoop will probably fail to find it; in that case you need to hard-code the JDK directory in this file.
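For example, with the JDK path used later in this article (adjust it to your own installation), the relevant line in etc/hadoop/hadoop-env.sh would be changed from the ${JAVA_HOME} reference to an explicit path:

```shell
# etc/hadoop/hadoop-env.sh
# replace the default "export JAVA_HOME=${JAVA_HOME}" with an explicit path
export JAVA_HOME=/usr/java/jdk1.7.0_79
```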
2, core-site.xml
This file specifies the default file system and the file storage root directory. The configuration element is empty by default; add the following:
<configuration>
  <property>
    <!-- the IP address can also be replaced with a host name -->
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.49.31:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/fangxin/app/hadoop-2.4.1/tmp/</value>
  </property>
</configuration>
fs.defaultFS specifies the default file system. hdfs://192.168.49.31:9000 is the HDFS file system, running on the .31 server and listening on port 9000.
hadoop.tmp.dir specifies the file storage root directory, under which Hadoop creates the dfs directory; the NameNode creates its namenode folder and the DataNode its datanode folder there.
If this parameter is not configured, Hadoop uses the system tmp directory as its root, which is emptied when the machine restarts.
3, hdfs-site.xml
Configure the number of replicas:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
There is currently only one node, so the replication factor is set to 1.
4, mapred-site.xml
The directory contains mapred-site.xml.template; rename it to mapred-site.xml:
mv mapred-site.xml.template mapred-site.xml
Set the resource scheduling framework for MapReduce; I use YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5, yarn-site.xml
The YARN framework is also a cluster, with a master node (called the ResourceManager) and slave nodes (the NodeManagers).
Configure:
the host name of the master node
the intermediate data shuffle mechanism, here mapreduce_shuffle
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <!-- host name or IP of the ResourceManager -->
    <name>yarn.resourcemanager.hostname</name>
    <value></value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
6, configure the slave machines in slaves (a cluster configuration item)
The hosts that will run as slaves are listed in the slaves file. When you start the master, the slaves configured here are started as well.
If the SecondaryNameNode and the NameNode are to run on separate machines, you also need the masters configuration file; otherwise it is not needed.
hadoop2
hadoop3
Other configuration
You can map host names to the servers (in /etc/hosts) so that you can avoid typing IP addresses.
Three, configure Hadoop environment variables
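As an illustration of the host-name mapping (the master IP is the one used throughout this article; the slave IPs and host names are assumptions matching the slaves file above), /etc/hosts might contain:

```shell
# /etc/hosts
192.168.49.31  hadoop1   # master (NameNode / ResourceManager) - assumed host name
192.168.49.32  hadoop2   # slave - assumed IP
192.168.49.33  hadoop3   # slave - assumed IP
```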
If you want to use the hadoop command globally, you first need to configure it.
Command
vi /etc/profile
//content to add
JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/home/fangxin/app/hadoop-2.4.1
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
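The new variables only take effect in a new shell; to apply them to the current session and verify the setup, something like this should work:

```shell
source /etc/profile        # reload the profile in the current shell
echo $HADOOP_HOME          # should print /home/fangxin/app/hadoop-2.4.1
hadoop version             # should print the Hadoop version banner
```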
Format the NameNode
Command
hadoop namenode -format
//the formatting process asks for confirmation once; note that you must answer with an uppercase Y
If it succeeds, you can see a success message in the output.
If there is an error, go back to step two and check the configuration again.
After success, the file storage directory tmp that we configured earlier in core-site.xml is created, with a dfs subdirectory inside it.
Start HDFS
In the sbin directory, execute the command
start-dfs.sh
This starts the NameNode, DataNode, and SecondaryNameNode in turn; you will be asked to confirm host identities several times, so follow the prompts.
After startup completes:
The jps command, found in the JDK bin directory, lists Java processes:
jps
You can also run
netstat -nltp
to see which ports each process is listening on.
For example, the NameNode with process ID 25898 listens on port 9000. Some processes listen on more than one port, because different kinds of traffic use different ports.
Start YARN
start-yarn.sh
After startup, the ResourceManager and NodeManager processes can be seen.
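Once both start scripts have run, a jps check on the single node should list all five daemons (the process IDs will differ on your machine):

```shell
jps
# on a single-node setup you would expect to see, in some order:
#   NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
```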
Common Hadoop command line
View a directory listing
hadoop fs -ls hdfs://192.168.49.31:9000/   or   hadoop fs -ls /
Note: if you get the following error
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/fangxin/app/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
it is probably because the native libraries bundled with Hadoop were compiled for 32-bit. As an interim measure, add the following two lines at the end of hadoop-env.sh:
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib"
There may still be a warning:
17/02/08 09:03:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Put a file into HDFS
hadoop fs -put <filename> hdfs://192.168.49.31:9000/
Files stored in Hadoop end up deep inside the tmp storage directory, in a data/current/finalized folder; files larger than 128 MB are split into multiple blocks.
Get a file
hadoop fs -get /<filename>
where / is the root directory of HDFS; if the file is inside nested directories, give the full path.
Create a directory
hadoop fs -mkdir /<directory>
View a file
hadoop fs -cat /<directory>/<filename>
MapReduce test run: count word occurrences
1, prepare a file; it can be a plain text file whose content is some words, such as hello, good, and so on
2, upload the prepared file to HDFS
3, go into the mapreduce folder under share
4, execute the wordcount program in the jar package hadoop-mapreduce-examples
hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /<input directory> /<output directory>
If you follow these steps, the words in your text will be counted.
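Putting the four steps together, an end-to-end run might look like this (the file and directory names are examples, not from the original article):

```shell
# 1. prepare a small text file with some words
echo "hello good hello world" > words.txt

# 2. upload it to HDFS under an input directory
hadoop fs -mkdir /wcinput
hadoop fs -put words.txt /wcinput

# 3. the example jar lives under share/hadoop/mapreduce
cd ~/app/hadoop-2.4.1/share/hadoop/mapreduce

# 4. run wordcount; the output directory must not exist yet
hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /wcinput /wcoutput

# inspect the result
hadoop fs -cat /wcoutput/part-r-00000
```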
Calculate pi
hadoop jar hadoop-mapreduce-examples-2.4.1.jar pi 10 10
Four, SSH remote login
Because Hadoop performs remote operations over SSH under the hood, if we do not configure SSH keys we have to type a password for every remote operation, which is cumbersome, even for local operations on a single-node Hadoop.
What is SSH: Secure Shell, a protocol for logging in to another Linux host from a Linux host.
See the key authentication and authorization mechanism of SSH remote login.
Linux SSH configuration steps:
1, generate a key pair locally
2, copy the public key to the server with scp
3, on the server side, create a file in the .ssh directory
$ touch authorized_keys
4, append the public key information to the file above
Append operation
$ cat ~/id_rsa.pub >> authorized_keys
5, the authorized_keys file must be readable and writable only by its owner; otherwise it will not take effect
$ chmod 600 authorized_keys
6, log in remotely from the client; if no password is asked for, the configuration succeeded
$ ssh <server name>
Following the configuration above, you can do the same for the local host: although pseudo-distributed mode has only a single node, with local SSH keys configured, starting and stopping Hadoop no longer requires a password.
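The six steps can be condensed into a few commands (the user and server names are placeholders; on many systems ssh-copy-id replaces steps 2 through 5 in one go):

```shell
# 1. generate an RSA key pair locally (press Enter for the defaults)
ssh-keygen -t rsa

# 2-5. install the public key on the server in one step
ssh-copy-id user@server

# 6. verify: this should now log in without asking for a password
ssh user@server

# for single-node pseudo-distributed mode, do the same for the local host
ssh-copy-id localhost
```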