Hadoop has been built several times, complete summary (tired baby, built more than 10 times)

Source: Internet
Author: User
Tags: builtin, shuffle, FileZilla, OpenSSH server, hadoop fs

1. Installing the JDK
1.1 Upload
Using FileZilla, copy the JDK package from Windows to the root directory on Linux.

1.2 Unpack the JDK
# Create the directory
mkdir /usr/java    (do not put it directly under the "/" system disk)
# Unpack
tar -zxvf jdk-7u55-linux-i586.tar.gz -C /usr/java/    (create the java folder under /usr/ in advance)

1.3 Add Java to the environment variables
vim /etc/profile
# Add at the end of the file
export JAVA_HOME=/usr/java/jdk1.7.0_55
export PATH=$PATH:$JAVA_HOME/bin

# Reload the configuration
source /etc/profile
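A quick, optional check that the variables took effect (this assumes the jdk1.7.0_55 path used above):
echo $JAVA_HOME
# should print /usr/java/jdk1.7.0_55
java -version
# should print the 1.7.0_55 version banner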

2. Install Hadoop
2.1 Upload the Hadoop installation package
Using FileZilla, copy the Hadoop package from Windows to the root directory on Linux.

2.2 Extract the Hadoop installation package
Under the root directory:
mkdir /cloud
# Extract to the /cloud/ directory
tar -zxvf hadoop-2.2.0.tar.gz -C /cloud/

2.3 Modify the configuration files (5 of them) in the directory /cloud/hadoop-2.2.0/etc/hadoop
The first file: hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6    (point this at the JDK you actually installed, e.g. /usr/java/jdk1.7.0_55 from step 1)

The second file: core-site.xml
<configuration>
    <!-- Specify the address of the HDFS master (NameNode) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://northbigpenguin:9000</value>
    </property>
    <!-- Specify the storage path for files generated at Hadoop runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/cloud/hadoop-2.2.0/tmp</value>
    </property>
</configuration>

The third file: hdfs-site.xml
<configuration>
    <!-- Set the number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value><!-- default is 3, changed to 1 here -->
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hdfs/data</value>
    </property>
</configuration>


The fourth file: mapred-site.xml.template needs to be renamed first: mv mapred-site.xml.template mapred-site.xml
<configuration>
    <!-- Tell the MapReduce (MR) framework to use YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- this must be loaded -->
    <property>
        <name>mapred.job.tracker</name>
        <value>northbigpenguin:9001</value>
    </property>
</configuration>

The fifth file: yarn-site.xml

<configuration>
    <!-- The way the reducer fetches data is mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>northbigpenguin</value>
    </property>
</configuration>
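Before moving on, it can help to sanity-check the five files. This is only a rough grep over the directory used above; each command should print the value just configured:
cd /cloud/hadoop-2.2.0/etc/hadoop
grep -A1 "fs.defaultFS" core-site.xml
grep -A1 "dfs.replication" hdfs-site.xml
grep -A1 "mapreduce.framework.name" mapred-site.xml
grep -A1 "yarn.resourcemanager.hostname" yarn-site.xml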

3. Add Hadoop to the environment variables
vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.6
export HADOOP_HOME=/root/cloud/hadoop-2.2.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
Reload after saving so the change takes effect:
source /etc/profile
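A quick check that the new PATH took effect (this assumes the HADOOP_HOME value above):
which hadoop
# should resolve to a path under $HADOOP_HOME/bin
hadoop version
# should report Hadoop 2.2.0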

4. Format HDFS (the NameNode has to be formatted the first time it is used)
# Deprecated but still usable: hadoop namenode -format
In the directory /cloud/hadoop-2.2.0, run:
hdfs namenode -format    (use this one; note: run it only once in the whole setup, otherwise there will be many errors)
Attention:
(1) If you get the error:
-bash: hadoop: command not found
Explanation:
The Hadoop path is configured incorrectly. Check export HADOOP_HOME=/root/cloud/hadoop-2.2.0 in vim /etc/profile (use an absolute path).
If you do not know the absolute path:
Go into the hadoop-2.2.0/bin directory and run pwd.
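If the format succeeds, the NameNode metadata is written under hadoop.tmp.dir from core-site.xml. Assuming the stock dfs/name layout that Hadoop 2.x uses when dfs.namenode.name.dir is not set explicitly, a quick check looks like this:
ls /cloud/hadoop-2.2.0/tmp/dfs/name/current
# should list a VERSION file and an fsimage; if the directory is missing, the format never ran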

5. Start Hadoop
5.1 Initialize HDFS (format the file system)
(1) Find the commands:
which hadoop
which hdfs
(2) Go to the bin directory where hdfs lives:
cd /root/download/hadoop/hadoop-2.2.0/bin
(3) Back in the directory where hadoop-2.2.0 is located, run:
hdfs namenode -format    (hadoop namenode -format is obsolete but still usable)

5.2 Start Hadoop
(1) Go to the directory /root/cloud/hadoop-2.2.0/sbin

(2) Start HDFS first (you can use the sbin/start-all.sh script, but you will need to enter the password multiple times)

(3) The second way to start:
Start HDFS first:
cd ../sbin
./start-dfs.sh

Then start YARN:
cd ../sbin
./start-yarn.sh
If you hit "resource unreachable" or stop errors, just restart the daemons.

(DataNode block data ends up under /data/hdfs/data/current, per dfs.datanode.data.dir above.)

(2) Error: Cannot find configuration directory:
Open vim /etc/profile
and configure all of the paths:
export JAVA_HOME=/usr/java/jdk1.6
export HADOOP_HOME=/root/cloud/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Reload the configuration:
source /etc/profile
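After reloading, a simple confirmation that the configuration directory is now visible (paths as configured above):
echo $HADOOP_CONF_DIR
# should print /root/cloud/hadoop-2.2.0/etc/hadoop
ls $HADOOP_CONF_DIR/core-site.xml
# the file should exist; if not, HADOOP_HOME is wrong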
6. Then stop the services:
./stop-all.sh
Restart the services:
./start-all.sh

Finally you should see:
starting yarn daemons
starting resourcemanager, logging to /root/cloud/hadoop-2.2.0/logs/yarn-root-resourcemanager-northbigpenguin.out
localhost: starting nodemanager, logging to /root/cloud/hadoop-2.2.0/logs/yarn-root-nodemanager-northbigpenguin.out
This indicates that the configuration succeeded.

7. Verify that the startup succeeded
(1) Process verification:
Use the jps command.
In the directory /root/cloud/hadoop-2.2.0/sbin, run:
jps
If the following six entries appear, the configuration succeeded:
27408 NameNode
28218 Jps
27643 SecondaryNameNode
28066 NodeManager
27803 ResourceManager
27512 DataNode

Running jps on the master node, you should see 3 processes: NameNode, SecondaryNameNode, JobTracker.
Running jps on a slave node, you should see 2 processes: DataNode, TaskTracker.
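Besides jps, a quick way to confirm HDFS itself is healthy is to ask the NameNode for a cluster report; the exact wording of the output varies slightly between versions:
hdfs dfsadmin -report
# should show nonzero configured capacity and one live DataNode for this single-node setup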
(2) Web test:
Add a mapping of the Linux host name and IP to this file (on the local Windows system):
C:\Windows\System32\drivers\etc\hosts
192.168.1.110 (IP address of the Linux machine)   northbigpenguin (Linux host name)
Access: http://northbigpenguin:50070 (HDFS web page)
http://northbigpenguin:8088 (MR management interface, the YARN ResourceManager web UI)
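If the pages do not load, first check from the Windows side that the host name actually resolves through the mapping above (run in a Windows command prompt):
ping northbigpenguin
# should resolve to 192.168.1.110; if not, recheck C:\Windows\System32\drivers\etc\hosts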

If the web test still does not show the pages, you need to shut down the firewall.
Shutting down the firewall:
# Check the firewall status
service iptables status
# Turn off the firewall
service iptables stop
# Check whether the firewall is started on boot
chkconfig iptables --list
# Disable starting the firewall on boot
chkconfig iptables off

Attention:
Page access:
Live Nodes --> Browse the filesystem needs the local hosts mapping configured, or it will not be accessible:
http://northbigpenguin:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/&nnaddr=localhostip:9000
Clicking it jumps to this interface.

Page data test:
Upload a file to Hadoop (the file is temporary; if the server shuts down, the file disappears):
hadoop fs -put /root/download/jdk-6u45-linux-x64.bin hdfs://northbigpenguin:9000/jdk
Then the web interface shows it at:
http://northbigpenguin:50075/browseDirectory.jsp
Download the uploaded file from the command line (for -get, the HDFS path comes first, then the local destination):
hadoop fs -get hdfs://northbigpenguin:9000/jdk /root/download/jdk-6u45-linux-x64.bin
Running an example
(1) First create two input files, file01 and file02, on the local disk:
$ echo "Hello World Bye" > file01
$ echo "Hello Hadoop Goodbye Hadoop" > file02

(2) Create an input directory in HDFS: $ hadoop fs -mkdir input
(3) Copy file01 and file02 into HDFS:
$ hadoop fs -copyFromLocal /home/zhongping/file0* input
(4) Run WordCount:
$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output
(5) When it finishes, review the results:
$ hadoop fs -cat output/part-r-00000
Relations and connections among NameNode, SecondaryNameNode, NodeManager, ResourceManager, and DataNode (the processes listed by jps):


YARN is responsible for resource scheduling.
ResourceManager (responsible for management) is the upper layer of YARN.
NodeManager (responsible for the work; there can be one, or several in a cluster) is the lower layer of YARN.
SecondaryNameNode is the Hadoop 1.0 approach to HA (high availability).
SecondaryNameNode is not a hot standby for the NameNode (if the NameNode dies, the SecondaryNameNode does not take over).
SecondaryNameNode is an assistant to the NameNode (it handles metadata synchronization, but not in real time).

HDFS
NameNode is the upper layer of HDFS.
DataNode (responsible for storing data) is the lower layer of HDFS.


Error:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The native library is not compatible with the system version.

Cluster environment for deploying Hadoop: operating system CentOS 5.8; Hadoop version Cloudera hadoop-0.20.2-cdh3u3.
After the cluster is set up to support gzip/lzo compression, Hadoop's native library is used when compressed files are read or input files are compressed. The default locations of the native library are:
$HADOOP_HOME/lib/native/Linux-amd64-64 (64-bit operating system)
$HADOOP_HOME/lib/native/Linux-i386-32 (32-bit operating system)
The libhadoop.so file in that folder is Hadoop's native library.
If the native library does not exist, or is inconsistent with the current operating system version, the following error is reported:
11/09/20 17:29:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
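A quick way to see whether this warning comes from an architecture mismatch (a common cause) is to compare the operating system architecture with the bundled library; the paths below just follow the default locations listed above:
uname -m
# x86_64 means a 64-bit OS, i686/i386 means 32-bit
file $HADOOP_HOME/lib/native/Linux-amd64-64/libhadoop.so
# shows whether the bundled .so is a 32-bit or 64-bit ELF object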
Add debug information settings
(Enable: export HADOOP_ROOT_LOGGER=DEBUG,console
Disable: export HADOOP_ROOT_LOGGER=INFO,console)
$ export HADOOP_ROOT_LOGGER=DEBUG,console
$ hadoop fs -text /test/data/origz/access.log.gz
2012-04-24 15:55:43,269 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The error says that libhadoop.so in /usr/local/hadoop/lib/native/Linux-amd64-64 requires /lib64/libc.so.6 (libc 2.6).
This indicates that the glibc version in the system is inconsistent with the version required by libhadoop.so.
Check the system's libc version:
# ll /lib64/libc.so.6
lrwxrwxrwx 1 root root 11 Apr 16:49 /lib64/libc.so.6 -> libc-2.5.so
The version in the system is 2.5, so upgrade the system glibc to 2.9.
Download glibc: wget http://ftp.gnu.org/gnu/glibc/glibc-2.9.tar.bz2
Download glibc-linuxthreads: wget http://ftp.gnu.org/gnu/glibc/glibc-linuxthreads-2.5.tar.bz2
Unpack:
$ tar -jxvf glibc-2.9.tar.bz2
$ cd glibc-2.9
$ tar -jxvf ../glibc-linuxthreads-2.5.tar.bz2
$ cd ..
$ export CFLAGS="-g -O2"
$ ./glibc-2.9/configure --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
$ make
# make install
There are three points to note during the build and install:
1. Extract glibc-linuxthreads into the glibc directory.
2. Do not run configure inside the glibc source directory itself.
3. Add the optimization switch export CFLAGS="-g -O2", otherwise there will be errors.
After installation, check ls -l /lib64/libc.so.6 to confirm the upgrade:
lrwxrwxrwx 1 root root 11 Apr 16:49 /lib64/libc.so.6 -> libc-2.9.so
Test whether the native library now loads:
$ export HADOOP_ROOT_LOGGER=DEBUG,console
$ hadoop fs -text /test/data/origz/access.log.gz
12/04/25 08:54:47 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]
12/04/25 08:54:47 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
12/04/25 08:54:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/04/25 08:54:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/04/25 08:54:47 DEBUG fs.FSInputChecker: DFSClient readChunk got seqno 0 offsetInBlock 0 lastPacketInBlock false packetLen 132100
You can see that after the glibc upgrade the error is gone and the native library loads successfully.

Function Exercise One:
Download file command:
hadoop fs -get hdfs://northbigpenguin:9000/jdk1.6 /root/jdk1.6
Upload file command:
hadoop fs -put jdk-6u45-linux-x64.tar hdfs://northbigpenguin:9000/jdk1.6

The file to upload needs to be packaged as an archive first:
(1) Download a JDK. Whatever the compression format, extract it locally on Windows and repackage it as a .zip file.
(2) Upload the file to the Linux system.
(3) Extract it on the system:
unzip jdk-6u45-linux-x64.zip
(4) This generates the folder jdk-6u45-linux-x64.
(5) Pack the folder into a tar archive:
tar -cvf jdk-6u45-linux-x64.tar jdk-6u45-linux-x64
(6) Upload it to Hadoop on Linux.
(7) Download the file.
(8) Extract it on Linux:
tar -xzvf jdk1.6

This errors out, so use
tar -xvf jdk1.6 instead.
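The -z flag fails here because the archive was created with tar -cvf, so it is a plain tar rather than a gzip-compressed one. A quick way to check an archive's real type before extracting (using the file name from the download step above):
file jdk1.6
# "POSIX tar archive" means use tar -xvf; "gzip compressed data" would need tar -xzvf
tar -tf jdk1.6 | head
# lists the archive contents without extracting anything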


Function Exercise Two:
Upload a file and count the number of words.
Enter the directory:
/root/cloud/hadoop-2.2.0/share/hadoop/mapreduce
vi wordCount.txt
Write 12 words:
HELLO World
HELLO BABY
HELLO A
HELLO B
HELLO C
HELLO D
wc wordCount.txt
Output:
6 (lines) 12 (words) 55 (characters) wordCount.txt
Upload it to the server:
hadoop fs -put wordCount.txt hdfs://northbigpenguin:9000/wordcount
Check whether the upload succeeded:
hadoop fs -ls hdfs://northbigpenguin:9000/

In the mapreduce directory above, run:
hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount hdfs://northbigpenguin:9000/wordcount hdfs://northbigpenguin:9000/cout
That is: hadoop jar (command)  jar package  example name (wordcount)  input path  output path

Map-Reduce Framework
        Map input records=6
        Map output records=12
        Map output bytes=103
        Map output materialized bytes=73
        Input split bytes=102
        Combine input records=12
        Combine output records=7
        Reduce input groups=7
        Reduce shuffle bytes=73
        Reduce input records=7
        Reduce output records=7
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=370
        CPU time spent (ms)=3150
        Physical memory (bytes) snapshot=315904000
        Virtual memory (bytes) snapshot=1650171904
        Total committed heap usage (bytes)=136122368
Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
File Input Format Counters
        Bytes Read=55
File Output Format Counters
        Bytes Written=39
This counter output at the end of the job is the sign of success.
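The counters only show that the job ran; the actual word counts live under the output path given on the command line (hdfs://northbigpenguin:9000/cout above). A minimal way to look at them:
hadoop fs -ls hdfs://northbigpenguin:9000/cout
# a _SUCCESS marker plus part-r-00000 should be listed
hadoop fs -cat hdfs://northbigpenguin:9000/cout/part-r-00000
# prints each word with its count, e.g. HELLO 6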


8. Configure SSH passwordless login
To create a directory remotely:
ssh 10.129.205.237 mkdir /root/xiaobaitu

Both machines need SSH configured in order to make remote connections:

(1) RedHat/CentOS-based systems:
Download and install (when finished, the hidden .ssh directory lives under the root home directory; run ls -a to see it):
# yum install openssh-server openssh-clients
After the OpenSSH server installation completes, a service called sshd is added under /etc/init.d:
chkconfig --list sshd

Manually start the sshd service so that clients can connect:
$ /etc/init.d/sshd start
(2) Method two: install SSH:
yum install ssh
Start SSH:
service sshd start
Set it to run on boot:
chkconfig sshd on


1. Create the directory:
mkdir /root/.ssh

2. Generate the SSH passwordless login key
Go to the .ssh directory in your home directory:
cd ~/.ssh
3. Generate the key pair:
ssh-keygen -t rsa    (press Enter four times)
After executing this command, two files are generated: id_rsa (private key) and id_rsa.pub (public key).
Copy the public key to the machine you want to log in to without a password:
cp id_rsa.pub authorized_keys    or    ssh-copy-id northbigpenguin
Or:
Copy the public key to the machine you want to log in to without a password:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Attention:
The public key file must be called authorized_keys; that is the system default.
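One more common pitfall, not specific to this setup: sshd silently ignores authorized_keys if the permissions are too loose. Tightening them is safe and standard:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys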

Log in to the machine without entering a password:
ssh northbigpenguin

Sign out:
exit

View the public key:
cat ~/.ssh/id_rsa.pub

One-way connection:
Copy your own public key to the other machine that you want to log on to without a password:
ssh-copy-id 10.129.205.237    (the machine you want passwordless access to)

If you want to access another machine, you hand your own public key over to it; conversely, for it to access you, its public key goes onto your machine.
A prompt pops up:
root@10.129.205.237's password:
Enter the other machine's password.
It will return:
Now try logging into the machine, with "ssh '10.129.205.237'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
This indicates success.
ssh 10.129.205.237 logs in to the other machine.
exit returns you to your own machine.

Two-way connection:
Each machine has the other's public key.
On the other machine, ssh-keygen -t rsa generates its own id_rsa (private key) and id_rsa.pub (public key), and its authorized_keys holds the public key of the first machine.
authorized_keys is the fixed file name; keys are appended to it, which prevents duplicates from overwriting each other.
So now just run:
ssh-copy-id 10.129.205.250    (IP of the other machine)
Then enter the confirmation and the password:
Are you sure you want to continue connecting (yes/no)? yes
root@10.129.205.250's password:
Test the connection:
ssh 10.129.205.250
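To confirm that both directions really are password-free, a simple check is to force ssh into non-interactive mode; BatchMode makes it fail instead of prompting, so any remaining password prompt shows up as an error (run it on each machine, pointing at the other):
ssh -o BatchMode=yes 10.129.205.250 hostname
# should print the remote host name with no password prompt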
