Hadoop cluster (CDH4) practice (Hadoop / HBase & ZooKeeper / Hive / Oozie)


Directory structure

Hadoop cluster (CDH4) practice (0) Preface
Hadoop cluster (CDH4) practice (1) Hadoop (HDFS) build
Hadoop cluster (CDH4) practice (2) HBase & ZooKeeper build
Hadoop cluster (CDH4) practice (3) Hive build
Hadoop cluster (CDH4) practice (4) Oozie build

Hadoop cluster (CDH4) practice (0) Preface

During my early days with Hadoop I wrote a series of introductory articles, the first of which was "Hadoop cluster practice (0) Complete architecture design".

In that earlier series I explained some introductory Hadoop concepts, focusing mainly on the points that had puzzled me, and I also walked through a number of small demos to deepen the understanding of the various tools.

So why write this series again, when the content seems to overlap? Mainly for the following reasons:
1. The previous articles were based on Ubuntu 10.10 and still apply to newer Ubuntu releases, but CentOS is far more widely used in production; moreover, some of Ubuntu's recent changes have not sat well with the open source community, and there is a tendency to move away from it.
2. With the standardization and rapid growth of extension repositories such as EPEL, CentOS now has a software library as rich as Ubuntu's, and installing and deploying software through yum is just as convenient.
3. The previous articles were based on CDH3; as Hadoop has evolved, CDH4 has become the mainstream and offers features that CDH3 lacks. The ones I consider most useful are:
a) NameNode HA: unlike the SecondaryNameNode approach, CDH4 provides a true HA mechanism with a two-node NameNode;
b) TaskTracker fault tolerance: a single node error no longer causes the whole parallel computation to fail.

For the above reasons, this article is based on a CDH4 environment on CentOS 6.4 x86_64.
However, the NameNode HA and TaskTracker fault-tolerance tests have not been completed yet, so that content does not appear here.
Also, this article does not use the YARN framework; it keeps the same MRv1 computing framework as CDH3, so that code developed for the company's existing production environment continues to run correctly.

(1) Hadoop (HDFS) build


Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- namenode

hadoop-secondary: 172.17.20.234, memory 10G
- secondarynamenode, jobtracker

hadoop-node-1: 172.17.20.231, memory 10G
- datanode, tasktracker

hadoop-node-2: 172.17.20.232, memory 10G
- datanode, tasktracker

hadoop-node-3: 172.17.20.233, memory 10G
- datanode, tasktracker

A brief introduction to the above roles:
namenode - manages the entire HDFS namespace
secondarynamenode - can be viewed as a redundant backup service for the namenode
jobtracker - job management service for parallel computing
datanode - HDFS data node service
tasktracker - job execution service for parallel computing

To avoid confusion when configuring multiple servers, this article uses the following convention:
Any command shown with a bare $ prompt (no host name) must be executed on all servers, unless a trailing // comment or the step heading says otherwise.

1. Choose the right installation packages
For a more convenient and standardized deployment of the Hadoop cluster, we use Cloudera's integrated packages.
Cloudera has done a great deal of optimization work on Hadoop and its related systems, which avoids many bugs caused by mismatched component versions.
This is also what many experienced Hadoop administrators recommend.
https://ccp.cloudera.com/display/DOC/Documentation/

2. Install the Java environment
Since the entire Hadoop project is primarily developed in Java, a JVM is required.
Log in to www.oracle.com (an Oracle account is required) and download a 64-bit JDK from the address below, e.g. jdk-7u45-linux-x64.rpm

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

The code is as follows:

$ sudo rpm -ivh jdk-7u45-linux-x64.rpm
$ sudo vim /etc/profile.d/java.sh

export JAVA_HOME=/usr/java/jdk1.7.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

$ sudo chmod +x /etc/profile.d/java.sh
$ source /etc/profile
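
As a quick check (not part of the original steps), you can then confirm that the JDK is picked up:

$ java -version    // should report something like java version "1.7.0_45"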

3. Configure the Hadoop installation source

The code is as follows:

$ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
$ cd /etc/yum.repos.d/
$ sudo wget http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo

4. Install the Hadoop-related packages, selecting the MRv1 framework

The code is as follows:

$ sudo yum install hadoop-hdfs-namenode    // only on hadoop-master

$ sudo yum install hadoop-hdfs-secondarynamenode    // only on hadoop-secondary
$ sudo yum install hadoop-0.20-mapreduce-jobtracker    // only on hadoop-secondary

$ sudo yum install hadoop-hdfs-datanode    // only on the hadoop-node servers
$ sudo yum install hadoop-0.20-mapreduce-tasktracker    // only on the hadoop-node servers

$ sudo yum install hadoop-client

5. Create the Hadoop configuration directory

The code is as follows:
$ sudo cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster

6. Activate the new configuration file

The code is as follows:
$ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
$ cd /etc/hadoop/conf
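
To confirm which configuration is active (not shown in the original), the alternatives system can be queried:

$ sudo alternatives --display hadoop-conf    // the link should currently point to /etc/hadoop/conf.my_cluster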

7. Add hosts records and modify the corresponding host names

The code is as follows:

$ sudo vim /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

172.17.20.230 hadoop-master
172.17.20.234 hadoop-secondary
172.17.20.231 hadoop-node-1
172.17.20.232 hadoop-node-2
172.17.20.233 hadoop-node-3
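
The original does not show the hostname change itself; on CentOS 6 this would presumably be done on each server by setting HOSTNAME in /etc/sysconfig/network and applying it for the current session, for example on the master:

$ sudo vim /etc/sysconfig/network    // set HOSTNAME=hadoop-master (use the matching name on each server)
$ sudo hostname hadoop-master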

8. Install LZO support

The code is as follows:
$ cd /etc/yum.repos.d
$ sudo wget http://archive.cloudera.com/gplextras/redhat/6/x86_64/gplextras/cloudera-gplextras4.repo
$ sudo yum install hadoop-lzo-cdh4

9. Configure the files under hadoop/conf

The code is as follows:

$ sudo vim /etc/hadoop/conf/masters

hadoop-master

$ sudo vim /etc/hadoop/conf/slaves

hadoop-node-1
hadoop-node-2
hadoop-node-3

10. Create the local directories used by HDFS and MapReduce

The code is as follows:

$ sudo mkdir -p /data/{1,2,3,4}/mapred/local
$ sudo chown -R mapred:hadoop /data/{1,2,3,4}/mapred/local

$ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/{1,2,3,4}/dfs/dn
$ sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn /data/{1,2,3,4}/dfs/dn
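
Note (not in the original): the chown/chmod above assumes the dfs directories already exist; if they do not, they would presumably need to be created first:

$ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn /data/{1,2,3,4}/dfs/dn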

11. Configure Core-site.xml

The code is as follows:

$ sudo vim /etc/hadoop/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master/</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/1/dfs/nn,/nfsmount/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>

<property>
<name>fs.checkpoint.period</name>
<value>300</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>${hadoop.tmp.dir}/dfs/namesecondary</value>
</property>
</configuration>

12. Configure Hdfs-site.xml

The code is as follows:

$ sudo vim /etc/hadoop/conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>dfs.name.dir</name>
<value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop-master:50070</value>
<description>
The address and the base port on which the dfs namenode web UI will listen.
</description>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-secondary:50090</value>
</property>
</configuration>

13. Configure Mapred-site.xml

The code is as follows:

$ sudo vim /etc/hadoop/conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hadoop-secondary:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
</property>
</configuration>

14. Format the HDFS distributed file system

The code is as follows:
$ sudo -u hdfs hadoop namenode -format    // run only once, on hadoop-master

15. Start the Hadoop processes
Start the namenode on hadoop-master

The code is as follows:
$ sudo /etc/init.d/hadoop-hdfs-namenode start

Start the secondarynamenode and jobtracker on hadoop-secondary

The code is as follows:
$ sudo /etc/init.d/hadoop-hdfs-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-jobtracker start

Start the datanode and tasktracker on each hadoop-node

The code is as follows:
$ sudo /etc/init.d/hadoop-hdfs-datanode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-tasktracker start

16. Create the mapred.system.dir and /tmp HDFS directories
The following HDFS operations only need to be performed once, on any single host.

The code is as follows:
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
$ sudo -u hdfs hadoop fs -ls -R /
$ sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system

17. View the status of the entire cluster
View it through the web page: http://hadoop-master:50070
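
As an additional check that is not part of the original steps, HDFS and the MRv1 daemons can presumably also be verified from the command line; the example jar path below is the usual CDH4 MRv1 location and may differ on your system:

$ sudo -u hdfs hadoop dfsadmin -report    // should list 3 live datanodes
$ sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 2 10    // runs a small MapReduce job through the jobtracker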

18. At this point, the Hadoop (HDFS) build is complete.

19. We can then move on to the next part.


(2) HBase & ZooKeeper build

Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- namenode
- hbase-master

hadoop-secondary: 172.17.20.234, memory 10G
- secondarynamenode, jobtracker

hadoop-node-1: 172.17.20.231, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

hadoop-node-2: 172.17.20.232, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

hadoop-node-3: 172.17.20.233, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

A brief introduction to the above roles:
namenode - manages the entire HDFS namespace
secondarynamenode - can be viewed as a redundant backup service for the namenode
jobtracker - job management service for parallel computing
datanode - HDFS data node service
tasktracker - job execution service for parallel computing
hbase-master - management service for HBase
hbase-regionserver - serves client inserts, deletes, queries, and so on
zookeeper-server - ZooKeeper coordination and configuration management service

To avoid confusion when configuring multiple servers, this article uses the following convention:
Any command shown with a bare $ prompt (no host name) must be executed on all servers, unless a trailing // comment or the step heading says otherwise.

1. Pre-installation preparation
First complete: Hadoop cluster (CDH4) practice (1) Hadoop (HDFS) build

Configure NTP clock synchronization

The code is as follows:
$ sudo yum install ntp
$ sudo /etc/init.d/ntpd start
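
To keep the clocks synchronized across reboots (not shown in the original), the service would presumably also be enabled at boot:

$ sudo chkconfig ntpd on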

Configure the ulimit and nproc parameters

The code is as follows:

$ sudo vim /etc/security/limits.conf

hdfs - nofile 32768
hbase - nofile 32768

Log out and back in over SSH for the settings to take effect.

2. Install hbase-master on hadoop-master

The code is as follows:
$ sudo yum install hbase-master
$ sudo yum install hbase-rest
$ sudo yum install hbase-thrift

3. Install hbase-regionserver on the hadoop-node servers

The code is as follows:
$ sudo yum install hbase-regionserver

4. Create the hbase directory in HDFS
The following HDFS operations only need to be performed once, on any single host.

The code is as follows:
$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase

5. Configure Hbase-site.xml

The code is as follows:

$ sudo vim /etc/hbase/conf/hbase-site.xml
$ cat /etc/hbase/conf/hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>hbase.rest.port</name>
<value>60050</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-master:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-node-1,hadoop-node-2,hadoop-node-3</value>
</property>
</configuration>

6. Configure Regionservers

The code is as follows:

$ sudo vim /etc/hbase/conf/regionservers

hadoop-node-1
hadoop-node-2
hadoop-node-3

7. Install ZooKeeper

The code is as follows:

$ sudo yum install zookeeper
$ sudo vim /etc/zookeeper/conf/zoo.cfg
$ cat /etc/zookeeper/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=hadoop-node-1:2888:3888
server.2=hadoop-node-2:2888:3888
server.3=hadoop-node-3:2888:3888

8. Install zookeeper-server on the hadoop-node servers and create the myid files

The code is as follows:
$ sudo yum install zookeeper-server
$ sudo touch /var/lib/zookeeper/myid
$ sudo chown -R zookeeper:zookeeper /var/lib/zookeeper
$ echo 1 > /var/lib/zookeeper/myid    // only on hadoop-node-1
$ echo 2 > /var/lib/zookeeper/myid    // only on hadoop-node-2
$ echo 3 > /var/lib/zookeeper/myid    // only on hadoop-node-3
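
The original does not show starting the ZooKeeper service itself; presumably it should be started on each hadoop-node before the HBase daemons, and (if nc is installed) its health can be checked with the four-letter ruok command:

$ sudo /etc/init.d/zookeeper-server start
$ echo ruok | nc localhost 2181    // a healthy server answers "imok"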

9. Start the HBase and ZooKeeper services
Only on hadoop-master:

The code is as follows:
$ sudo /etc/init.d/hbase-master start
$ sudo /etc/init.d/hbase-thrift start
$ sudo /etc/init.d/hbase-rest start

Only on the hadoop-node servers:

The code is as follows:
$ sudo /etc/init.d/hbase-regionserver start

10. View the status of the services
View them through the web page: http://hadoop-master:60010
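
As an extra check that is not part of the original steps, the cluster state can presumably also be inspected from the HBase shell, e.g.:

$ hbase shell
hbase(main):001:0> status    // should report 3 servers (the region servers)
hbase(main):002:0> exit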

11. At this point, the HBase & ZooKeeper build is complete.

12. We can then move on to the next part.


(3) Hive build

Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- namenode
- hbase-master

hadoop-secondary: 172.17.20.234, memory 10G
- secondarynamenode, jobtracker
- hive-server, hive-metastore

hadoop-node-1: 172.17.20.231, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

hadoop-node-2: 172.17.20.232, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

hadoop-node-3: 172.17.20.233, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

A brief introduction to the above roles:
namenode - manages the entire HDFS namespace
secondarynamenode - can be viewed as a redundant backup service for the namenode
jobtracker - job management service for parallel computing
datanode - HDFS data node service
tasktracker - job execution service for parallel computing
hbase-master - management service for HBase
hbase-regionserver - serves client inserts, deletes, queries, and so on
zookeeper-server - ZooKeeper coordination and configuration management service
hive-server - management service for Hive
hive-metastore - Hive metadata service, used for type checking and parsing of metadata

To avoid confusion when configuring multiple servers, this article uses the following convention:
All of the following actions are performed on the Hive host, i.e. hadoop-secondary.

1. Pre-installation preparation
First complete: Hadoop cluster (CDH4) practice (2) HBase & ZooKeeper build

2. Install Hive

The code is as follows:
$ sudo yum install hive hive-metastore hive-server
$ sudo yum install hive-jdbc hive-hbase

3. Install MySQL JDBC Connector

The code is as follows:
$ sudo yum install mysql-connector-java
$ sudo ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar

4. Install MySQL

The code is as follows:

$ sudo yum install mysql-server
$ sudo /etc/init.d/mysqld start

$ sudo /usr/bin/mysql_secure_installation

[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] Y
New password: hiveserver
Re-enter new password: hiveserver
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it? [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!

5. Create a database and authorize

The code is as follows:

$ mysql -u root -phiveserver

mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;

mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hiveserver';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'hive'@'%';
mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'hive'@'%';

mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hiveserver';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'hive'@'localhost';
mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'hive'@'localhost';

mysql> CREATE USER 'hive'@'127.0.0.1' IDENTIFIED BY 'hiveserver';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'hive'@'127.0.0.1';
mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'hive'@'127.0.0.1';
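
As a quick check that is not part of the original steps, you can presumably verify that the hive user can reach the metastore schema:

$ mysql -u hive -phiveserver metastore -e "SHOW TABLES;"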

6. Configure Hive-site.xml

The code is as follows:

$ sudo vim /etc/hive/conf/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop-secondary/metastore</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hiveserver</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop-secondary:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/lib/hive/lib/hbase.jar,file:///usr/lib/hive/lib/zookeeper.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar,file:///usr/lib/hive/lib/guava-11.0.2.jar</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-node-1,hadoop-node-2,hadoop-node-3</value>
</property>
</configuration>

7. Start Hive

The code is as follows:
$ /etc/init.d/hive-metastore start
$ /etc/init.d/hive-server start

8. Create the HDFS directories required for Hive

The code is as follows:

$ sudo -u hdfs hadoop fs -mkdir /user/hive
$ sudo -u hdfs hadoop fs -chown hive /user/hive
$ sudo -u hdfs hadoop fs -ls -R /user

$ sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hadoop-mapred
$ sudo -u hdfs hadoop fs -chmod 777 /tmp/hive-hive
$ sudo chown -R hive:hive /var/lib/hive/.hivehistory
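
As a sanity check that is not part of the original steps, a simple statement through the Hive CLI should presumably work at this point (hive_smoke_test is just a throwaway example table):

$ hive -e "CREATE TABLE hive_smoke_test (id INT); SHOW TABLES; DROP TABLE hive_smoke_test;"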

9. At this point, the Hive build is complete.

10. We can then move on to the next part.


(4) Oozie build


Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- namenode
- hbase-master

hadoop-secondary: 172.17.20.234, memory 10G
- secondarynamenode, jobtracker
- hive-server, hive-metastore
- oozie

hadoop-node-1: 172.17.20.231, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

hadoop-node-2: 172.17.20.232, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

hadoop-node-3: 172.17.20.233, memory 10G
- datanode, tasktracker
- hbase-regionserver, zookeeper-server

A brief introduction to the above roles:
namenode - manages the entire HDFS namespace
secondarynamenode - can be viewed as a redundant backup service for the namenode
jobtracker - job management service for parallel computing
datanode - HDFS data node service
tasktracker - job execution service for parallel computing
hbase-master - management service for HBase
hbase-regionserver - serves client inserts, deletes, queries, and so on
zookeeper-server - ZooKeeper coordination and configuration management service
hive-server - management service for Hive
hive-metastore - Hive metadata service, used for type checking and parsing of metadata
oozie - a Java web application for workflow definition and management

To avoid confusion when configuring multiple servers, this article uses the following convention:
All of the following actions are performed on the Oozie host, i.e. hadoop-secondary.

1. Pre-installation preparation
First complete: Hadoop cluster (CDH4) practice (3) Hive build

2. Install Oozie

The code is as follows:
$ sudo yum install oozie oozie-client

3. Create Oozie Database

The code is as follows:

$ mysql -u root -phiveserver

mysql> CREATE DATABASE oozie;
mysql> GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'localhost' IDENTIFIED BY 'oozie';
mysql> GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';
mysql> exit;

4. Configure Oozie-site.xml

The code is as follows:

$ sudo vim /etc/oozie/conf/oozie-site.xml

<?xml version="1.0"?>

<configuration>
<property>
<name>oozie.service.ActionService.executor.ext.classes</name>
<value>
org.apache.oozie.action.email.EmailActionExecutor,
org.apache.oozie.action.hadoop.HiveActionExecutor,
org.apache.oozie.action.hadoop.ShellActionExecutor,
org.apache.oozie.action.hadoop.SqoopActionExecutor,
org.apache.oozie.action.hadoop.DistcpActionExecutor
</value>
</property>
<property>
<name>oozie.service.SchemaService.wf.ext.schemas</name>
<value>shell-action-0.1.xsd,shell-action-0.2.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,ssh-action-0.1.xsd,ssh-action-0.2.xsd,distcp-action-0.1.xsd</value>
</property>
<property>
<name>oozie.system.id</name>
<value>oozie-${user.name}</value>
</property>
<property>
<name>oozie.systemmode</name>
<value>NORMAL</value>
</property>
<property>
<name>oozie.service.AuthorizationService.security.enabled</name>
<value>false</value>
</property>
<property>
<name>oozie.service.PurgeService.older.than</name>
<value>30</value>
</property>
<property>
<name>oozie.service.PurgeService.purge.interval</name>
<value>3600</value>
</property>
<property>
<name>oozie.service.CallableQueueService.queue.size</name>
<value>10000</value>
</property>
<property>
<name>oozie.service.CallableQueueService.threads</name>
<value>10</value>
</property>
<property>
<name>oozie.service.CallableQueueService.callable.concurrency</name>
<value>3</value>
</property>
<property>
<name>oozie.service.coord.normal.default.timeout</name>
<value>120</value>
</property>
<property>
<name>oozie.db.schema.name</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>true</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.pool.max.active.conn</name>
<value>10</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
<value>false</value>
</property>
<property>
<name>local.realm</name>
<value>LOCALHOST</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.keytab.file</name>
<value>${user.home}/oozie.keytab</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.principal</name>
<value>${user.name}/localhost@${local.realm}</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value> </value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
<value> </value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/etc/hadoop/conf</value>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/user/${user.name}/share/lib</value>
</property>
<property>
<name>use.system.libpath.for.mapreduce.and.pig.jobs</name>
<value>false</value>
</property>
<property>
<name>oozie.authentication.type</name>
<value>simple</value>
</property>
<property>
<name>oozie.authentication.token.validity</name>
<value>36000</value>
</property>
<property>
<name>oozie.authentication.signature.secret</name>
<value>oozie</value>
</property>
<property>
<name>oozie.authentication.cookie.domain</name>
<value></value>
</property>
<property>
<name>oozie.authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>
<property>
<name>oozie.authentication.kerberos.principal</name>
<value>HTTP/localhost@${local.realm}</value>
</property>
<property>
<name>oozie.authentication.kerberos.keytab</name>
<value>${oozie.service.HadoopAccessorService.keytab.file}</value>
</property>
<property>
<name>oozie.authentication.kerberos.name.rules</name>
<value>DEFAULT</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>oozie.action.mapreduce.uber.jar.enable</name>
<value>true</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.supported.filesystems</name>
<value>hdfs,viewfs</value>
</property>
</configuration>
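
Not part of the original steps: with oozie.service.JPAService.create.db.schema set to true above, Oozie should create its schema on first start; alternatively, the CDH4 oozie package ships a helper script that can create it explicitly, presumably:

$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run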

5. Configure Oozie Web Console

The code is as follows:
$ cd /tmp/
$ wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
$ cd /var/lib/oozie/
$ sudo unzip /tmp/ext-2.2.zip
$ cd ext-2.2/
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie

6. Configure Oozie Sharelib

The code is as follows:
$ mkdir /tmp/ooziesharelib
$ cd /tmp/ooziesharelib
$ tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
$ sudo -u oozie hadoop fs -put share /user/oozie/share
$ sudo -u oozie hadoop fs -ls /user/oozie/share
$ sudo -u oozie hadoop fs -ls /user/oozie/share/lib
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/hbase.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/zookeeper.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/guava-11.0.2.jar /user/oozie/share/lib/hive/

7. Start Oozie

The code is as follows:
$ sudo service oozie start
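
As an additional check that is not in the original steps, the Oozie client can presumably confirm the server is up:

$ oozie admin -oozie http://localhost:11000/oozie -status    // should report: System mode : NORMAL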

8. Visit the Oozie Web Console

The code is as follows:

http://hadoop-secondary:11000/oozie

9. At this point, the Oozie build is complete.
