Directory structure
Hadoop cluster (CDH4) practice (0) Preface
Hadoop cluster (CDH4) practice (1) Hadoop (HDFS) build
Hadoop cluster (CDH4) practice (2) HBase & ZooKeeper build
Hadoop cluster (CDH4) practice (3) Hive build
Hadoop cluster (CDH4) practice (4) Oozie build
Hadoop cluster (CDH4) practice (0) Preface
When I was getting started with Hadoop, I wrote a series of introductory articles, the first of which was "Hadoop cluster practice (0) Complete architecture design".
In that series I also explained some introductory Hadoop concepts, mainly around questions that had puzzled me at the time.
I also included a number of small demos to deepen the understanding of the various tools.
So why write this series again, when the content would seem to be repetitive?
Mainly for the following reasons:
1. The previous series was based on Ubuntu 10.10 and also applies to newer Ubuntu releases, but CentOS is used more often in production environments;
moreover, since some of Ubuntu's changes have not been in line with the open source community, there has been a tendency to move away from Ubuntu.
2. With the standardization and rapid development of extension repositories such as EPEL, CentOS now has a software library as rich as Ubuntu's, and installing and deploying software through yum is just as convenient.
3. The previous series was based on CDH3. As Hadoop has developed, CDH4 has become mainstream and provides features that CDH3 lacks. The ones I consider most useful are:
a) NameNode HA: unlike the SecondaryNameNode approach, CDH4 provides an HA mechanism that keeps a two-node NameNode pair;
b) TaskTracker fault tolerance: a single node error no longer causes the whole parallel computation to fail.
For the above reasons, this series is based on a CDH4 environment running on CentOS 6.4 x86_64.
However, the NameNode HA and TaskTracker fault-tolerance tests have not been completed yet, so that content does not appear here for the time being.
At the same time, although CDH4 ships with YARN, this series uses the same MRv1 computing framework as CDH3, to ensure that code developed for the company's existing online environment continues to run correctly.
(1) Hadoop (HDFS) build
Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- NameNode
hadoop-secondary: 172.17.20.234, memory 10G
- SecondaryNameNode, JobTracker
hadoop-node-1: 172.17.20.231, memory 10G
- DataNode, TaskTracker
hadoop-node-2: 172.17.20.232, memory 10G
- DataNode, TaskTracker
hadoop-node-3: 172.17.20.233, memory 10G
- DataNode, TaskTracker
A brief introduction to the roles above:
NameNode - manages the namespace of the entire HDFS
SecondaryNameNode - a service that can be viewed as a backup of the NameNode
JobTracker - job management service for parallel computing
DataNode - data node service of HDFS
TaskTracker - job execution service for parallel computing
To avoid confusion when configuring multiple servers, this article follows one convention:
Any command that starts directly with $ and is not prefixed with a host name must be executed on all servers, unless a // comment after the command or the step title states otherwise.
1. Choose the best installation package
To deploy the Hadoop cluster in a more convenient and standardized way, we use the Cloudera integration packages (CDH).
Cloudera has done a great deal of integration and optimization work on the Hadoop-related projects, which avoids many bugs caused by mismatched component versions.
This is also the approach recommended by many experienced Hadoop administrators.
https://ccp.cloudera.com/display/DOC/Documentation/
2. Installing the Java Environment
Because Hadoop is developed mainly in Java, JVM support is required.
Log in to www.oracle.com (you need to create an account) and download a 64-bit JDK from the address below, e.g. jdk-7u45-linux-x64.rpm:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
The code is as follows:
$ sudo rpm -ivh jdk-7u45-linux-x64.rpm
$ sudo vim /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/jdk1.7.0_45
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
$ sudo chmod +x /etc/profile.d/java.sh
$ source /etc/profile
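To confirm that the JDK is picked up correctly after sourcing the profile, a quick check can be run (a hedged addition, not part of the original steps; both commands are standard):
$ java -version
$ echo $JAVA_HOME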
3. Configure the Hadoop installation source
The code is as follows:
$ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
$ cd /etc/yum.repos.d/
$ sudo wget http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo
4. Install the Hadoop-related packages, choosing MRv1 framework support
The code is as follows:
$ sudo yum install hadoop-hdfs-namenode    // only on hadoop-master
$ sudo yum install hadoop-hdfs-secondarynamenode    // only on hadoop-secondary
$ sudo yum install hadoop-0.20-mapreduce-jobtracker    // only on hadoop-secondary
$ sudo yum install hadoop-hdfs-datanode    // only on hadoop-node
$ sudo yum install hadoop-0.20-mapreduce-tasktracker    // only on hadoop-node
$ sudo yum install hadoop-client
5. Create the Hadoop configuration files
The code is as follows:
$ sudo cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster
6. Activate the new configuration file
The code is as follows:
$ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
$ cd /etc/hadoop/conf
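To confirm which configuration directory is currently active, the alternatives system can be queried (a hedged addition; --display is a standard alternatives option):
$ alternatives --display hadoop-conf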
7. Add hosts records and set the corresponding host names
The code is as follows:
$ sudo vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.17.20.230 hadoop-master
172.17.20.234 hadoop-secondary
172.17.20.231 hadoop-node-1
172.17.20.232 hadoop-node-2
172.17.20.233 hadoop-node-3
8. Install LZO support
The code is as follows:
$ cd /etc/yum.repos.d
$ sudo wget http://archive.cloudera.com/gplextras/redhat/6/x86_64/gplextras/cloudera-gplextras4.repo
$ sudo yum install hadoop-lzo-cdh4
9. Configure the files under hadoop/conf
The code is as follows:
$ sudo vim /etc/hadoop/conf/masters
hadoop-master
$ sudo vim /etc/hadoop/conf/slaves
hadoop-node-1
hadoop-node-2
hadoop-node-3
10. Create the local storage directories for Hadoop
The code is as follows:
$ sudo mkdir -p /data/{1,2,3,4}/mapred/local
$ sudo chown -R mapred:hadoop /data/{1,2,3,4}/mapred/local
$ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/{1,2,3,4}/dfs/dn
$ sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn /data/{1,2,3,4}/dfs/dn
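The chown/chmod commands above assume that the dfs directories already exist. If they do not, a minimal sketch of creating them first, using the same paths as above:
$ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
$ sudo mkdir -p /data/{1,2,3,4}/dfs/dn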
11. Configure core-site.xml
The code is as follows:
$ sudo vim /etc/hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master/</value>
  </property>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/1/dfs/nn,/nfsmount/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
  </property>
</configuration>
12. Configure hdfs-site.xml
The code is as follows:
$ sudo vim /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop-master:50070</value>
    <description>The address and the base port on which the DFS NameNode web UI will listen.</description>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-secondary:50090</value>
  </property>
</configuration>
13. Configure mapred-site.xml
The code is as follows:
$ sudo vim /etc/hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-secondary:8021</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local</value>
  </property>
</configuration>
14. Format the HDFS distributed file system
The code is as follows:
$ sudo -u hdfs hadoop namenode -format    // run only once, on hadoop-master
15. Start the Hadoop processes
Start the NameNode on hadoop-master:
The code is as follows:
$ sudo /etc/init.d/hadoop-hdfs-namenode start
Start the SecondaryNameNode and JobTracker on hadoop-secondary:
The code is as follows:
$ sudo /etc/init.d/hadoop-hdfs-secondarynamenode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-jobtracker start
Start the DataNode and TaskTracker on each hadoop-node:
The code is as follows:
$ sudo /etc/init.d/hadoop-hdfs-datanode start
$ sudo /etc/init.d/hadoop-0.20-mapreduce-tasktracker start
16. Create the mapred.system.dir and /tmp directories in HDFS
The following HDFS operations need to be performed only once, on any single host
The code is as follows:
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
$ sudo -u hdfs hadoop fs -ls -R /
$ sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /tmp/mapred/system
17. View the status of the entire cluster
View it through the web UI: http://hadoop-master:50070
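The cluster can also be checked from the command line (a hedged example; dfsadmin -report is part of the standard HDFS tooling):
$ sudo -u hdfs hdfs dfsadmin -report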
18. At this point, the Hadoop (HDFS) setup is complete.
19. We can then move on to the next part:
(2) HBase & ZooKeeper build
Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- NameNode
- HBase-Master
hadoop-secondary: 172.17.20.234, memory 10G
- SecondaryNameNode, JobTracker
hadoop-node-1: 172.17.20.231, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
hadoop-node-2: 172.17.20.232, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
hadoop-node-3: 172.17.20.233, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
A brief introduction to the roles above:
NameNode - manages the namespace of the entire HDFS
SecondaryNameNode - a service that can be viewed as a backup of the NameNode
JobTracker - job management service for parallel computing
DataNode - data node service of HDFS
TaskTracker - job execution service for parallel computing
HBase-Master - management service of HBase
HBase-RegionServer - serves client inserts, deletes, queries and other data requests
ZooKeeper-Server - ZooKeeper coordination and configuration management service
To avoid confusion when configuring multiple servers, this article follows one convention:
Any command that starts directly with $ and is not prefixed with a host name must be executed on all servers, unless a // comment after the command or the step title states otherwise.
1. Pre-installation preparation
Prerequisite: Hadoop cluster (CDH4) practice (1) Hadoop (HDFS) build
Configure NTP clock synchronization
The code is as follows:
$ sudo yum install ntp
$ sudo /etc/init.d/ntpd start
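To keep the clock synchronized after a reboot and to verify the NTP peers, a hedged addition using standard CentOS 6 commands (not shown in the original steps):
$ sudo chkconfig ntpd on
$ ntpq -p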
Configure the ulimit and nproc parameters
The code is as follows:
$ sudo vim /etc/security/limits.conf
hdfs - nofile 32768
hbase - nofile 32768
Log out and log back in via SSH for the settings to take effect.
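A quick sanity check after logging back in (a hedged addition; it shows the open-file limit of the current login shell):
$ ulimit -n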
2. Install the HBase-Master on hadoop-master
The code is as follows:
$ sudo yum install hbase-master
$ sudo yum install hbase-rest
$ sudo yum install hbase-thrift
3. Install the HBase-RegionServer on each hadoop-node
The code is as follows:
$ sudo yum install hbase-regionserver
4. Create the hbase directory in HDFS
The following HDFS operations need to be performed only once, on any single host
The code is as follows:
$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase
5. Configure hbase-site.xml
The code is as follows:
$ sudo vim /etc/hbase/conf/hbase-site.xml
$ cat /etc/hbase/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rest.port</name>
    <value>60050</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-master:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop-node-1,hadoop-node-2,hadoop-node-3</value>
  </property>
</configuration>
6. Configure regionservers
The code is as follows:
$ sudo vim /etc/hbase/conf/regionservers
hadoop-node-1
hadoop-node-2
hadoop-node-3
7. Install ZooKeeper
The code is as follows:
$ sudo yum install zookeeper
$ sudo vim /etc/zookeeper/conf/zoo.cfg
$ cat /etc/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
server.1=hadoop-node-1:2888:3888
server.2=hadoop-node-2:2888:3888
server.3=hadoop-node-3:2888:3888
8. Install zookeeper-server on each hadoop-node and create the myid file
The code is as follows:
$ sudo yum install zookeeper-server
$ sudo touch /var/lib/zookeeper/myid
$ sudo chown -R zookeeper:zookeeper /var/lib/zookeeper
$ echo 1 > /var/lib/zookeeper/myid    // only on hadoop-node-1
$ echo 2 > /var/lib/zookeeper/myid    // only on hadoop-node-2
$ echo 3 > /var/lib/zookeeper/myid    // only on hadoop-node-3
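The original steps do not show starting ZooKeeper explicitly. With the CDH4 packages it is normally started through its init script on each hadoop-node before HBase is brought up (a hedged sketch):
$ sudo /etc/init.d/zookeeper-server start    // only on hadoop-node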
9. Start the HBase and ZooKeeper services
Only on hadoop-master:
The code is as follows:
$ sudo /etc/init.d/hbase-master start
$ sudo /etc/init.d/hbase-thrift start
$ sudo /etc/init.d/hbase-rest start
Only on each hadoop-node:
The code is as follows:
$ sudo /etc/init.d/hbase-regionserver start
10. View the status of the services
View it through the web UI: http://hadoop-master:60010
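The cluster can also be checked from the HBase shell (a hedged example; status is a standard shell command):
$ hbase shell
hbase> status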
11. At this point, the HBase & ZooKeeper setup is complete.
12. We can then move on to the next part:
(3) Hive build
Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- NameNode
- HBase-Master
hadoop-secondary: 172.17.20.234, memory 10G
- SecondaryNameNode, JobTracker
- Hive-Server, Hive-Metastore
hadoop-node-1: 172.17.20.231, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
hadoop-node-2: 172.17.20.232, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
hadoop-node-3: 172.17.20.233, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
A brief introduction to the roles above:
NameNode - manages the namespace of the entire HDFS
SecondaryNameNode - a service that can be viewed as a backup of the NameNode
JobTracker - job management service for parallel computing
DataNode - data node service of HDFS
TaskTracker - job execution service for parallel computing
HBase-Master - management service of HBase
HBase-RegionServer - serves client inserts, deletes, queries and other data requests
ZooKeeper-Server - ZooKeeper coordination and configuration management service
Hive-Server - management service of Hive
Hive-Metastore - Hive's metadata service, used for type checking and parsing of metadata
To avoid confusion when configuring multiple servers, note the following:
All of the following operations need to be performed on the Hive host, i.e. hadoop-secondary.
1. Pre-installation preparation
Prerequisite: Hadoop cluster (CDH4) practice (2) HBase & ZooKeeper build
2. Install Hive
The code is as follows:
$ sudo yum install hive hive-metastore hive-server
$ sudo yum install hive-jdbc hive-hbase
3. Install MySQL JDBC Connector
The code is as follows:
$ sudo yum install mysql-connector-java
$ sudo ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar
4. Install MySQL
The code is as follows:
$ sudo yum install mysql-server
$ sudo /etc/init.d/mysqld start
$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] Y
New password: hiveserver
Re-enter new password: hiveserver
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it? [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
5. Create the metastore database and grant privileges
The code is as follows:
$ mysql -u root -phiveserver
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;
mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hiveserver';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'hive'@'%';
mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'hive'@'%';
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hiveserver';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'hive'@'localhost';
mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'hive'@'localhost';
mysql> CREATE USER 'hive'@'127.0.0.1' IDENTIFIED BY 'hiveserver';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON metastore.* TO 'hive'@'127.0.0.1';
mysql> REVOKE ALTER,CREATE ON metastore.* FROM 'hive'@'127.0.0.1';
6. Configure hive-site.xml
The code is as follows:
$ sudo vim /etc/hive/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop-secondary/metastore</value>
<description>the URL of the MySQL database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hiveserver</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop-secondary:9083</value>
<description>ip address (or fully-qualified domain name) and port of the Metastore host</description>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/lib/hive/lib/hbase.jar,file:///usr/lib/hive/lib/zookeeper.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar,file:///usr/lib/hive/lib/guava-11.0.2.jar</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop-node-1,hadoop-node-2,hadoop-node-3</value>
</property>
</configuration>
7. Start Hive
The code is as follows:
$ sudo /etc/init.d/hive-metastore start
$ sudo /etc/init.d/hive-server start
8. Create the HDFS directories required by Hive
The code is as follows:
$ sudo -u hdfs hadoop fs -mkdir /user/hive
$ sudo -u hdfs hadoop fs -chown hive /user/hive
$ sudo -u hdfs hadoop fs -ls -R /user
$ sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hadoop-mapred
$ sudo -u hdfs hadoop fs -chmod 777 /tmp/hive-hive
$ sudo chown -R hive:hive /var/lib/hive/.hivehistory
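Before moving on, a quick smoke test of the Hive installation (a hedged example; it only lists tables, so an empty result is expected):
$ hive -e "SHOW TABLES;"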
9. At this point, the Hive setup is complete.
10. We can then move on to the next part:
(4) Oozie build
Environment preparation
OS: CentOS 6.4 x86_64
Servers:
hadoop-master: 172.17.20.230, memory 10G
- NameNode
- HBase-Master
hadoop-secondary: 172.17.20.234, memory 10G
- SecondaryNameNode, JobTracker
- Hive-Server, Hive-Metastore
- Oozie
hadoop-node-1: 172.17.20.231, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
hadoop-node-2: 172.17.20.232, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
hadoop-node-3: 172.17.20.233, memory 10G
- DataNode, TaskTracker
- HBase-RegionServer, ZooKeeper-Server
A brief introduction to the roles above:
NameNode - manages the namespace of the entire HDFS
SecondaryNameNode - a service that can be viewed as a backup of the NameNode
JobTracker - job management service for parallel computing
DataNode - data node service of HDFS
TaskTracker - job execution service for parallel computing
HBase-Master - management service of HBase
HBase-RegionServer - serves client inserts, deletes, queries and other data requests
ZooKeeper-Server - ZooKeeper coordination and configuration management service
Hive-Server - management service of Hive
Hive-Metastore - Hive's metadata service, used for type checking and parsing of metadata
Oozie - a Java web application for workflow definition and management
To avoid confusion when configuring multiple servers, note the following:
All of the following operations need to be performed on the Oozie host, i.e. hadoop-secondary.
1. Pre-installation preparation
Prerequisite: Hadoop cluster (CDH4) practice (3) Hive build
2. Install Oozie
The code is as follows:
$ sudo yum install oozie oozie-client
3. Create the Oozie database
The code is as follows:
$ mysql -u root -phiveserver
mysql> CREATE DATABASE oozie;
mysql> GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'localhost' IDENTIFIED BY 'oozie';
mysql> GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';
mysql> exit;
4. Configure oozie-site.xml
The code is as follows:
$ sudo vim /etc/oozie/conf/oozie-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>oozie.service.ActionService.executor.ext.classes</name>
<value>
org.apache.oozie.action.email.EmailActionExecutor,
org.apache.oozie.action.hadoop.HiveActionExecutor,
org.apache.oozie.action.hadoop.ShellActionExecutor,
org.apache.oozie.action.hadoop.SqoopActionExecutor,
org.apache.oozie.action.hadoop.DistcpActionExecutor
</value>
</property>
<property>
<name>oozie.service.SchemaService.wf.ext.schemas</name>
<value>shell-action-0.1.xsd,shell-action-0.2.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,ssh-action-0.1.xsd,ssh-action-0.2.xsd,distcp-action-0.1.xsd</value>
</property>
<property>
<name>oozie.system.id</name>
<value>oozie-${user.name}</value>
</property>
<property>
<name>oozie.systemmode</name>
<value>NORMAL</value>
</property>
<property>
<name>oozie.service.AuthorizationService.security.enabled</name>
<value>false</value>
</property>
<property>
<name>oozie.service.PurgeService.older.than</name>
<value>30</value>
</property>
<property>
<name>oozie.service.PurgeService.purge.interval</name>
<value>3600</value>
</property>
<property>
<name>oozie.service.CallableQueueService.queue.size</name>
<value>10000</value>
</property>
<property>
<name>oozie.service.CallableQueueService.threads</name>
<value>10</value>
</property>
<property>
<name>oozie.service.CallableQueueService.callable.concurrency</name>
<value>3</value>
</property>
<property>
<name>oozie.service.coord.normal.default.timeout</name>
<value>120</value>
</property>
<property>
<name>oozie.db.schema.name</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.create.db.schema</name>
<value>true</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.pool.max.active.conn</name>
<value>10</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
<value>false</value>
</property>
<property>
<name>local.realm</name>
<value>LOCALHOST</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.keytab.file</name>
<value>${user.home}/oozie.keytab</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.principal</name>
<value>${user.name}/localhost@${local.realm}</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
<value> </value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
<value> </value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/etc/hadoop/conf</value>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/user/${user.name}/share/lib</value>
</property>
<property>
<name>use.system.libpath.for.mapreduce.and.pig.jobs</name>
<value>false</value>
</property>
<property>
<name>oozie.authentication.type</name>
<value>simple</value>
</property>
<property>
<name>oozie.authentication.token.validity</name>
<value>36000</value>
</property>
<property>
<name>oozie.authentication.signature.secret</name>
<value>oozie</value>
</property>
<property>
<name>oozie.authentication.cookie.domain</name>
<value></value>
</property>
<property>
<name>oozie.authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>
<property>
<name>oozie.authentication.kerberos.principal</name>
<value>HTTP/localhost@${local.realm}</value>
</property>
<property>
<name>oozie.authentication.kerberos.keytab</name>
<value>${oozie.service.HadoopAccessorService.keytab.file}</value>
</property>
<property>
<name>oozie.authentication.kerberos.name.rules</name>
<value>DEFAULT</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>oozie.action.mapreduce.uber.jar.enable</name>
<value>true</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.supported.filesystems</name>
<value>hdfs,viewfs</value>
</property>
</configuration>
5. Configure Oozie Web Console
The code is as follows:
$ cd /tmp/
$ wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
$ cd /var/lib/oozie/
$ sudo unzip /tmp/ext-2.2.zip
$ cd ext-2.2/
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
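One step not shown explicitly above is initializing the Oozie database schema. With the CDH4 packages this is typically done once before the first start, roughly as follows (a hedged sketch; the script path is assumed to be the CDH4 default):
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run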
6. Configure the Oozie sharelib
The code is as follows:
$ mkdir /tmp/ooziesharelib
$ cd /tmp/ooziesharelib
$ tar xzf /usr/lib/oozie/oozie-sharelib.tar.gz
$ sudo -u oozie hadoop fs -put share /user/oozie/share
$ sudo -u oozie hadoop fs -ls /user/oozie/share
$ sudo -u oozie hadoop fs -ls /user/oozie/share/lib
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/hbase.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/zookeeper.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar /user/oozie/share/lib/hive/
$ sudo -u oozie hadoop fs -put /usr/lib/hive/lib/guava-11.0.2.jar /user/oozie/share/lib/hive/
7. Start Oozie
The code is as follows:
$ sudo service oozie start
8. Visit the Oozie Web Console
The code is as follows:
http://hadoop-secondary:11000/oozie
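The server status can also be checked with the Oozie client from the command line (a hedged example; admin -status is part of the standard oozie CLI):
$ oozie admin -oozie http://hadoop-secondary:11000/oozie -status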
9. At this point, the Oozie setup is complete.