Oozie is an open source workflow scheduling tool for the Hadoop platform. We have been using Oozie in our project for nearly a year. Its installation and configuration are fairly complex, and quite a lot of configuration is needed before it is convenient to use. The following steps for installing and configuring Oozie are offered as a reference for others who use Hadoop and Oozie, and also serve as my own notes.
1 Unpack the installation package
tar -xzf oozie-3.3.2-distro.tar.gz
2 Modify the addtowar.sh script
The Hadoop versions that Oozie declares support for do not include 1.0.4, but Oozie does in fact work with Hadoop 1.0.4. You therefore need to modify the addtowar.sh script under $OOZIE_HOME/bin, adding the following branch at the end of the function getHadoopJars():
elif [ "${version}" = "1.0.4" ]; then
#List is separated by ":"
hadoopJars="hadoop-ant-1.0.4.jar:hadoop-client-1.0.4.jar:hadoop-core-1.0.4.jar:hadoop-minicluster-1.0.4.jar:hadoop-tools-1.0.4.jar:jackson-core-asl-1.8.8.jar:jackson-mapper-asl-1.8.8.jar:log4j-1.2.15.jar:commons-configuration-1.6.jar"
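A typo in this colon-separated list is easy to miss. As a minimal sketch, assuming all the jars sit directly in one directory, the helper below splits such a list on ":" and prints the entries that are not present. The name missing_jars is hypothetical and is not part of addtowar.sh, which locates the jars in its own way.

```shell
# Hypothetical helper: print the jars from a ":"-separated list that are
# missing from a directory.
# Usage: missing_jars DIR "a.jar:b.jar"
missing_jars() (
  dir=$1
  list=$2
  set -f   # no glob expansion while splitting the list
  IFS=:    # split on ":"; the change stays inside this subshell
  for jar in $list; do
    [ -f "$dir/$jar" ] || printf '%s\n' "$jar"
  done
)
```

For example, missing_jars /some/dir "hadoop-core-1.0.4.jar:log4j-1.2.15.jar" prints nothing when both files exist in /some/dir, and prints one name per line otherwise.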
3 Copy the required dependencies
Because Oozie packages all of its dependencies into a WAR that is deployed on its built-in Tomcat, the required JAR files and JavaScript library must be copied into the $OOZIE_HOME/libext directory, including the following dependencies:
4 Environment variable settings
vi ~/.bash_profile
Add:
export OOZIE_URL=http://localhost:11000/oozie/
export OOZIE_HOME=/home/hadoop/oozie-3.3.2
Append to PATH: $OOZIE_HOME/bin
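The same edit can be scripted so that running it twice does not duplicate the lines. This is only a sketch: the function name add_oozie_env is hypothetical, and the values are the ones used in this article.

```shell
# Hypothetical helper: append the Oozie variables to a profile file once.
# Usage: add_oozie_env [PROFILE]   (PROFILE defaults to ~/.bash_profile)
add_oozie_env() (
  profile=${1:-"$HOME/.bash_profile"}
  # Do nothing if an earlier run already added the variables.
  grep -q 'OOZIE_HOME=' "$profile" 2>/dev/null && exit 0
  # The quoted 'EOF' keeps $PATH and $OOZIE_HOME literal in the file.
  cat >> "$profile" <<'EOF'
export OOZIE_URL=http://localhost:11000/oozie/
export OOZIE_HOME=/home/hadoop/oozie-3.3.2
export PATH=$PATH:$OOZIE_HOME/bin
EOF
)
```

After calling add_oozie_env, run source ~/.bash_profile (or log in again) so the current shell picks up the variables.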
5 Proxy settings
If you do not configure the proxy settings, you will see an error like the following when submitting a job:
hadoop is not allowed to impersonate hadoop
In other words, the hadoop user is not allowed to impersonate the hadoop user, which means it does not have the right to submit jobs on behalf of hadoop.
This problem occurs because Oozie itself does not execute any tasks and does not dispatch tasks to the TaskTrackers. The only interaction between Oozie and the Hadoop cluster is to submit jobs to the JobTracker and to learn how they are progressing through callback URLs or polling.
Suppose the Hadoop cluster is installed under account A, Oozie runs under account B on some node, and the users whose jobs Oozie submits belong to user group C. The proxy settings then mean: account B, connecting from that node, has the right to submit jobs on behalf of users in group C.
Add to core-site.xml:
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>IP</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>hadoop</value>
</property>
In both property names, hadoop.proxyuser.hadoop.hosts and hadoop.proxyuser.hadoop.groups, the hadoop that follows "proxyuser." is the account B that runs Oozie. The value of hadoop.proxyuser.hadoop.hosts must be the IP of the node where Oozie is installed, and the value of hadoop.proxyuser.hadoop.groups must be the user group C described above.
Hadoop and Oozie are generally both installed under the hadoop account, and the hadoop account belongs to the hadoop user group. That is how we end up with this amusing configuration in which hadoop submits jobs on behalf of hadoop.
Note: because these settings live in core-site.xml, you must restart the cluster after changing them for the change to take effect.
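Before restarting the cluster, it can help to confirm that both properties actually made it into the file. The check below is a rough sketch: has_proxyuser is a hypothetical name, and the grep only matches when each <name> element is written on one line without extra whitespace.

```shell
# Hypothetical helper: succeed only if FILE declares both
# hadoop.proxyuser.USER.hosts and hadoop.proxyuser.USER.groups.
# Usage: has_proxyuser FILE USER
has_proxyuser() (
  file=$1
  user=$2
  for key in hosts groups; do
    grep -q "<name>hadoop\.proxyuser\.$user\.$key</name>" "$file" || exit 1
  done
)
```

It could be used as has_proxyuser /path/to/core-site.xml hadoop && echo "proxy settings present", where the path is whatever your installation uses.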
6 Time zone settings
The purpose of the time zone setting is to make Oozie use the UTC+8 (East Eight) time zone.
Add to oozie-site.xml:
<property>
  <name>oozie.processing.timezone</name>
  <value>GMT+0800</value>
</property>
7 Port number settings
If you need to change the port number, change it in $OOZIE_HOME/conf/oozie-env.sh so that it does not conflict with another service. The default port number is 11000.
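The port change can also be scripted. The sketch below assumes the port is controlled by an OOZIE_HTTP_PORT export in oozie-env.sh, as in the stock file where it appears commented out; check your own copy before relying on this. The function name set_oozie_port is hypothetical.

```shell
# Hypothetical helper: set OOZIE_HTTP_PORT in an oozie-env.sh style file.
# Usage: set_oozie_port ENV_FILE PORT
set_oozie_port() (
  env_file=$1
  port=$2
  tmp=$(mktemp)
  # Drop any existing export of the variable, active or commented out.
  grep -v '^[# ]*export OOZIE_HTTP_PORT=' "$env_file" > "$tmp" || true
  # Append the new setting and write the result back in place.
  printf 'export OOZIE_HTTP_PORT=%s\n' "$port" >> "$tmp"
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
)
```

If you change the port this way, the OOZIE_URL set in step 4 has to use the same port.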
8 Database Settings
Oozie stores its metadata in an RDBMS and by default uses Derby, Apache's embedded pure-Java database. Derby can cause problems in practice, so MySQL is recommended, configured as follows:
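<property>
  <name>oozie.db.schema.name</name>
  <value>oozie</value>
  <description>Oozie DataBase Name</description>
</property>
<property>
  <name>oozie.service.JPAService.create.db.schema</name>
  <value>true</value>
  <description>
    Creates Oozie DB.
    If set to true, it creates the DB schema if it does not exist. If the DB schema exists is a NOP.
    If set to false, it does not create the DB schema. If the DB schema does not exist it fails start up.
  </description>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>JDBC driver class.</description>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/oozie?createDatabaseIfNotExist=true</value>
  <description>The JDBC URL.</description>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>root</value>
  <description>DB user name.</description>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>abc</value>
  <description>
    DB user password.
    IMPORTANT: if password is empty leave a 1 space string, the service trims the value,
    if empty Configuration assumes it is NULL.
  </description>
</property>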
9 Run oozie-setup.sh
$OOZIE_HOME/bin/oozie-setup.sh -hadoop 1.0.4 /home/hadoop/hadoop -extjs /home/hadoop/oozie-3.3.2/libext/
10 Create the database
$OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run
11 Run oozie-start.sh
$OOZIE_HOME/bin/oozie-start.sh
12 Run oozie-run.sh
nohup /home/hadoop/oozie-3.3.2/bin/oozie-run.sh >/home/hadoop/oozie-3.3.2/logs/log 2>/home/hadoop/oozie-3.3.2/logs/errorlog &
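The redirections in a command like this fail if the log directory does not exist yet. As a small sketch, the wrapper below creates the directory first and then starts any command detached in the same way, with stdout in "log" and stderr in "errorlog". The name run_detached is hypothetical.

```shell
# Hypothetical helper: start a command detached from the terminal, with
# stdout and stderr written to files in LOG_DIR.
# Usage: run_detached LOG_DIR COMMAND [ARGS...]
run_detached() {
  log_dir=$1
  shift
  # The redirections below need the directory to exist.
  mkdir -p "$log_dir" || return 1
  nohup "$@" >"$log_dir/log" 2>"$log_dir/errorlog" &
}
```

With the paths from this article, the call would be run_detached /home/hadoop/oozie-3.3.2/logs /home/hadoop/oozie-3.3.2/bin/oozie-run.sh.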
13 Verify that Oozie is running normally
oozie admin -oozie http://IP:11000/oozie -status
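The server may take a little while to come up, so a single status check right after starting it can fail. A minimal sketch of a polling loop follows, assuming a healthy server answers with a line containing NORMAL (for example "System mode: NORMAL"). The function name wait_for_oozie and its default retry values are hypothetical.

```shell
# Hypothetical helper: poll "oozie admin -status" until the server reports
# NORMAL, or give up after TRIES attempts.
# Usage: wait_for_oozie URL [TRIES] [DELAY_SECONDS]
wait_for_oozie() (
  url=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Succeed as soon as the status output mentions NORMAL.
    if oozie admin -oozie "$url" -status 2>/dev/null | grep -q 'NORMAL'; then
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)
```

For example, wait_for_oozie http://IP:11000/oozie && echo "Oozie is up" waits up to about a minute with the default values.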