Azkaban, a Hadoop workflow engine [repost]


Introduction

Azkaban is a job scheduling system open-sourced by LinkedIn. It is much simpler and more intuitive to operate than Oozie and provides a simple set of functions. Its unit of scheduling is the flow: a predefined workflow consisting of one or more jobs, which may depend on one another. Azkaban's official homepage is http://azkaban.github.io/azkaban2/, and its main features are the following:

    • Compatible with all Hadoop versions (1.x, 2.x, CDH)
    • A web UI for management and configuration that is easy to operate
    • Scheduled execution can be configured through the UI
    • Good extensibility: components can be developed for a specific need (there are currently three plugins: HDFSBrowser, JobTypePlugins, and HadoopSecurityManager)
    • A permission management module
    • Flow and job execution can be tracked through the web UI
    • Email notifications can be configured
    • An execution time limit can be set for a scheduled flow or for a job within a flow; if execution exceeds the limit, Azkaban can send a warning to the people concerned or kill the corresponding flow or job
    • Failed jobs can be retried

Azkaban also has some limitations (more may yet be found). For example, a dependency between tasks cannot be satisfied by partial completion: if task A depends on task B, A cannot start until B has finished completely, even if A only needs one phase of B to be complete.
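
The whole-job dependency rule described above can be illustrated with a small sketch. This is not Azkaban's scheduler; it only shows, using Python's standard `graphlib`, that a job becomes runnable once every job it depends on has finished. The job names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return one valid run order for {job: [jobs it depends on]}.

    A job appears only after all of its dependencies, which mirrors the
    "B must finish completely before A starts" rule described above.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical flow: "clean" needs "extract"; "report" needs both.
flow = {
    "extract": [],
    "clean": ["extract"],
    "report": ["extract", "clean"],
}

order = execution_order(flow)
print(order)
```

There is no way in this model to say "start report once the first half of clean is done"; the usual workaround is to split such a job into two smaller jobs.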

Azkaban primarily solves the problem of dependencies between Hadoop jobs. It consists of three components:

    • Relational database (MySQL): stores Azkaban and job status information
    • AzkabanWebServer: manages project information, scheduling, and monitoring through the web UI
    • AzkabanExecutorServer: resolves jobs and schedules them according to their dependencies

Installation steps

First obtain the Azkaban packages listed below from http://azkaban.github.io/azkaban2/downloads.html. AzkabanWebServer and AzkabanExecutorServer must be installed in different directories.

    • azkaban-web-server-2.1.tar.gz
    • azkaban-executor-server-2.1.tar.gz
    • azkaban-sql-script-2.1.tar.gz
    • azkaban-hdfs-viewer-2.1.tar.gz
    • azkaban-jobtype-2.1.tar.gz

Install and configure the database (currently only MySQL is supported)

Azkaban uses MySQL to store projects, schedules, and executions.

    • Install MySQL. For the installation procedure, refer to resources such as http://ifalone.me/305.html and http://dev.mysql.com/doc/index.html
    • Create a database for Azkaban. The database does not have to be named azkaban
      mysql> CREATE DATABASE azkaban;
    • Create a user for the Azkaban database. The user does not have to be named azkaban
      mysql> CREATE USER 'username'@'%' IDENTIFIED BY 'password';
    • Grant that user select, insert, update, and delete permissions on the Azkaban database
      mysql> GRANT SELECT,INSERT,UPDATE,DELETE ON <database>.* TO '<username>'@'%' WITH GRANT OPTION;
    • If necessary, increase max_allowed_packet, which limits the size of the packets MySQL accepts. It is configured in /etc/my.cnf
      [mysqld]
      ...
      max_allowed_packet=1024M
    • Restart MySQL after changing the configuration
      sudo /sbin/service mysqld restart
    • Create the tables Azkaban requires: extract azkaban-sql-script-2.1.tar.gz and run the create-all-sql script (scripts with '_update_' in their names can be ignored)
    • Get the JDBC connector mysql-connector-java-5.1.25.tar.gz from http://dev.mysql.com/downloads/connector/j/. After the web server and executor server are installed, copy the connector JAR it contains into azkaban2-web-server-install-dir/extlib and supertool/azkaban/excutorserver/extlib respectively
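
When the database and user names differ from the defaults, it can help to generate the three setup statements from one place. The following is a minimal sketch, not part of Azkaban; the function name `azkaban_db_setup_sql` and the sample values are assumptions for illustration. It only builds the statements as text, so they can be reviewed before being pasted into the mysql client.

```python
def azkaban_db_setup_sql(database, user, password):
    """Build the CREATE DATABASE / CREATE USER / GRANT statements as strings."""
    for name in (database, user):
        # Keep names simple so they can be embedded without quoting issues.
        if not name.isidentifier():
            raise ValueError(f"unsupported name: {name!r}")
    escaped_password = password.replace("'", "''")  # double any single quotes
    return [
        f"CREATE DATABASE {database};",
        f"CREATE USER '{user}'@'%' IDENTIFIED BY '{escaped_password}';",
        f"GRANT SELECT,INSERT,UPDATE,DELETE ON {database}.* "
        f"TO '{user}'@'%' WITH GRANT OPTION;",
    ]


statements = azkaban_db_setup_sql("azkaban", "azkaban", "secret")
for statement in statements:
    print(statement)
```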

Download and install the web server
    • Extract azkaban-web-server-2.1.tar.gz to a suitable directory, such as azkaban2-web-server-install-dir
      The following directories should be present after extraction
      Folder    Description
      bin       Scripts that start the Azkaban Jetty server
      conf      Configuration files for the Azkaban web server
      lib       JAR packages that Azkaban depends on
      extlib    JAR packages placed in this directory are added to Azkaban's classpath
      plugins   Plugins are installed in this directory
      web       CSS, HTML, and other files for the Azkaban web server
    • Get the keystore required for SSL. Be sure to use the keytool that ships with Java, or you will get an error (in PATH, it is best to put the new entry ahead of the old ones, for example PATH=$JAVA_HOME/bin:......:${PATH})
      For creating certificates, refer to http://wingware.iteye.com/blog/1160396 and http://docs.codehaus.org/display/JETTY/How+to+configure+SSL
      keytool -keystore keystore -alias jetty -genkey -keyalg RSA
      An example session:
      keytool -keystore keystore -alias jetty -genkey -keyalg RSA
      Enter keystore password:  password
      What is your first and last name?
        [Unknown]:  jetty.mortbay.org
      What is the name of your organizational unit?
        [Unknown]:  Jetty
      What is the name of your organization?
        [Unknown]:  Mort Bay Consulting Pty. Ltd.
      What is the name of your City or Locality?
        [Unknown]:
      What is the name of your State or Province?
        [Unknown]:
      What is the two-letter country code for this unit?
        [Unknown]:
      Is CN=jetty.mortbay.org, OU=Jetty, O=Mort Bay Consulting Pty. Ltd., L=Unknown, ST=Unknown, C=Unknown correct?
        [no]:  yes
      Enter key password for <jetty>
        (RETURN if same as keystore password):  password
      After the keystore file has been generated, modify the following in azkaban.properties to match your setup
      jetty.keystore=keystore
      jetty.password=password
      jetty.keypassword=password
      jetty.truststore=keystore
      jetty.trustpassword=password
    • Configure the database. Modify the following in azkaban2-web-server-install-dir/conf/azkaban.properties to match your setup
      database.type=mysql
      mysql.port=3306
      mysql.host=localhost
      mysql.database=azkaban
      mysql.user=azkaban
      mysql.password=azkaban
      mysql.numconnections=100
    • Configure the user manager. Edit azkaban-users.xml to match your setup; azkaban.properties contains the reference to azkaban-users.xml
      user.manager.class=azkaban.user.XmlUserManager
      user.manager.xml.file=conf/azkaban-users.xml
    • Run the web server
      The following properties in azkaban.properties configure Jetty's behavior
      jetty.maxThreads=25
      jetty.ssl.port=8443
      Before the first run, create a temp directory for the web server, such as azkaban2-web-server-install-dir/tmpdir. Then go to the web server's bin directory and set the following in azkaban-web-start.sh
      tmpdir=azkaban2-web-server-install-dir/tmpdir
      From the web server's root directory, run the following command; barring anything unexpected, the server starts normally. You can verify that it started by visiting https://localhost:8443
      bin/azkaban-web-start.sh ./
      To shut down the web server, run:
      bin/azkaban-web-shutdown.sh ./
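
Because the web server settings are spread over several steps, a quick check that none were forgotten can save a failed start. The sketch below is an illustration only, not an Azkaban tool: it parses simple `key=value` text and reports which of the keys named in the steps above are missing or empty. The list `REQUIRED_WEB_KEYS` and the sample text are assumptions chosen for the example.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and # or ! comments are skipped."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Keys taken from the configuration steps above (an assumed checklist).
REQUIRED_WEB_KEYS = [
    "database.type", "mysql.port", "mysql.host", "mysql.database",
    "mysql.user", "mysql.password",
    "jetty.keystore", "jetty.password", "jetty.keypassword",
    "jetty.truststore", "jetty.trustpassword", "jetty.ssl.port",
]


def missing_keys(props, required=REQUIRED_WEB_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not props.get(key)]


sample = """
# Azkaban web server settings (sample values)
database.type=mysql
mysql.port=3306
mysql.host=localhost
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
jetty.keystore=keystore
jetty.password=password
jetty.keypassword=password
jetty.truststore=keystore
jetty.ssl.port=8443
"""

missing = missing_keys(parse_properties(sample))
print(missing)
```

In practice the text would come from reading conf/azkaban.properties rather than from an inline string.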

Download and install the executor server
    • Extract azkaban-executor-server-2.1.tar.gz to a suitable directory, such as azkaban2-exec-server-install-dir
      The following directories should be present after extraction
      Folder    Description
      bin       Scripts that start the Azkaban Jetty server
      conf      Configuration files for the Azkaban executor server
      lib       JAR packages that Azkaban depends on
      extlib    JAR packages placed in this directory are added to Azkaban's classpath
      plugins   Plugins are installed in this directory
    • Configure the database. Modify the following in azkaban2-exec-server-install-dir/conf/azkaban.properties to match your setup
      database.type=mysql
      mysql.port=3306
      mysql.host=localhost
      mysql.database=azkaban
      mysql.user=azkaban
      mysql.password=azkaban
      mysql.numconnections=100
    • Configure the connection between AzkabanWebServer and AzkabanExecutorServer
      In AzkabanExecutorServer's azkaban.properties, set the following:
      # Azkaban Executor settings
      executor.maxThreads=50
      executor.port=12321
      executor.flow.threads=30
      In AzkabanWebServer's azkaban.properties, set the following:
      executor.port=12321
      Both servers must be restarted for this configuration to take effect
    • Run the executor server
      Before the first run, create a temp directory for the executor server, such as supertool/azkaban/excutorserver/tmpdir. Then go to the executor server's bin directory and set the following in azkaban-exec-start.sh
      tmpdir=supertool/azkaban/excutorserver/tmpdir
      From the executor server's root directory, run the following command; barring anything unexpected, the server starts normally
      bin/azkaban-exec-start.sh ./
      To shut it down, run the following command
      bin/azkaban-exec-shutdown.sh
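
The web server and the executor server each carry their own copy of executor.port, and the two values have to agree. The sketch below, which is only an illustration and not part of Azkaban, compares the two settings; the helper names and the inline sample text are assumptions.

```python
def read_props(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    pairs = (
        line.split("=", 1)
        for line in text.splitlines()
        if "=" in line and not line.lstrip().startswith("#")
    )
    return {key.strip(): value.strip() for key, value in pairs}


def executor_ports_match(web_text, exec_text):
    """True only if both files define executor.port with the same value."""
    web_port = read_props(web_text).get("executor.port")
    exec_port = read_props(exec_text).get("executor.port")
    return web_port is not None and web_port == exec_port


web_conf = "executor.port=12321\n"
exec_conf = "# Azkaban Executor settings\nexecutor.port=12321\nexecutor.flow.threads=30\n"
print(executor_ports_match(web_conf, exec_conf))
```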

Installing the Azkaban plugins
    • HDFS viewer plugin
      Modify azkaban2-web-server-install-dir/conf/azkaban.properties:
      viewer.plugins=hdfs
      Azkaban will load the HDFS viewer plugin from the following location:
      azkaban2-web-server-install-dir/plugins/viewer/hdfs
      Extract azkaban-hdfs-viewer-2.1.tar.gz to azkaban2-web-server-install-dir/plugins/viewer and rename the extracted directory to hdfs
      If Hadoop security is not enabled, restart AzkabanWebServer and the HDFS plugin is ready to use. If Hadoop security is enabled, modify the following settings in azkaban2-web-server-install-dir/plugins/viewer/hdfs/conf/plugin.properties:
      Parameter                       Description
      azkaban.should.proxy            Whether Azkaban should proxy as another user to view the HDFS filesystem, rather than as Azkaban itself. Defaults to true
      hadoop.security.manager.class   The security manager to be used, which handles talking to a secure Hadoop cluster. Defaults to azkaban.security.HadoopSecurityManager_H_1_0 (for Hadoop 1.x versions)
      proxy.user                      The Azkaban user configured with Kerberos and Hadoop. Similar to how Oozie should be configured, for secure Hadoop installations
      proxy.keytab.location           The location of the keytab file with which Azkaban can authenticate with Kerberos for the specified proxy.user
    • Job type plugin
      Modify azkaban2-exec-server-install-dir/conf/azkaban.properties:
      azkaban.jobtype.plugin.dir=plugins/jobtypes
      Azkaban will load all job type plugins from the following location:
      azkaban2-exec-server-install-dir/plugins/jobtypes
      Extract azkaban-jobtype-2.1.tar.gz to azkaban2-exec-server-install-dir/plugins/ and rename the extracted directory to jobtypes
      If Hadoop security is not enabled, you only need to modify the following settings in azkaban2-exec-server-install-dir/plugins/jobtypes/commonprivate.properties:
      Parameter                       Description
      hadoop.home                     Your $HADOOP_HOME setting
      jobtype.global.classpath        The cluster-specific Hadoop resources, such as the hadoop-core jar and the Hadoop conf (e.g. ${hadoop.home}/hadoop-core-1.0.4.jar,${hadoop.home}/conf)
      If Hadoop security is enabled, modify the following settings in azkaban2-exec-server-install-dir/plugins/jobtypes/commonprivate.properties:
      Parameter                       Description
      hadoop.security.manager.class   The security manager to be used, which handles talking to a secure Hadoop cluster. Defaults to azkaban.security.HadoopSecurityManager_H_1_0 (for Hadoop 1.x versions)
      proxy.user                      The Azkaban user configured with Kerberos and Hadoop. Similar to how Oozie should be configured, for secure Hadoop installations
      proxy.keytab.location           The location of the keytab file with which Azkaban can authenticate with Kerberos for the specified proxy.user
      hadoop.home                     Your $HADOOP_HOME setting
      jobtype.global.classpath        The cluster-specific Hadoop resources, such as the hadoop-core jar and the Hadoop conf (e.g. ${hadoop.home}/hadoop-core-1.0.4.jar,${hadoop.home}/conf)

Usage instructions and examples. For job-specific configuration options, see http://azkaban.github.io/azkaban2/documents/2.1/jobconf.html
    • Create a simple job that can be scheduled
      Once Azkaban is running, open port 8443 of the site in a browser. From there you can create and delete projects and perform related operations. The following creates a simple example job, foo.job
      # foo.job
      type=command
      command=echo "Hello World"
      Compress foo.job into a zip file. Then create a project on the web page and upload foo.zip to that project.

      Once this is done, you can execute the project immediately or schedule it to run at set times. At present the schedule time can only be entered in one of two time zones, UTC or PDT, so you must first convert your local time to UTC and then enter the result (for example, CST - 8 hours = UTC; this has been listed as a bug on GitHub but has not been fixed).
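
The local-to-UTC conversion mentioned above is easy to get wrong around midnight, so here is a minimal sketch of it. It assumes the local zone is China Standard Time (UTC+8), which is what the "CST - 8 hours = UTC" example implies; the function name and the sample date are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is China Standard Time, a fixed UTC+8 offset.
CST = timezone(timedelta(hours=8), "CST")


def local_to_utc(local_naive, local_tz=CST):
    """Interpret a naive datetime as local time and convert it to UTC."""
    return local_naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


# A job meant to run at 02:30 local time on 1 June must be entered as
# 18:30 UTC on the previous day.
run_at = local_to_utc(datetime(2013, 6, 1, 2, 30))
print(run_at.strftime("%Y-%m-%d %H:%M UTC"))  # 2013-05-31 18:30 UTC
```

Note that the date rolls back a day for any local time before 08:00, which matters when the schedule form asks for a date as well as a time.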
    • Build a job with dependencies
      Create two jobs, foo and bar, where bar depends on foo. Compressing the two job files into a single zip produces a simple flow project. What happens after foo fails is configurable; see http://azkaban.github.io/azkaban2/documents/2.1/executingflow.html.
      # foo.job
      type=command
      command=echo foo

      # bar.job
      type=command
      dependencies=foo
      command=echo bar
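
Packaging the job files can also be scripted. The following sketch uses Python's standard `zipfile` module to place two job definitions at the root of one archive, which is the layout described above. The job contents follow the foo and bar example; the function name `build_flow_zip` is an assumption for illustration.

```python
import io
import zipfile

# Job definitions matching the foo/bar example: bar depends on foo.
JOBS = {
    "foo.job": "type=command\ncommand=echo foo\n",
    "bar.job": "type=command\ndependencies=foo\ncommand=echo bar\n",
}


def build_flow_zip(jobs):
    """Return the bytes of a zip archive with one entry per job file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, body in jobs.items():
            archive.writestr(name, body)
    return buffer.getvalue()


data = build_flow_zip(JOBS)

# Read the archive back to confirm what would be uploaded.
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    names = sorted(archive.namelist())
    bar_body = archive.read("bar.job").decode()
print(names)
```

Writing `data` to a file in binary mode gives an archive that can be uploaded through the project page.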
    • To create a job of type hadoopJava, first package the Java program into a JAR, then write the *.job file, and finally compress the JAR and the *.job file into a zip file and upload it. An example of a packaged zip with dependencies: http://redmine.mzsvn.com/attachments/download/398/java-hadooptest-de.zip
      First modify azkaban2-exec-server-install-dir/plugins/jobtypes/common.properties
      hadoop.home=/home/workspace/hadoop-*.*.*
      Then modify azkaban2-exec-server-install-dir/plugins/jobtypes/commonprivate.properties
      jobtype.global.classpath=${hadoop.home}/hadoop-core-*.*.*.jar,${hadoop.home}/conf,${hadoop.home}/lib/*
      An example job follows. wc.properties (optional) defines the variables for this job, and wordcount.job holds the job's main configuration
      # wc.properties
      hdfsroot=/test
      param.inData=${hdfsroot}/input
      param.outData=${hdfsroot}/output

      # wordcount.job
      type=hadoopJava
      job.class=azkaban.jobtype.examples.java.WordCount
      classpath=./lib/*
      main.args=${param.inData} ${param.outData}
      force.output.overwrite=true
      input.path=${param.inData}
      output.path=${param.outData}
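
The `${...}` references in these files are expanded from other properties. To make that substitution concrete, here is a small sketch that resolves references within one dictionary of properties. It illustrates the idea only and is not Azkaban's resolver; the property names are taken from the word count example and are otherwise illustrative.

```python
import re

_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(props):
    """Expand ${name} references using other entries of the same dict.

    Unknown names are left untouched; circular references raise ValueError.
    """
    def expand(value, active):
        def substitute(match):
            name = match.group(1)
            if name in active:
                raise ValueError(f"circular reference to {name!r}")
            if name not in props:
                return match.group(0)
            return expand(props[name], active | {name})
        return _REFERENCE.sub(substitute, value)

    return {key: expand(value, frozenset([key])) for key, value in props.items()}


props = {
    "hdfsroot": "/test",
    "param.inData": "${hdfsroot}/input",
    "param.outData": "${hdfsroot}/output",
    "main.args": "${param.inData} ${param.outData}",
    "output.path": "${param.outData}",
}

resolved = resolve(props)
print(resolved["main.args"])
```

A reference to a name that is defined nowhere stays in the output unchanged, which makes a misspelled variable easy to spot in the resolved values.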
    • To use email notifications, first modify azkaban2-web-server-install-dir/conf/azkaban.properties, for example
      # mail settings
      mail.sender=******@miaozhen.com
      mail.host=smtp.miaozhen.com
      mail.user=******@miaozhen.com
      mail.password=******
      Then configure the appropriate mailing list for each job, as follows
      # foo.job
      type=command
      command=echo "Hello World"
      notify.emails=******@miaozhen.com
      failure.emails=******@miaozhen.com
      success.emails=******@miaozhen.com
    • Using SLAs in Azkaban
      Azkaban can set an SLA for a scheduled flow or for a job within a flow: if the execution time exceeds the configured limit, it can send a warning to the people concerned or kill the flow or job.
    • Calling the Azkaban API
      Azkaban exposes an Ajax interface that external programs can call by sending GET or POST requests. A session ID (valid for one day by default) must be obtained before making any other call. The following examples obtain a session and then run a job
      Get a session:
      curl -k --data "action=login&username=azkaban&password=azkaban" https://localhost:8443
      Result:
      {"status": "success", "session.id": "5a932706-3d04-4c44-888d-5afcd87b8ebe"}
      Create a project:
      curl -k --data "action=create&name=azkaban&description=dis&session.id=5a932706-3d04-4c44-888d-5afcd87b8ebe" https://localhost:8443/manager
      Result:
      {"status": "success", "path": "manager?project=azkaban", "action": "redirect"}
      Upload a packaged zip archive:
      curl -k -i -H "Content-Type: multipart/mixed" -X POST --form 'session.id=5a932706-3d04-4c44-888d-5afcd87b8ebe' --form 'ajax=upload' --form '[email protected];type=application/zip' --form 'project=myproject;type/plain' https://localhost:8443/manager
      Result:
      HTTP/1.1 100 Continue
      HTTP/1.1 200 OK
      Content-Type: application/json
      Content-Length: 43
      Server: Jetty(6.1.26)
      {"projectId": "A", "version": "1"}
      Execute a flow:
      curl -k --data "ajax=executeFlow&project=azkaban&flow=foo&session.id=5a932706-3d04-4c44-888d-5afcd87b8ebe" https://localhost:8443/executor
      Result:
      {"message": "Execution submitted successfully with exec id", "project": "azkaban", "flow": "foo", "execid": 70}
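
The same calls can be prepared from a script. The sketch below only builds the URL and form body for the login and execute-flow requests, mirroring the parameters in the curl commands; it sends nothing. The helper names are assumptions, and `BASE_URL` assumes the default local HTTPS port used throughout this article.

```python
from urllib.parse import urlencode

# Assumption: the web server runs locally on the SSL port configured earlier.
BASE_URL = "https://localhost:8443"


def login_request(username, password):
    """Return (url, form body) for the login call that yields a session.id."""
    body = urlencode({"action": "login", "username": username, "password": password})
    return BASE_URL, body


def execute_flow_request(session_id, project, flow):
    """Return (url, form body) for starting a flow with an existing session."""
    body = urlencode({
        "ajax": "executeFlow",
        "project": project,
        "flow": flow,
        "session.id": session_id,
    })
    return BASE_URL + "/executor", body


login_url, login_body = login_request("azkaban", "azkaban")
exec_url, exec_body = execute_flow_request("example-session-id", "azkaban", "foo")
print(login_url, login_body)
print(exec_url, exec_body)
```

To actually send a request, the body can be encoded to bytes and passed as `data` to `urllib.request.urlopen`; a self-signed keystore like the one generated earlier would additionally need an SSL context that skips certificate verification, which is what `curl -k` does.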
