Hive execution flow: the source code entry point

Source: Internet
Author: User

(Figure: Hive execution flowchart, http://hi.csdn.net/attachment/201107/29/0_1311922740tXqK.gif)

CliDriver is the entry point of Hive and corresponds to the UI part of its architecture. Look at its structure and you will find a main() function: yes, as you guessed, everything starts from main().
(Figure: the class structure of CliDriver, which has five key functions.)


This class is the platform through which users interact with Hive; you can think of it as the Hive client.
(Figure: the role of the CliDriver class in the overall Hive execution process.)

Hive execution process, following the normal steps:
1. main() in CliDriver.class initializes the Hive environment variables and obtains the string or file provided by the client.
2. The input is sent to processLine(cmd). This step splits the input at each ';' (no checking is performed) and passes each piece to processCmd().
3. processCmd(cmd) is called to handle each command. It decides how to handle the command from the string the command starts with; there are five cases:
   1. set ...: sets operating parameters and Hive environment parameters.
   2. quit or exit: exits the Hive environment.
   3. Starts with !: runs a shell command.
   4. Starts with dfs: the command is handed to FsShell.
   5. Anything else is a HiveQL statement and is executed normally. This is the case we care about most: the statement is handed over to Driver, the real core engine of Hive, via ret = driver.run(cmd);
4. Each case is handled differently. For the fifth case, normal HiveQL, processing enters Driver.run(), which reads the HiveQL and performs lexical analysis, syntax analysis and so on until execution ends:
   1. ParseDriver returns the syntax tree as a CommonTree.
   2. BaseSemanticAnalyzer sem.analyze(tree, ctx) performs semantic analysis and generates an execution plan.
5. ... and so on.
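The five-way decision in step 3 can be sketched on its own, separated from the actual handlers. This is a minimal illustration only: CommandKind and classify() are hypothetical names, not Hive classes, and the checks mirror the order and string tests that processCmd() uses (first token "set", exact "quit"/"exit", prefix "!", prefix "dfs", otherwise HiveQL).

```java
// Hypothetical sketch: classify a single command into the five cases that
// processCmd() distinguishes. CommandKind and classify() are illustrative
// names, not part of Hive.
enum CommandKind { SET, QUIT, SHELL, DFS, HIVEQL }

class CommandClassifier {
    static CommandKind classify(String cmd) {
        String trimmed = cmd.trim();
        String firstToken = trimmed.split("\\s+")[0];
        if (firstToken.equals("set")) {
            return CommandKind.SET;        // case 1: set ...
        } else if (trimmed.equals("quit") || trimmed.equals("exit")) {
            return CommandKind.QUIT;       // case 2: quit / exit
        } else if (trimmed.startsWith("!")) {
            return CommandKind.SHELL;      // case 3: shell command
        } else if (trimmed.startsWith("dfs")) {
            return CommandKind.DFS;        // case 4: handed to FsShell
        }
        return CommandKind.HIVEQL;         // case 5: everything else goes to Driver
    }

    public static void main(String[] args) {
        String[] samples = {"set hive.exec.compress.output=false", "quit",
                            "!ls -l", "dfs -ls /", "select * from t"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Note that "set" is matched as a whole first token, so a statement beginning with a word such as "settings" falls through to the HiveQL case.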

Today's topic is the Hive entry point, so we will cover the first three steps.

Now let's look at the main functions in detail to see how Hive actually handles each step. (If you only want to understand Hive's workflow or principles and not the details, you can skip the rest. If you want to modify or optimize the source code, read on.)

The following are the key classes and functions involved in the Hive entry point.

----------- CliDriver

This class drives the whole Hive processing flow described above.

---------------- main()

public static void main(String[] args) throws IOException {
    OptionsProcessor oproc = new OptionsProcessor();
    if (!oproc.process_stage1(args)) {
        System.exit(1);
    }

    // NOTE: It is critical to do this here so that log4j is reinitialized before
    // any of the other core hive classes are loaded
    SessionState.initHiveLog4j();

    // create the session
    CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
    ss.in = System.in; // standard input
    try {
        ss.out = new PrintStream(System.out, true, "UTF-8");
        ss.err = new PrintStream(System.err, true, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        System.exit(3);
    }

    SessionState.start(ss); // start the session; creates a new SessionState

    if (!oproc.process_stage2(ss)) {
        System.exit(2);
    }

    // set all properties specified via the command line
    HiveConf conf = ss.getConf();
    for (Map.Entry<Object, Object> item : ss.cmdProperties.entrySet()) {
        conf.set((String) item.getKey(), (String) item.getValue());
    }

    sp = new SetProcessor();          // handles "set" commands
    qp = new Driver();                // engine for normal HiveQL
    dfs = new FsShell(ss.getConf());  // Hadoop DFS interface, handles "dfs" commands

    if (ss.execString != null) {      // a command string was given: run it
        System.exit(processLine(ss.execString));
    }

    try {
        if (ss.fileName != null) {    // a file name was given: read the file and run it
            System.exit(processReader(new BufferedReader(new FileReader(ss.fileName))));
        }
    } catch (FileNotFoundException e) { // the file could not be found
        System.err.println("Could not open input file for reading. (" + e.getMessage() + ")");
        System.exit(3);
    }

    Character mask = null;
    String trigger = null;

    ConsoleReader reader = new ConsoleReader(); // Hive console command reader
    reader.setBellEnabled(false);
    // reader.setDebug(new PrintWriter(new FileWriter("writer.debug", true)));

    List<SimpleCompletor> completors = new LinkedList<SimpleCompletor>();
    completors.add(new SimpleCompletor(new String[] {"set", "from", "create", "load", "describe", "quit", "exit"}));
    reader.addCompletor(new ArgumentCompletor(completors));

    String line;
    PrintWriter out = new PrintWriter(System.out);
    final String HISTORYFILE = ".hivehistory"; // history file that records every command line
    String historyFile = System.getProperty("user.home") + File.separator + HISTORYFILE;
    reader.setHistory(new History(new File(historyFile)));
    int ret = 0;
    Log LOG = LogFactory.getLog("CliDriver"); // create a log
    LogHelper console = new LogHelper(LOG);
    String prefix = "";
    String curPrompt = prompt; // "hive"

    // keep reading HiveQL; everything up to ';' is passed to processLine()
    while ((line = reader.readLine(curPrompt + "> ")) != null) {
        long start = System.currentTimeMillis(); // start timing the command
        if (line.trim().endsWith(";")) {  // ';' reached: the statement is complete
            line = prefix + " " + line;
            ret = processLine(line);      // important: hand the command to parsing and execution
            prefix = "";                  // reset the prefix
            curPrompt = prompt;           // "hive"
        } else {
            prefix = prefix + line;       // note: continuation lines are joined without a separator
            curPrompt = prompt2;          // continuation prompt
            continue;
        }
        long end = System.currentTimeMillis();
        if (end > start) {                // report elapsed time; more console output could be added here
            double timeTaken = (double) (end - start) / 1000.0;
            console.printInfo("Time taken: " + timeTaken + " seconds", null);
        }
    }

    System.exit(ret);
}
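The prompt loop's buffering can be isolated from jline and tested on a fixed list of input lines. This is a minimal sketch: PromptLoopSketch and collectStatements() are hypothetical names, and instead of calling processLine() it just collects each completed statement. The joining rules (prefix + line while waiting, prefix + " " + line once a line ends in ';') are copied from the loop above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of main()'s read loop: lines are buffered in a prefix
// until a line ending in ';' arrives; the completed statement is then
// collected instead of being passed to processLine().
class PromptLoopSketch {
    static List<String> collectStatements(List<String> inputLines) {
        List<String> statements = new ArrayList<>();
        String prefix = "";
        for (String line : inputLines) {
            if (line.trim().endsWith(";")) {
                statements.add(prefix + " " + line); // same join as CliDriver
                prefix = "";
            } else {
                prefix = prefix + line;              // no separator, as in the original
            }
        }
        return statements; // an unterminated trailing statement is dropped
    }

    public static void main(String[] args) {
        System.out.println(collectStatements(List.of("select a", "from t", "where x=1;")));
    }
}
```

Tracing it shows a quirk of the original loop: with three or more input lines, the continuation lines are glued together without a space, so "select a", "from t", "where x=1;" becomes "select afrom t where x=1;". Joining with prefix + line + " " instead would avoid that.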

---------- processLine(cmd)
// Splits the input at ';' (no checking is performed) and passes each piece to processCmd().

public static int processLine(String line) {
    int ret = 0;
    for (String oneCmd : line.split(";")) {
        oneCmd = oneCmd.trim();
        if (oneCmd.equals("")) {
            continue;
        }
        ret = processCmd(oneCmd); // execute the command
        if (ret != 0) {
            // ignore anything after the first failed command
            return ret;
        }
    }
    return 0;
}
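The split-and-stop behaviour of processLine() is easy to check in isolation. In this sketch the command handler is passed in as a ToIntFunction, which is an assumption made so the logic can run without Hive. In CliDriver the handler is the static processCmd(). ProcessLineSketch is a hypothetical name.

```java
import java.util.function.ToIntFunction;

// Hypothetical sketch of processLine(): split on ';', skip empty pieces,
// run each command, and stop at the first non-zero return code.
class ProcessLineSketch {
    static int processLine(String line, ToIntFunction<String> processCmd) {
        for (String oneCmd : line.split(";")) {
            oneCmd = oneCmd.trim();
            if (oneCmd.isEmpty()) {
                continue; // e.g. the piece between ";;"
            }
            int ret = processCmd.applyAsInt(oneCmd);
            if (ret != 0) {
                return ret; // ignore everything after the first failure
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int rc = processLine("set a=1; ; select 1;", cmd -> {
            System.out.println("running: " + cmd);
            return 0;
        });
        System.out.println("rc = " + rc);
    }
}
```

Because the split is purely textual, a semicolon inside a string literal (for example select ';' from t) is also treated as a statement boundary. This is what "no checking is performed" means in practice.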

---------- processCmd()
// Handles one command according to the string it starts with. There are five cases:
// 1. set ...: sets operating parameters and Hive environment parameters
// 2. quit or exit: exits the Hive environment
// 3. starts with !: runs a shell command
// 4. starts with dfs: handed to FsShell
// 5. anything else is a HiveQL statement and is executed normally. This is the case we care about most: the statement is handed to Driver, the real core engine of Hive.

public static int processCmd(String cmd) {
    String[] tokens = cmd.split("\\s+");
    String cmd_1 = cmd.substring(tokens[0].length());
    int ret = 0;

    if (tokens[0].equals("set")) {                             // 1: set
        ret = sp.run(cmd_1); // modifies the Hadoop/Hive configuration
    } else if (cmd.equals("quit") || cmd.equals("exit")) {     // 2: exit the Hive environment
        System.exit(0);
    } else if (cmd.startsWith("!")) {                          // 3: shell command
        SessionState ss = SessionState.get();
        String shell_cmd = cmd.substring(1);
        if (shell_cmd.endsWith(";")) {                         // remove a trailing ';'
            shell_cmd = shell_cmd.substring(0, shell_cmd.length() - 1);
        }
        // shell_cmd = "/bin/bash -c \'" + shell_cmd + "\'";
        try {
            Process executor = Runtime.getRuntime().exec(shell_cmd);
            StreamPrinter outPrinter = new StreamPrinter(executor.getInputStream(), null, ss.out);
            StreamPrinter errPrinter = new StreamPrinter(executor.getErrorStream(), null, ss.err);
            outPrinter.start();
            errPrinter.start();
            int exitVal = executor.waitFor();
            if (exitVal != 0) {
                ss.err.write(("Command failed with exit code = " + exitVal).getBytes());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    } else if (cmd.startsWith("dfs")) {                        // 4: Hadoop DFS operations
        SessionState ss = SessionState.get();
        if (dfs == null) {
            dfs = new FsShell(ss.getConf());
        }
        String hadoopCmd = cmd.replaceFirst("dfs\\s+", "");
        hadoopCmd = hadoopCmd.trim();
        if (hadoopCmd.endsWith(";")) {
            hadoopCmd = hadoopCmd.substring(0, hadoopCmd.length() - 1);
        }
        String[] args = hadoopCmd.split("\\s+");
        try {
            PrintStream oldOut = System.out;
            System.setOut(ss.out);
            int val = dfs.run(args);
            System.setOut(oldOut);
            if (val != 0) {
                ss.err.write(("Command failed with exit code = " + val).getBytes());
            }
        } catch (Exception e) {
            ss.err.println("Exception raised from DFSShell.run " + e.getLocalizedMessage());
        }
    } else {                                                   // 5: normal HiveQL, the important case
        ret = qp.run(cmd); // run the Hive command, e.g. select ...; add file ...
        Vector<String> res = new Vector<String>();
        while (qp.getResults(res)) {                           // fetch the execution results
            for (String r : res) {
                SessionState ss = SessionState.get();
                PrintStream out = ss.out;
                out.println(r);
            }
            res.clear();
        }
        int cret = qp.close();
        if (ret == 0) {
            ret = cret;
        }
    }
    return ret;
}
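The dfs branch does only a little string work before calling FsShell.run(String[]). The sketch below reproduces just that string handling, without calling FsShell, so you can see what argument array FsShell receives. DfsArgsSketch and toFsShellArgs() are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch: reproduce how processCmd() turns "dfs -ls /tmp;"
// into the argument array handed to FsShell.run(String[]).
class DfsArgsSketch {
    static String[] toFsShellArgs(String cmd) {
        String hadoopCmd = cmd.replaceFirst("dfs\\s+", "").trim(); // drop the "dfs" keyword
        if (hadoopCmd.endsWith(";")) {
            hadoopCmd = hadoopCmd.substring(0, hadoopCmd.length() - 1);
        }
        return hadoopCmd.split("\\s+"); // whitespace-separated arguments
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toFsShellArgs("dfs -ls /tmp;"))); // prints [-ls, /tmp]
    }
}
```

In the normal flow processLine() has already removed the ';', so the trailing-';' check is only defensive. Arguments containing spaces cannot be quoted, because the split ignores quotes.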

-------- Class CliSessionState

Apart from initialization, CliSessionState essentially inherits SessionState's implementation, probably to keep coupling low.
public class CliSessionState extends SessionState


So let's look at SessionState directly.
SessionState can be thought of as your current Hive environment. It creates and initializes your Hive session. Its settings come partly from the conf initialization and partly from what you set manually, either on the command line or from a file, as you choose.
It connects to the Hive metadata database to obtain the existing metadata.
Key features:
1. Its main job is to create a session and assign it a unique ID (rule: user_id + "_" + yyyyMMddHHmm).
There are two ways to create a session: 1. create a new session directly, or 2. copy an existing session.
The first is the most common, but the second is very useful: it guarantees that two environments are completely consistent and avoids tedious setup work.
2. Each cmd is given a queryId. You can use the queryId to get the command, or look up the ID from the command.
3. Each SessionState has a LogHelper for logging.
In particular, it includes a HiveConf, which is used to connect to the metadata database.
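The session ID rule and the two ways of creating a session can be sketched without Hive. MiniSession is a hypothetical stand-in, not SessionState, and a plain Map stands in for HiveConf. Only the ID format user_id + "_" + yyyyMMddHHmm comes from the text above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: session ID = user_id + "_" + yyyyMMddHHmm, and a
// session created either from scratch or by copying another's settings.
class MiniSession {
    private static final DateTimeFormatter ID_FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    final String sessionId;
    final Map<String, String> conf; // stand-in for HiveConf

    // Way 1: create a fresh session with empty settings.
    MiniSession(String userId, LocalDateTime now) {
        this.sessionId = makeSessionId(userId, now);
        this.conf = new HashMap<>();
    }

    // Way 2: copy another session's settings; the copy gets its own ID and
    // its own map, so later changes do not leak between the two sessions.
    MiniSession(MiniSession other, String userId, LocalDateTime now) {
        this.sessionId = makeSessionId(userId, now);
        this.conf = new HashMap<>(other.conf);
    }

    static String makeSessionId(String userId, LocalDateTime now) {
        return userId + "_" + now.format(ID_FORMAT);
    }
}
```

With minute resolution, two sessions started by the same user within the same minute would get the same ID under this rule. That is worth keeping in mind if the ID is used as a key.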

(Figure: the class structure of SessionState.)

Key functions:
String makeSessionId() -- // generate the session ID: user_id + "_" + yyyyMMddHHmm
setCmd(String cmdString) -- // record the command cmd for the current query ID
protected final static HiveConf.ConfVars[] metaVars -- // the configuration variables that locate the metadata system and its paths
public String getCmd() -- // get the command cmd through the queryId
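The two-way link between commands and query IDs described for setCmd()/getCmd() can be illustrated with a small registry. This is a hypothetical sketch: QueryRegistry, getQueryId(), and the ID format sessionId + "_" + counter are assumptions for illustration, not how SessionState actually builds its query IDs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each command gets a query ID (assumed format:
// sessionId + "_" + counter), and the mapping can be read both ways.
class QueryRegistry {
    private final String sessionId;
    private int counter = 0;
    private final Map<String, String> cmdById = new HashMap<>();
    private final Map<String, String> idByCmd = new HashMap<>();

    QueryRegistry(String sessionId) {
        this.sessionId = sessionId;
    }

    // Like setCmd(): register the command and return its new query ID.
    String setCmd(String cmdString) {
        String queryId = sessionId + "_" + (++counter);
        cmdById.put(queryId, cmdString);
        idByCmd.put(cmdString, queryId); // a repeated command maps to its latest ID
        return queryId;
    }

    // Like getCmd(): look up the command text by query ID.
    String getCmd(String queryId) {
        return cmdById.get(queryId);
    }

    // The reverse lookup: the query ID for a command.
    String getQueryId(String cmdString) {
        return idByCmd.get(cmdString);
    }
}
```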

----------- CommandProcessor class
The CommandProcessor class is very simple: it is an interface.

public interface CommandProcessor {
    public int run(String command);
}

You may wonder why we discuss this interface first. Three classes implement it (see the figure), and two of them, SetProcessor and MetadataProcessor, are key classes in the Hive entry point.
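Because every processor shares the single method int run(String), a caller can treat them uniformly. The sketch below shows the interface together with a table-driven dispatcher keyed on the first token. The lookup table is an alternative design to processCmd()'s if/else chain, not Hive's code, and EchoProcessor and Dispatcher are hypothetical classes.

```java
import java.util.HashMap;
import java.util.Map;

// The interface as shown above.
interface CommandProcessor {
    int run(String command);
}

// Hypothetical processor that records what it was asked to run.
class EchoProcessor implements CommandProcessor {
    final StringBuilder out = new StringBuilder();

    public int run(String command) {
        out.append(command.trim());
        return 0;
    }
}

// Hypothetical table-driven dispatcher: an alternative to an if/else chain.
class Dispatcher {
    private final Map<String, CommandProcessor> byKeyword = new HashMap<>();
    private final CommandProcessor fallback; // e.g. the HiveQL engine

    Dispatcher(CommandProcessor fallback) {
        this.fallback = fallback;
    }

    void register(String keyword, CommandProcessor p) {
        byKeyword.put(keyword, p);
    }

    int dispatch(String cmd) {
        String trimmed = cmd.trim();
        String first = trimmed.split("\\s+")[0];
        CommandProcessor p = byKeyword.get(first);
        if (p != null) {
            // like processCmd(), pass the text after the keyword
            return p.run(trimmed.substring(first.length()));
        }
        return fallback.run(cmd); // anything else goes to the fallback
    }
}
```

Adding a new command type then means registering one more processor rather than editing the dispatch chain. The if/else version in processCmd() keeps all the rules visible in one place.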

---- MetadataProcessor
Retrieves and processes Hive metadata.
// run() fetches a table's metadata (its column names and types). If an error occurs, for example the table name cannot be found, it returns 1.

public int run(String command) {
    SessionState ss = SessionState.get();
    String table_name = command.trim();
    if (table_name.equals("")) {
        return 0;
    }
    try {
        MetaStoreClient msc = new MetaStoreClient(ss.getConf());
        if (!msc.tableExists(table_name)) { // the table does not exist
            ss.err.println("table does not exist: " + table_name);
            return 1;
        } else {
            List<FieldSchema> fields = msc.get_fields(table_name); // get the table's columns
            for (FieldSchema f : fields) {
                ss.out.println(f.getName() + ": " + f.getType());
            }
        }
    } catch (MetaException err) {
        ss.err.println("Got meta exception: " + err.getMessage());
        return 1;
    } catch (Exception err) {
        ss.err.println("Got exception: " + err.getMessage());
        return 1;
    }
    return 0;
}
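To see the return-code convention (0 on success or empty input, 1 for an unknown table) without a running metastore, here is a sketch against an in-memory map. MetadataProcessorSketch is hypothetical, and the Map of table name to (column, type) pairs is an assumed stand-in for MetaStoreClient and FieldSchema.

```java
import java.io.PrintStream;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of MetadataProcessor.run() against an in-memory
// "metastore": prints "name: type" per column, returns 1 for unknown tables.
class MetadataProcessorSketch {
    private final Map<String, List<String[]>> tables; // table -> {column, type} pairs
    private final PrintStream out;
    private final PrintStream err;

    MetadataProcessorSketch(Map<String, List<String[]>> tables, PrintStream out, PrintStream err) {
        this.tables = tables;
        this.out = out;
        this.err = err;
    }

    int run(String command) {
        String tableName = command.trim();
        if (tableName.isEmpty()) {
            return 0; // nothing to describe
        }
        List<String[]> fields = tables.get(tableName);
        if (fields == null) {
            err.println("table does not exist: " + tableName);
            return 1;
        }
        for (String[] f : fields) {
            out.println(f[0] + ": " + f[1]);
        }
        return 0;
    }

    public static void main(String[] args) {
        Map<String, List<String[]>> tables = Map.of("pokes",
                List.of(new String[] {"foo", "int"}, new String[] {"bar", "string"}));
        MetadataProcessorSketch p = new MetadataProcessorSketch(tables, System.out, System.err);
        System.out.println("rc = " + p.run("pokes"));
        System.out.println("rc = " + p.run("missing"));
    }
}
```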

------- SetProcessor class
It mainly sets up the Hive environment, in two categories:
1. Session modes, such as silent mode.
For example, set silent=true;
2. The session's conf configuration, that is, the Hadoop configuration parameters and the parameters that change how execution is carried out. For example, set hive.exec.compress.output=false;
// run(String command) can be called to modify the Hadoop configuration and Hive execution parameters.

public int run(String command) {
    SessionState ss = SessionState.get(); // get the current SessionState
    String nwcmd = command.trim();        // trim surrounding whitespace
    if (nwcmd.equals("")) {
        dumpOptions(ss.getConf().getChangedProperties());
        return 0;
    }
    if (nwcmd.equals("-v")) {
        dumpOptions(ss.getConf().getAllProperties());
        return 0;
    }
    String[] part = new String[2];
    int eqIndex = nwcmd.indexOf('=');
    if (eqIndex == -1) {
        // no equality sign - print the property out
        dumpOption(ss.getConf().getAllProperties(), nwcmd);
        return 0;
    } else if (eqIndex == nwcmd.length() - 1) {
        part[0] = nwcmd.substring(0, nwcmd.length() - 1);
        part[1] = "";
    } else {
        // split the set command at '='
        part[0] = nwcmd.substring(0, eqIndex);
        part[1] = nwcmd.substring(eqIndex + 1);
    }
    try {
        if (part[0].equals("silent")) {   // set silent mode
            boolean val = getBoolean(part[1]);
            ss.setIsSilent(val);
        } else {
            // set the key-value pair in the session's conf
            ss.getConf().set(part[0], part[1]);
        }
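The argument parsing in SetProcessor.run() has four outcomes, and they can be separated from the side effects. This sketch mirrors the cases in the listing: an empty argument shows changed properties, "-v" shows all, a bare key shows one property, and "key=value" assigns. SetCommandSketch, Kind and parse() are hypothetical names. The listing's separate "key=" branch yields the same result as the general split, so the sketch handles both with one substring call.

```java
// Hypothetical sketch of how SetProcessor.run() interprets its argument.
class SetCommandSketch {
    enum Kind { SHOW_CHANGED, SHOW_ALL, SHOW_ONE, ASSIGN }

    static final class SetCommand {
        final Kind kind;
        final String key;
        final String value;

        SetCommand(Kind kind, String key, String value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }
    }

    static SetCommand parse(String command) {
        String nwcmd = command.trim();
        if (nwcmd.isEmpty()) {
            return new SetCommand(Kind.SHOW_CHANGED, null, null); // "set"
        }
        if (nwcmd.equals("-v")) {
            return new SetCommand(Kind.SHOW_ALL, null, null);     // "set -v"
        }
        int eqIndex = nwcmd.indexOf('=');
        if (eqIndex == -1) {
            return new SetCommand(Kind.SHOW_ONE, nwcmd, null);    // "set key"
        }
        // split at the first '='; "key=" gives an empty value
        return new SetCommand(Kind.ASSIGN, nwcmd.substring(0, eqIndex), nwcmd.substring(eqIndex + 1));
    }
}
```

Like the original, the sketch does not trim around '=', so "a = b" yields the key "a " and the value " b". Only the first '=' splits, so "a=b=c" sets a to "b=c".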
