2. Impala source code analysis
Reference: http://www.sizeofvoid.net/wp-content/uploads/ImpalaIntroduction2.pdf
This chapter begins the source code analysis. The reference linked above is a very good introduction to Impala's implementation and execution flow; thanks to its author.
2.1 Impala internal architecture
The internal architecture of Impala is as follows:
Figure 2-1 Impala internal architecture
We can see that Impala has three parts: client, impalad, and statestore.
Component | Description
Client | As shown in the figure, there are three types of clients. The Thrift client submits queries by connecting to port 21000 of impalad.
Impalad | Consists of a frontend and a backend, and includes three Thrift servers (beeswax-server, hs2-server, be-server).
Statestore | Each impalad registers with it, and it then pushes the status of the other nodes in the cluster to each impalad.
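The register-then-broadcast relationship between impalad and the statestore can be illustrated with a toy model. This is only a sketch of the pattern described in the table: `ToyStatestore`, `ToyImpalad`, `Register`, and `Unregister` are hypothetical names invented for illustration and are not Impala's real classes or Thrift interfaces.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical toy model of "register, then receive cluster status".
// None of these names come from the Impala code base.
class ToyStatestore {
 public:
  using MembershipCallback = std::function<void(const std::set<std::string>&)>;

  // An impalad registers with an id and a callback for membership updates.
  // Every registration triggers a broadcast to all known subscribers.
  void Register(const std::string& impalad_id, MembershipCallback on_update) {
    assert(on_update && "a subscriber must supply a callback");
    subscribers_[impalad_id] = std::move(on_update);
    Broadcast();
  }

  // Drops a node and tells the remaining subscribers about the new membership.
  void Unregister(const std::string& impalad_id) {
    subscribers_.erase(impalad_id);
    Broadcast();
  }

 private:
  // Sends the full set of registered ids to every subscriber.
  void Broadcast() const {
    std::set<std::string> members;
    for (const auto& entry : subscribers_) members.insert(entry.first);
    for (const auto& entry : subscribers_) entry.second(members);
  }

  std::map<std::string, MembershipCallback> subscribers_;
};

// A subscriber's local view of the cluster.
struct ToyImpalad {
  std::string id;
  std::set<std::string> known_members;
};
```

In this model, registering two `ToyImpalad` instances leaves each of them with a two-element `known_members` set, and unregistering one shrinks the other's view to a single entry. The real statestore communicates over Thrift rather than through in-process callbacks.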
The following table describes the ports of the impalad and statestore components:
Attribute | Value | Description
Impalad component ports
Impala Daemon Backend Port (be_port) | 22000 (default) | The port on which ImpalaBackendService is exported.
Impala Daemon Beeswax Port (beeswax_port) | 21000 (default) | The port on which the Impala daemon serves Beeswax client requests.
Impala Daemon HiveServer2 Port (hs2_port) | 21050 (default) | The port on which the Impala daemon serves HiveServer2 client requests.
StateStoreSubscriber Service Port (state_store_subscriber_port) | 23000 (default) | The port on which StateStoreSubscriberService runs.
Statestore component ports
StateStore Service Port (state_store_port) | 24000 (default) | The port on which StateStoreService is exported.
StateStore HTTP Server Port (webserver_port) | 25010 (default) | The port on which the statestore debug web server runs.
beeswax_port = 21000 serves Beeswax clients, which include the Hue client, JDBC, and impala-shell; hs2_port = 21050 serves HiveServer2 clients; be_port = 22000 is used to interact with other impalad processes; state_store_subscriber_port = 23000 is the port impalad uses to register itself with the statestored process and update status. Port 24000 on the statestore component interacts with port 23000 on impalad. The other ports are less important and are not covered here.
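For quick reference, the defaults above can be captured in a small lookup table. This is a sketch only: the flag names and port numbers are the ones quoted in the text, while `Component`, `PortInfo`, `DefaultPorts`, and `DefaultPortFor` are hypothetical helpers that do not exist in Impala.

```cpp
#include <cassert>
#include <map>
#include <string>

// Which process opens the port (illustrative enum, not an Impala type).
enum class Component { kImpalad, kStatestore };

struct PortInfo {
  Component owner;   // process that listens on the port
  int default_port;  // default value quoted in the text above
};

// Default ports keyed by flag name, as listed in the text above.
inline const std::map<std::string, PortInfo>& DefaultPorts() {
  static const std::map<std::string, PortInfo> kPorts = {
      {"beeswax_port", {Component::kImpalad, 21000}},
      {"hs2_port", {Component::kImpalad, 21050}},
      {"be_port", {Component::kImpalad, 22000}},
      {"state_store_subscriber_port", {Component::kImpalad, 23000}},
      {"state_store_port", {Component::kStatestore, 24000}},
      {"webserver_port", {Component::kStatestore, 25010}},
  };
  return kPorts;
}

// Returns the default port for a flag name, or -1 if the flag is not listed.
inline int DefaultPortFor(const std::string& flag_name) {
  const auto& ports = DefaultPorts();
  const auto it = ports.find(flag_name);
  return it == ports.end() ? -1 : it->second.default_port;
}
```

A client such as impala-shell would, under these defaults, connect to `DefaultPortFor("beeswax_port")` on an impalad host, while an impalad talks to the statestore on `DefaultPortFor("state_store_port")`.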
The overall code file structure is as follows:
2.2 impalad code analysis
2.2.1 impalad-main.cc
// This file contains the main() function for the impala daemon process,
// which exports the Thrift services ImpalaService and ImpalaInternalService.

#include <unistd.h>
#include <jni.h>

#include "common/logging.h"
#include "common/init.h"
#include "exec/hbase-table-scanner.h"
#include "exec/hbase-table-writer.h"
#include "runtime/hbase-table-factory.h"
#include "codegen/llvm-codegen.h"
#include "common/status.h"
#include "runtime/coordinator.h"
#include "runtime/exec-env.h"
#include "util/jni-util.h"
#include "util/network-util.h"
#include "rpc/thrift-util.h"
#include "rpc/thrift-server.h"
#include "rpc/rpc-trace.h"
#include "service/impala-server.h"
#include "service/fe-support.h"
#include "gen-cpp/ImpalaService.h"
#include "gen-cpp/ImpalaInternalService.h"
#include "util/impalad-metrics.h"
#include "util/thread.h"

using namespace impala;
using namespace std;

DECLARE_string(classpath);
DECLARE_bool(use_statestore);
DECLARE_int32(beeswax_port);
DECLARE_int32(hs2_port);
DECLARE_int32(be_port);
DECLARE_string(principal);

int main(int argc, char** argv) {
  // Parse command-line flags and enable logging (based on Google gflags and glog).
  InitCommonRuntime(argc, argv, true);

  LlvmCodeGen::InitializeLlvm();
  // Initialize JNI, because the frontend (FE) is written in Java.
  JniUtil::InitLibhdfs();
  EXIT_IF_ERROR(HBaseTableScanner::Init());
  EXIT_IF_ERROR(HBaseTableFactory::Init());
  EXIT_IF_ERROR(HBaseTableWriter::InitJNI());
  InitFeSupport();

  // start backend service for the coordinator on be_port
  ExecEnv exec_env;  // ExecEnv is the query/plan-fragment execution environment
  StartThreadInstrumentation(exec_env.metrics(), exec_env.webserver());
  InitRpcEventTracing(exec_env.webserver());

  // Three ThriftServers: two serve clients, one serves other impalad backends.
  ThriftServer* beeswax_server = NULL;
  ThriftServer* hs2_server = NULL;
  ThriftServer* be_server = NULL;
  // This server wraps the three ThriftServers above to provide services externally.
  ImpalaServer* server = NULL;
  // Create the ImpalaServer.
  EXIT_IF_ERROR(CreateImpalaServer(&exec_env, FLAGS_beeswax_port, FLAGS_hs2_port,
      FLAGS_be_port, &beeswax_server, &hs2_server, &be_server, &server));

  EXIT_IF_ERROR(be_server->Start());  // start be_server

  // Start services, including the statestore subscriber
  // (used to register with the statestored process).
  Status status = exec_env.StartServices();
  if (!status.ok()) {
    LOG(ERROR) << "Impalad services did not start correctly, exiting. Error: "
               << status.GetErrorMsg();
    ShutdownLogging();
    exit(1);
  }

  // this blocks until the beeswax and hs2 servers terminate
  EXIT_IF_ERROR(beeswax_server->Start());
  EXIT_IF_ERROR(hs2_server->Start());
  ImpaladMetrics::IMPALA_SERVER_READY->Update(true);
  LOG(INFO) << "Impala has started.";
  beeswax_server->Join();  // block until beeswax-server exits
  hs2_server->Join();      // block until hs2-server exits

  delete be_server;
  delete beeswax_server;
  delete hs2_server;
}
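The startup order in main() (backend server first, then the environment's services, then the two client-facing servers, then blocking joins) can be reproduced in a self-contained toy. This is a sketch under stated assumptions: `ToyStatus`, `ToyServer`, `RETURN_IF_TOY_ERROR`, and `RunToyDaemon` are hypothetical stand-ins written for this illustration. They only record the order of calls and do not open sockets or use Thrift, and the macro returns an error code instead of exiting the process.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status object: an empty message means success.
struct ToyStatus {
  std::string error;
  bool ok() const { return error.empty(); }
};

// Hypothetical server that only records the order of Start()/Join() calls.
// A real server would spawn worker threads in Start() and block in Join().
class ToyServer {
 public:
  ToyServer(std::string name, std::vector<std::string>* log, bool fail = false)
      : name_(std::move(name)), log_(log), fail_(fail) {
    assert(log_ != nullptr);
  }

  ToyStatus Start() {
    if (fail_) return ToyStatus{"could not start " + name_};
    log_->push_back("start " + name_);
    return ToyStatus{};
  }

  void Join() { log_->push_back("join " + name_); }

 private:
  std::string name_;
  std::vector<std::string>* log_;
  bool fail_;
};

// Early return on failure, in the spirit of an exit-on-error macro.
#define RETURN_IF_TOY_ERROR(expr)         \
  do {                                    \
    const ToyStatus toy_status_ = (expr); \
    if (!toy_status_.ok()) return 1;      \
  } while (false)

// Mirrors the ordering of main(): returns 0 on success, 1 on a startup error.
int RunToyDaemon(std::vector<std::string>* log, bool backend_fails = false) {
  ToyServer be_server("be_server", log, backend_fails);
  ToyServer beeswax_server("beeswax_server", log);
  ToyServer hs2_server("hs2_server", log);

  RETURN_IF_TOY_ERROR(be_server.Start());       // internal backend port first
  log->push_back("start services");             // e.g. the statestore subscriber
  RETURN_IF_TOY_ERROR(beeswax_server.Start());  // then the client-facing servers
  RETURN_IF_TOY_ERROR(hs2_server.Start());
  beeswax_server.Join();                        // would block in a real daemon
  hs2_server.Join();
  return 0;
}
```

Calling `RunToyDaemon(&log)` fills `log` with the six steps in the order used by main(); passing `true` for `backend_fails` returns 1 before any client-facing server is started, which is why the backend server is a sensible first step.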
To be continued...