A Beautiful Encounter with Thrift
I. Getting to Know Thrift
Most people probably first meet Thrift through serialization. Search for "java serialization" together with "methods", "comparison", or "performance", and the search engine returns plenty of articles on how to use the various serialization methods and how their performance compares. Thrift is always among them, and its performance is respectable: at least it is far better than Java's native serialization (well, forgive me my little grudge ......).
My own first contact with Thrift, however, came through a company project.
Around this time last year, my business unit found that spam advertising in several UGC communities had become severe. The boss required every community to connect to the company's Rich Media Monitoring System (which reviews and penalizes all of the company's business content; hereinafter the monitoring system), so that UGC content (text, images, audio, and video, plus user information such as avatars and nicknames) could be reported in near real time and spam could be handled automatically (for example, clearing content and banning accounts). Following the principles of minimal intrusion, function reuse, and a unified process across business services, we abstracted an access system between the business systems and the monitoring system. It is uniformly responsible for receiving, reporting, re-pushing, and searching content, querying results, and forwarding the monitoring system's penalty commands. The business can be summarized as in Fig 1.1:
Fig 1.1
Because the monitoring system exposes its services through Thrift, the access system talks to it over the Thrift protocol. For ease of access, business systems can use either Thrift or HTTP to talk to the access system.
At the time I was the only person on the project. I knew little about Thrift and the schedule was tight, so the work was hectic. At first I doubted I could finish within the deadline, but developing with Thrift turned out to be quite convenient. The system launched on time and has had no incidents since. Afterwards I studied Thrift properly and came to regard it as a required skill.
Okay, time to get better acquainted with Thrift.
II. The so-called RPC
Before learning about Thrift, let's look at what RPC (Remote Procedure Call) is.
Consider the following example:
public void invoke() {
    String param1 = "my String 1";
    String param2 = "my String 2";
    String res = getStr(param1, param2);
    System.out.println("res=" + res);
}

private String getStr(String str1, String str2) {
    return str1 + str2;
}
This is the simplest kind of code: a local function call. The caller and the callee live in the same program, and the call happens inside one process. When the CPU reaches the call, it jumps to the called function, executes it, and then returns to run the code that follows. For the caller the call is blocking (synchronous): it waits until the called function has finished. The process is shown in Fig 2.1:
Fig 2.1
Next, let's look at the RPC call:
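import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public void test() throws Exception {
    TestQry.Client client = getClient("192.168.4.222", 7800, 5000);
    String param1 = "my String 1";
    String param2 = "my String 2";
    // Looks like a local call, but the work happens in the remote process
    String res = client.getStr(param1, param2);
    System.out.println("res=" + res);
}

private TestQry.Client getClient(String ip, int port, int timeOut) throws Exception {
    TSocket tSocket = new TSocket(ip, port, timeOut);
    TTransport tTransport = new TFramedTransport(tSocket);
    tTransport.open();
    TProtocol protocol = new TBinaryProtocol(tTransport);
    return new TestQry.Client(protocol);
}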
This is an inter-process call. The caller and the callee are not in the same process (they may even be on different servers or in different data centers). An inter-process call has to move data over the network. During a synchronous RPC call the caller blocks until the result comes back. The process is shown in Fig 2.2:
Fig 2.2
Put simply, RPC is a way to request a service from a program on a remote computer over the network, which makes distributed network applications easier to develop.
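To make the idea concrete, here is a minimal sketch of what a client stub and a server dispatcher do behind an RPC call, with the network replaced by an in-memory buffer. The class RpcSketch, its helper methods, and its wire layout (length-prefixed strings) are hypothetical and are not Thrift's wire format; only getStr comes from the example above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: a hand-written stub/dispatcher pair.
class RpcSketch {

    // "Server side": the real implementation of the remote function.
    static String getStr(String str1, String str2) {
        return str1 + str2;
    }

    // Serialize strings as (4-byte length, UTF-8 bytes) pairs.
    static ByteBuffer pack(String... parts) {
        byte[][] encoded = new byte[parts.length][];
        int size = 0;
        for (int i = 0; i < parts.length; i++) {
            encoded[i] = parts[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + encoded[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] e : encoded) {
            buf.putInt(e.length).put(e);
        }
        buf.flip();
        return buf;
    }

    // Read back one length-prefixed string.
    static String unpackString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Server dispatcher: decode the request, call the implementation, encode the reply.
    static ByteBuffer handle(ByteBuffer request) {
        String method = unpackString(request);
        if (!"getStr".equals(method)) {
            throw new IllegalArgumentException("unknown method: " + method);
        }
        String first = unpackString(request);
        String second = unpackString(request);
        return pack(getStr(first, second));
    }

    // Client stub: to the caller this looks like a local call.
    static String remoteGetStr(String a, String b) {
        ByteBuffer request = pack("getStr", a, b);
        // A real stub would send the request over a socket and block for the reply.
        ByteBuffer response = handle(request);
        return unpackString(response);
    }

    public static void main(String[] args) {
        System.out.println("res=" + remoteGetStr("my String 1", "my String 2"));
    }
}
```

A framework such as Thrift generates the equivalent of pack, handle, and remoteGetStr for you, so application code only sees the method call.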
III. Not Just a Serialization Tool
Thrift was originally developed by Facebook as a scalable, cross-language software framework for RPC communication between the different languages used in its systems. It combines a software stack with a code generation engine: you define data types and service interfaces in a simple definition file, and the compiler takes that file as input and generates code for RPC clients and servers that communicate seamlessly across programming languages.
Thrift is a concrete implementation of an IDL (interface description language). It suits static data exchange between programs, where the data structures must be fixed in advance. Thrift is completely static: when a data structure changes, you must edit the IDL file again, regenerate the code, and then recompile and reload. Compared with other IDL tools, this is regarded as a weakness of Thrift. Thrift is commonly used to build large-scale data exchange and storage, and in large systems it has clear advantages over JSON and XML in performance and transmission size.
Note that Thrift is not only an efficient serialization tool, but also a complete RPC framework!
3.1 Stack Structure
As shown in Fig 3.1, Thrift contains a complete stack for building clients and servers.
Fig 3.1
The code framework layer is the client and server skeleton code generated from the service interface description file written in Thrift IDL. The data read/write layer is the code generated from the Thrift file to read and write data.
3.2 Client/Server Call Process
First, let's look at how the Thrift server starts and provides services, as shown in Fig 3.2:
Fig 3.2
This is the startup process of HelloServiceServer and the server's response process when a client calls the service. After the program calls the serve() method of TThreadPoolServer, the server blocks and listens; the blocking happens in the accept() method of TServerSocket. When a message arrives from a client, the server starts a new thread to handle the request, and the original thread blocks again. In the new thread, the server reads the message through TBinaryProtocol, calls the helloVoid() method of HelloServiceImpl, and writes the result to helloVoid_result to send it back to the client.
After the service has started, the client can call it, as shown in Fig 3.3:
Fig 3.3
The figure shows how HelloServiceClient calls the service and receives the result returned by the server. The program calls the helloVoid() method of Hello.Client. Inside helloVoid(), the send_helloVoid() method sends the call request to the service, and the recv_helloVoid() method receives the result returned after the service has processed the request.
3.3 Data Types
The previous section gave a general picture of the Thrift server and client workflow. Now let's look at the data types that Thrift can define. Thrift supports several kinds of types: basic types, struct and exception types, container types, and service types.
Basic Types:
bool: Boolean value (true or false), one byte
byte: signed byte
i16: 16-bit signed integer
i32: 32-bit signed integer
i64: 64-bit signed integer
double: 64-bit floating-point number
string: encoding-agnostic text or binary string
Struct and exception type:
A Thrift struct is conceptually similar to a C struct. In Java, a Thrift struct is converted into a class. A struct is defined as follows:
struct UserDemo {
    1: i32 id;
    2: string name;
    3: i32 age = 25;
    4: string phone;
}
A struct has the following features:
1. A struct cannot inherit from another struct, but structs can be nested. A struct cannot nest itself.
2. Every member has an explicit type.
3. Every member is numbered with a positive integer, and the numbers must not repeat. The number is used for encoding during transmission (see Note 1).
4. Members can be separated by commas (,) or semicolons (;), and the two can be mixed. For clarity, use only one of them in a definition; Java developers, for example, may prefer semicolons (;).
5. A field can be optional or required (see Note 2).
6. Each field can be given a default value.
7. Several structs can be defined in the same file, or in different files that are pulled in with include.
Note 1: Numeric tags play a very important role. Fields may change as a project evolves, but the numeric tags should not be modified lightly: if a tag is changed and the client and server are not updated together, parsing problems occur.
Note 2: Each field in a struct definition can be marked with the required or optional keyword. If neither is given, the field has no explicit requiredness: its value may be left unset, but it is still serialized and transmitted. An optional field that is not set is not serialized. A required field must be set and is always serialized; if a required field has no value, Thrift reports it. If an optional field has a default value and the user has not reassigned it, the field keeps that default. If an optional field has a default value, or the user has assigned one, but its __isset flag has not been set to true, the field is not serialized or transmitted.
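Note 1 can be illustrated with a small sketch of why stable numeric tags matter. In the simplified format below every field is an i32 written as a (field id, value) pair, and id 0 marks the end of the struct. The class FieldTagSketch and this layout are hypothetical and much simpler than Thrift's real wire format, but the principle is the same: a reader matches fields by id and skips ids it does not know.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: not Thrift's real wire format.
class FieldTagSketch {

    // Write each field as a 2-byte id followed by a 4-byte value, then a stop marker.
    static ByteBuffer write(Map<Short, Integer> fields) {
        ByteBuffer buf = ByteBuffer.allocate(fields.size() * 6 + 2);
        for (Map.Entry<Short, Integer> f : fields.entrySet()) {
            buf.putShort(f.getKey()).putInt(f.getValue());
        }
        buf.putShort((short) 0); // stop marker
        buf.flip();
        return buf;
    }

    // A reader that only knows the ids in knownIds; every other field is skipped.
    static Map<Short, Integer> read(ByteBuffer buf, short... knownIds) {
        Map<Short, Integer> result = new LinkedHashMap<>();
        for (short id = buf.getShort(); id != 0; id = buf.getShort()) {
            int value = buf.getInt();
            for (short known : knownIds) {
                if (known == id) {
                    result.put(id, value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Short, Integer> sent = new LinkedHashMap<>();
        sent.put((short) 1, 42); // 1: i32 id
        sent.put((short) 3, 25); // 3: i32 age
        sent.put((short) 5, 7);  // 5: a field added in a newer version
        // An older reader that knows only fields 1 and 3 still decodes them.
        System.out.println(read(write(sent), (short) 1, (short) 3));
    }
}
```

If the writer renumbered field 3 to some other tag, the older reader would silently lose that value, which is the kind of parsing problem Note 1 warns about.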
An exception is equivalent to a struct in syntax and function; the only difference is that it is declared with the keyword exception instead of struct. Semantically it differs from a struct: when defining an RPC service, the developer may need to declare that a remote method throws an exception.
Container Types
Thrift containers correspond to the container types of popular programming languages. Three container types are available:
list<t>: an ordered list of elements of type t; duplicate elements are allowed. Corresponds to Java's ArrayList.
set<t>: an unordered set of elements of type t; duplicate elements are not allowed. Corresponds to Java's HashSet.
map<t, t>: key/value pairs with key type t and value type t; duplicate keys are not allowed. Corresponds to Java's HashMap.
The element type of a container can be any valid Thrift type (including structs and exceptions) except service.
Service Type
A service definition is semantically equivalent to an interface in an object-oriented language. The Thrift compiler generates client and server stubs that implement these interfaces (details are described in the next section). Here is a simple example of how to define a service:
service QuerySrv {
    /**
     * This method finds the corresponding user information based on the name and age
     */
    UserDemo qryUser(1: string name, 2: i32 age);

    /**
     * This method finds the mobile phone number of the corresponding user based on the id
     */
    string queryPhone(1: i32 id);
}
The preceding example defines a service that contains two methods.
When defining services, keep the following rules in mind:
1. The implementing class must implement these methods.
2. Parameters can be basic types or structs.
3. All parameters are const and cannot be used as return values.
4. The return value can be void (a oneway method must return void).
5. Services support inheritance: one service can use the extends keyword to inherit from another service.
6. Services do not support method overloading.
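The rules above can be pictured with a hand-written Java analogue of the QuerySrv service. This is a sketch, not generated code: the Thrift compiler would emit a nested QuerySrv.Iface interface whose methods declare TException, and the class names ServiceSketch and QuerySrvImpl as well as the sample user are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-written analogue of the QuerySrv service; all names here are illustrative.
class ServiceSketch {

    // Mirrors the UserDemo struct, including its default value for age.
    static class UserDemo {
        int id;
        String name;
        int age = 25;
        String phone;
    }

    // A service corresponds to an interface: one method per service function,
    // and no two methods may share a name (no overloading).
    interface QuerySrv {
        UserDemo qryUser(String name, int age);
        String queryPhone(int id);
    }

    // The server-side handler must implement every method of the interface.
    static class QuerySrvImpl implements QuerySrv {
        private final Map<Integer, UserDemo> users = new HashMap<>();

        QuerySrvImpl() {
            UserDemo u = new UserDemo(); // made-up sample record
            u.id = 1;
            u.name = "alice";
            u.phone = "555-0100";
            users.put(u.id, u);
        }

        @Override
        public UserDemo qryUser(String name, int age) {
            for (UserDemo u : users.values()) {
                if (u.name.equals(name) && u.age == age) {
                    return u;
                }
            }
            return null;
        }

        @Override
        public String queryPhone(int id) {
            UserDemo u = users.get(id);
            return u == null ? null : u.phone;
        }
    }

    public static void main(String[] args) {
        QuerySrv srv = new QuerySrvImpl();
        System.out.println(srv.queryPhone(1));
    }
}
```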
In addition to the four kinds of types mentioned above, Thrift also supports enum and const.
Namespace
Namespaces in Thrift are similar to packages in java, which provide a simple way to organize (isolate) code. The namespace can also be used to resolve name conflicts in the type definition.
3.4 Transmission System
Transmission Protocol
Thrift supports multiple transmission protocols, and you can choose the one that fits your needs. The protocols fall into two groups, text and binary. Production environments generally use a binary protocol, which is more efficient than text and JSON. Common protocols include:
1. TBinaryProtocol: Thrift's default protocol. It uses a binary encoding and essentially sends the raw data directly.
2. TCompactProtocol: a compact, dense binary protocol based on variable-length quantity (varint) and zigzag encoding.
3. TJSONProtocol: transmits data encoded as JSON (JavaScript Object Notation).
4. TDebugProtocol
If you want to learn more about how these protocols are implemented and how they work, refer to studies of the Thrift source code.
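The two encodings named for TCompactProtocol can be sketched in a few lines. The class CompactIntSketch below is a stand-alone illustration for 32-bit integers, not a copy of Thrift's implementation: zigzag maps signed values to unsigned ones so that small negative numbers stay small, and the variable-length quantity then stores 7 payload bits per byte.

```java
import java.io.ByteArrayOutputStream;

// Stand-alone sketch of zigzag and varint encoding for 32-bit integers.
class CompactIntSketch {

    // Zigzag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigzagEncode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int zigzagDecode(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Varint: 7 payload bits per byte; the high bit is set on every byte except the last.
    static byte[] varintEncode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reassemble the 7-bit groups, least significant group first.
    static int varintDecode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, -1, 1, 300, -300}) {
            byte[] encoded = varintEncode(zigzagEncode(n));
            int decoded = zigzagDecode(varintDecode(encoded));
            System.out.println(n + " -> " + encoded.length + " byte(s) -> " + decoded);
        }
    }
}
```

In this sketch 300 and -300 each take 2 bytes instead of the fixed 4 bytes of a plain 32-bit integer, which is where the size advantage of a compact encoding comes from.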
Transmission Mode
Like the transmission protocol, Thrift also supports several different transmission modes.
1. TSocket: a blocking socket used on the client side. It reads and writes data with the system functions read and write.
2. TServerSocket: a non-blocking socket used on the server side. The sockets it accepts are of type TSocket (blocking sockets).
3. TBufferedTransport and TFramedTransport: both are buffered and inherit from TBufferBase, both delegate reads and writes to an underlying TTransport, and their structures are very similar. TFramedTransport uses the frame as its unit of transmission. A frame consists of 4 bytes (int32_t) followed by the byte string being transmitted: the first 4 bytes store the length of the byte string that follows, and that byte string is the actual data. TFramedTransport therefore sends 4 more bytes per frame than TBufferedTransport or TSocket.
4. TMemoryBuffer: inherits from TBufferBase. It is used for communication inside a program and involves no network I/O. It has three modes: (1) OBSERVE mode, in which data cannot be written to the buffer; (2) TAKE_OWNERSHIP mode, in which it is responsible for freeing the buffer; (3) COPY mode, in which the external memory block is copied into the TMemoryBuffer.
5. TFileTransport: inherits directly from TTransport and is used to write data to a file. Data is written in the form of events: the main thread enqueues events, and the writer thread dequeues them and writes the event data to disk. Two queues of type TFileTransportBuffer are used, one that the main thread writes events into and one that the writer thread reads events from, which avoids contention between the threads. When the read queue has been drained, the two queues are swapped; because each is referenced through a pointer, only the pointers need to be exchanged. It also supports writing data to the file in chunks (blocks).
6. TFDTransport: a very simple transport for writing data to and reading data from a file. Its write and read functions call the system functions write and read directly.
7. TSimpleFileTransport: inherits directly from TFDTransport and adds no member functions or member variables. The only difference is the constructor, whose parameters differ and which initializes the parent class (it opens the specified file, passes the fd to the parent class, and sets the parent's close_policy to CLOSE_ON_DESTROY).
8. TZlibTransport: like TBufferedTransport and TFramedTransport, it delegates reads and writes to an underlying TTransport. It uses the compression and decompression functions provided by <zlib.h>. On write, the data is compressed before the underlying TTransport sends it; on read, the underlying TTransport receives the data, which is then decompressed and handed to the upper layer.
9. TSSLSocket: inherits from TSocket; a blocking socket used on the client side. It reads and writes data through the OpenSSL interfaces. The checkHandshake() function calls SSL_set_fd to bind the fd to the SSL object, after which network data is read and written through SSL_read and SSL_write.
10. TSSLServerSocket: inherits from TServerSocket; a non-blocking socket used on the server side. The sockets it accepts are of type TSSLSocket.
11. THttpClient and THttpServer: transport types based on the HTTP/1.1 protocol that inherit from THttpTransport. THttpClient is used on the client and THttpServer on the server. Both delegate reads and writes to an underlying TTransport and use a TMemoryBuffer as the read and write buffer. Data is actually sent through the network I/O interface only when flush() is called.
TTransport is the parent class of all transport classes. It provides a unified interface to the upper layer, which can work with the different subclasses through TTransport, in the manner of polymorphism.
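The frame layout described for TFramedTransport (a 4-byte length followed by the payload) can be sketched as follows. The class FrameSketch is a hypothetical stand-alone illustration and does not use the Thrift library; it assumes the length is written in big-endian (network) byte order, which is also ByteBuffer's default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stand-alone sketch of length-prefixed framing: 4-byte length, then the payload.
class FrameSketch {

    // Prepend the payload length as a big-endian 32-bit integer.
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Read the length first, then exactly that many payload bytes.
    static byte[] unframe(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] framed = frame(payload);
        System.out.println(payload.length + " payload bytes -> " + framed.length + " framed bytes");
        System.out.println(new String(unframe(framed), StandardCharsets.UTF_8));
    }
}
```

Knowing the length up front is what lets a non-blocking server tell when a complete message has arrived without parsing it first.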
IV. The Art of Choosing a Java Server
Thrift consists of three main components: protocol, transport, and server. The protocol defines how messages are serialized. The transport defines how messages travel between the client and the server. The server receives serialized messages from the transport, deserializes them according to the protocol, calls the user-defined message handler, serializes the handler's response, and writes it back to the transport. Thrift's modular structure lets it offer a variety of server implementations. The server implementations available in Java are:
1. TSimpleServer
2. TNonblockingServer
3. THsHaServer
4. TThreadedSelectorServer
5. TThreadPoolServer
Having choices is good, but it becomes a disaster if you do not know how they differ. So next we discuss the differences between these servers and use some simple tests to illustrate their performance characteristics.
TSimpleServer
TSimpleServer accepts a connection and processes requests on that connection until the client closes it; only then does it go back to accept a new connection. Because it does all of this in a single thread with blocking I/O, it can serve only one client connection at a time, and all other clients have to wait until the server accepts them. TSimpleServer is mainly intended for testing. Do not use it in production!
TNonblockingServer vs. THsHaServer
TNonblockingServer uses non-blocking I/O to solve the TSimpleServer problem of one client blocking all the others. It uses java.nio.channels.Selector, whose select() call lets the server block on many connections at once instead of on a single connection. select() returns when one or more connections are ready to be accepted, read, or written. TNonblockingServer then handles those connections by accepting them, reading from them, or writing to them, and calls select() again to wait for the next ready connection. In this way the server can serve multiple clients at the same time, and no single client can "starve" all the others.
However, there is another tricky problem: all messages are processed by the same thread that calls select(). Suppose there are 10 clients and each message takes 100 milliseconds to process. What are the latency and throughput? While one message is being processed, the other nine clients are waiting to be selected, so the last client waits 1 second for its response and the throughput is 10 requests/second. Wouldn't it be nice to process several messages at the same time?
This is why THsHaServer (half-sync/half-async server) exists. It uses one thread for network I/O and a separate pool of worker threads for processing messages. As long as a worker thread is idle, a message is processed immediately, so several messages can be processed in parallel. In the preceding example, with 10 worker threads, latency is 100 milliseconds and throughput is 100 requests/second.
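The latency and throughput figures above follow from simple arithmetic, sketched below. The class ServerMathSketch is a back-of-the-envelope model, not a benchmark: it assumes all clients send one request at the same moment, every request takes the same fixed time, and network I/O costs nothing.

```java
// Back-of-the-envelope model of the 10-client example; not a benchmark.
class ServerMathSketch {

    // Time until the last client gets its response, in milliseconds.
    static long worstCaseLatencyMs(int clients, long processMs, int workers) {
        long rounds = (clients + workers - 1) / workers; // ceiling division
        return rounds * processMs;
    }

    // Requests completed per second while every client keeps one request in flight.
    static double throughputPerSec(int clients, long processMs, int workers) {
        int busyWorkers = Math.min(clients, workers);
        return busyWorkers * 1000.0 / processMs;
    }

    public static void main(String[] args) {
        // One thread handles every message (the TNonblockingServer case).
        System.out.println(worstCaseLatencyMs(10, 100, 1) + " ms, "
                + throughputPerSec(10, 100, 1) + " req/s");
        // Ten worker threads (the THsHaServer case).
        System.out.println(worstCaseLatencyMs(10, 100, 10) + " ms, "
                + throughputPerSec(10, 100, 10) + " req/s");
    }
}
```

Under these assumptions the model reproduces the numbers in the text: 1000 ms and 10 requests/second with a single processing thread, 100 ms and 100 requests/second with ten workers.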
To demonstrate this, a test was run with 10 clients and a modified message handler whose only job is to sleep 100 milliseconds before returning. The THsHaServer used 10 worker threads. The handler code looks like this:
public ResponseCode sleep() throws TException {
    try {
        Thread.sleep(100);
    } catch (Exception ex) {
        // Ignored: the handler only simulates 100 ms of work
    }
    return ResponseCode.Success;
}
(Note: the test results in this section are taken from an external article; see the references at the end of this article.)
Fig 4.1
Fig 4.2
As expected, THsHaServer can process all the requests in parallel, while TNonblockingServer can process only one request at a time.
THsHaServer vs. TThreadedSelectorServer
Thrift 0.8 introduced another server implementation, TThreadedSelectorServer. Its main difference from THsHaServer is that it lets you use multiple threads for network I/O. It maintains two thread pools, one for network I/O and one for processing requests. When network I/O is the bottleneck, TThreadedSelectorServer performs better than THsHaServer. To show the difference, a test was run in which the message handler returns immediately without doing anything, and the average latency and throughput were measured for different numbers of clients. THsHaServer used 32 worker threads; TThreadedSelectorServer used 16 worker threads and 16 selector threads.
Fig 4.3
Fig 4.4
The results show that TThreadedSelectorServer achieves much higher throughput than THsHaServer while maintaining lower latency.
TThreadedSelectorServer vs. TThreadPoolServer
Finally, that leaves TThreadPoolServer. It differs from the other three servers in the following ways:
1. A dedicated thread accepts connections.
2. Once a connection is accepted, it is handed to a worker thread in a ThreadPoolExecutor.
3. The worker thread stays bound to that client connection until the connection is closed. Once it is closed, the worker thread returns to the thread pool.
4. The minimum and maximum number of threads in the pool are configurable. The defaults are 5 (minimum) and Integer.MAX_VALUE (maximum).
This means that 10,000 concurrent client connections require 10,000 running threads, so this server is not as "friendly" to system resources as the other servers. In addition, if the number of clients exceeds the maximum number of threads in the pool, requests block until a worker thread becomes available.
That said, TThreadPoolServer performs very well. On the computer I'm using, it supports 10,000 concurrent connections without any problems. If you know in advance how many clients will connect to your server and you don't mind running a large number of threads, TThreadPoolServer may be a good choice for you.
Fig 4.5
Fig 4.6
By now you should be able to decide which Thrift server suits you. TThreadedSelectorServer is a safe choice in most cases. If your system resources allow a large number of concurrent threads, TThreadPoolServer is recommended.
V. Let's do it
I have covered a lot of theory, and many readers still don't know how to actually use Thrift. Okay, it's time to show some real skills (LOL ...).
The simplest code is the most beautiful code: when the functionality is powerful, even the simplest code cannot hide its quality. The following steps show in detail how to use Thrift's code generation engine to generate Java code and how to write a Thrift server and client that call each other.
Note: this article is based on Thrift 0.9.2 and omits non-critical code such as logging.
Step 1: First download the compiler for the Windows platform from the official website (thrift-0.9.2.exe). Then create a .thrift file using the IDL. This article uses a test case that implements a simple function, as shown below:
/**
 * File name: TestQry.thrift
 * Purpose: define a query result struct and a service interface
 * Based on: thrift-0.9.2
 **/
namespace java com.thrift

struct QryResult {
    /**
     * Return code: 1 for success, 0 for failure
     */
    1: i32 code;
    /**
     * Response message
     */
    2: string msg;
}

service TestQry {
    /**
     * Test query interface. When qryCode is 1, the response message is "success";
     * for any other value, the response message is "fail".
     * @param qryCode test parameter
     */
    QryResult qryTest(1: i32 qryCode)
}
Step 2: Put the TestQry.thrift file above and thrift-0.9.2.exe in the same directory, as shown below:
Fig 5.1
Change to that directory in the command prompt (CMD) and run the code generation command:
thrift-0.9.2.exe -r -gen java TestQry.thrift
After execution, we can see the generated java code in the folder.
Fig 5.2
Step 3: Next, create a Maven project (JDK 1.5 or later), copy the code generated in the previous step into the project, and add the Thrift dependencies to pom.xml, as shown below:
<dependencies>
    <dependency>
        <groupId>org.apache.thrift</groupId>
        <artifactId>libthrift</artifactId>
        <version>0.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.13</version>
    </dependency>
</dependencies>
Step 4: Create QueryImp.java to implement the TestQry.Iface interface. The key code is as follows:
import org.apache.thrift.TException;

public class QueryImp implements TestQry.Iface {
    @Override
    public QryResult qryTest(int qryCode) throws TException {
        QryResult result = new QryResult();
        if (qryCode == 1) {
            result.code = 1;
            result.msg = "success";
        } else {
            result.code = 0;
            result.msg = "fail";
        }
        return result;
    }
}
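Step 5: Create ThriftServerDemo.java to implement the server (this example uses non-blocking I/O and the binary protocol). The key code is as follows:
import org.apache.thrift.TProcessorFactory;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.server.TNonblockingServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TNonblockingServerSocket;
import org.apache.thrift.transport.TTransportException;

public class ThriftServerDemo {
    private final static int DEFAULT_PORT = 30001;
    private static TServer server = null;

    public static void main(String[] args) {
        try {
            TNonblockingServerSocket socket = new TNonblockingServerSocket(DEFAULT_PORT);
            TestQry.Processor processor = new TestQry.Processor(new QueryImp());
            TNonblockingServer.Args arg = new TNonblockingServer.Args(socket);
            arg.protocolFactory(new TBinaryProtocol.Factory());
            // TNonblockingServer requires framed transport on both sides
            arg.transportFactory(new TFramedTransport.Factory());
            arg.processorFactory(new TProcessorFactory(processor));
            server = new TNonblockingServer(arg);
            server.serve(); // blocks and serves requests
        } catch (TTransportException e) {
            e.printStackTrace();
        }
    }
}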
Step 6: Create ThriftClientDemo.java to implement the client. The key code is as follows:
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftClientDemo {
    private final static int DEFAULT_QRY_CODE = 1;

    public static void main(String[] args) {
        TTransport tTransport = null;
        try {
            tTransport = getTTransport();
            TProtocol protocol = new TBinaryProtocol(tTransport);
            TestQry.Client client = new TestQry.Client(protocol);
            QryResult result = client.qryTest(DEFAULT_QRY_CODE);
            System.out.println("code=" + result.code + " msg=" + result.msg);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (tTransport != null) {
                tTransport.close();
            }
        }
    }

    // Opens a framed transport to the local demo server; failures propagate to the caller
    private static TTransport getTTransport() throws Exception {
        TTransport tTransport = getTTransport("127.0.0.1", 30001, 5000);
        if (!tTransport.isOpen()) {
            tTransport.open();
        }
        return tTransport;
    }

    private static TTransport getTTransport(String host, int port, int timeout) {
        final TSocket tSocket = new TSocket(host, port, timeout);
        final TTransport transport = new TFramedTransport(tSocket);
        return transport;
    }
}
All the preparations are complete. Next, let the client and server communicate. Run ThriftServerDemo to start the server, and then run ThriftClientDemo to create a client that makes the call. When qryCode is 1, the result is as follows:
code=1 msg=success
When qryCode = 0, the result is as follows:
code=0 msg=fail
The code structure of the project is attached:
Fig 5.3
See, I wasn't lying: it really is that easy, isn't it?
Of course, using Thrift in a real project is never quite this simple, but the example above is enough to guide you through Thrift server and client development.
VI. The Road Is Long
What you have read so far is not source code analysis, and that is not the purpose of this article. To master any technology, you should first understand its overall system and architecture, and only then study the details thoroughly. If you chase source code analysis and other "advanced" topics from the very beginning, you will miss the beauty of the forest for the sake of a single big tree.
Of course, my next plan is to study Thrift's implementation in depth, and I hope to share what I learn so that we can make progress together.
References
[1] Apache Thrift-scalable cross-language service development framework
[2] Thrift Java Servers Compared