Background:
In large-scale distributed Java applications, the underlying RPC framework usually encapsulates remote calls for the convenience of developers, so application-layer developers only need to write simple POJOs when developing services. Spring Remoting and JBoss Remoting are examples.
As business needs evolve, upper-layer applications may want to adopt non-Java technologies such as PHP or Ruby on Rails. And because of the constraints of Java's GC and memory model, some underlying services may need higher-performance or more flexible technologies such as C++ or Python.
At that point the cross-language question comes up. The problem of making the system cross-language without modifying the original POJO-based RPC framework lands on the middleware developers.
Problem:
From the above we can extract the following requirements:
1) The way existing Java RPC services are published must not change; they still use POJOs.
2) Upper-layer non-Java applications can call services that the server side publishes as POJOs.
3) Underlying non-Java applications, such as C++ and Python, can publish services in the same format as POJO services.
4) Application developers get an elegant interface.
Industry study:
Fortunately, we are not the first to run into this problem. Let's look at the valuable legacy that our predecessors in the industry (mainly the Internet industry) have left us.
Google Protocol Buffers: Google is always a step ahead. Early in building its architecture, Google recognized the importance of cross-language support, and while building Bigtable and GFS it developed a cross-language solution: Protocol Buffers. However, Protocol Buffers was only open-sourced in 2008, and what we see is actually a stripped-down version: there is no map support (reportedly Google has it internally), no native C performance optimization for Python, and no RPC service implementation. RPC service support was added later, but its usability is not satisfactory: you cannot pass multiple parameters or throw exceptions. We really should not hope for too much here, though, because Google has said plainly that Protocol Buffers is "a language-neutral, platform-neutral, extensible way of serializing structured data". In other words, it is just a serialization format.
Unlike Hessian and Java serialization, however, Protocol Buffers lets you define data structures in a .proto file (an IDL) and generate code from it, which greatly reduces the development workload. Unfortunately, the generated code is highly invasive, and it cannot produce the POJO Java objects we need.
Even so, we learned a lot from Google Protocol Buffers:
- Compact encoding: numbers are serialized as base 128 varints to reduce network transmission overhead.
- Non-self-describing data: Protocol Buffers embeds the description of each data structure in the generated code, so only the data itself needs to be transmitted in order to deserialize an instance of the structure.
- Immutable objects: the generated Java code uses a builder and message pattern. A message is immutable (getters only, no setters), and every message is produced by a corresponding builder. This shows that Google has borrowed ideas from functional programming.
- Asynchronous calls: although the RPC support in Protocol Buffers is simple, it offered only asynchronous, callback-based invocation from the start, which suggests that Google has adopted asynchronous calls throughout. Anyone in the Internet industry knows how hard that is.
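To make the base 128 varint idea from the list above concrete, here is a minimal sketch in Python. It covers non-negative integers only, and the function names encode_varint and decode_varint are illustrative, not part of any Protocol Buffers API. Each byte carries 7 bits of the number, lowest group first, and the high bit signals that more bytes follow.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base 128 varint."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:      # continuation bit clear: this was the last byte
            return result, pos
        shift += 7
```

Small values fit in a single byte (1 becomes 01), while 300 takes two bytes (AC 02), so the common case of small numbers costs far less than a fixed 4 or 8 byte integer.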
Facebook Thrift: Yes, that's right. Thrift was open-sourced by Facebook in 2007, much as Google later did with Protocol Buffers. Thrift is Facebook's own cross-language implementation. Some will ask how it differs from Protocol Buffers. OK, let's look at its definition.
Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
It is clearly a cross-language service development framework. Its features include code generation (Protocol Buffers has this too), cross-language support (Protocol Buffers has this too), and service development (OK, Protocol Buffers has this as well). So Thrift and Protocol Buffers appear to occupy exactly the same space, and Thrift looks a bit like reinventing the wheel.
At first we had the same doubt, so let's look more closely. In fact, apart from these commonalities (both solve the cross-language problem), Thrift is quite different from Protocol Buffers. The differences are as follows:
1) A complete service stack. Thrift defines a full set of RPC service framework layers, which Protocol Buffers does not have. This is definitely Thrift's biggest strength: if you want to develop a service, Thrift even ships implementations of each layer of the stack.
2) Thrift's paper contains this sentence: "Thrift enforces a certain messaging structure when transporting data, but it is agnostic to the protocol encoding in use." In other words, it doesn't matter which serialization method you use: Hessian, XML, or even Protocol Buffers. Oh, my God.
3) Next, I have to admire how powerful Thrift's service interfaces are: they support multiple parameters, exceptions, and both synchronous and asynchronous calls. This is exactly what we wanted, and Protocol Buffers pales in comparison.
4) Several collection types, including map and set, are supported, which is reassuring. Protocol Buffers trembles.
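Point 2) above, a fixed message structure with an interchangeable encoding, can be illustrated with a small Python sketch. Nothing here is Thrift's actual API: the envelope fields (seqid, method, args), the two encoding classes, and the pack_call/unpack_call helpers are all hypothetical names chosen for illustration.

```python
import ast
import json


class JsonEncoding:
    """One possible wire encoding: JSON text."""

    def dumps(self, envelope: dict) -> bytes:
        return json.dumps(envelope, sort_keys=True).encode("utf-8")

    def loads(self, data: bytes) -> dict:
        return json.loads(data.decode("utf-8"))


class LiteralEncoding:
    """Another possible wire encoding: Python literal syntax."""

    def dumps(self, envelope: dict) -> bytes:
        return repr(envelope).encode("utf-8")

    def loads(self, data: bytes) -> dict:
        return ast.literal_eval(data.decode("utf-8"))


def pack_call(encoding, seqid: int, method: str, args: list) -> bytes:
    """Build the fixed message structure, then hand it to whichever encoding is plugged in."""
    envelope = {"seqid": seqid, "method": method, "args": list(args)}
    return encoding.dumps(envelope)


def unpack_call(encoding, data: bytes):
    """Decode with the same encoding and return (seqid, method, args)."""
    envelope = encoding.loads(data)
    return envelope["seqid"], envelope["method"], envelope["args"]
```

The calling code only ever sees pack_call and unpack_call, so swapping JsonEncoding for LiteralEncoding (or, in a real system, for a binary format) changes the bytes on the wire without touching the service layer.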
At this point, dear readers will ask: isn't our problem solved, then, by Thrift? I smile and say nothing. Powerful as Thrift is, it is still not what we want. The code generated by Thrift is also highly invasive, so POJOs cannot be published as services. Another serious flaw is that, although Thrift's stack is very powerful, it is incompatible with the stack of our existing system. For example, JBoss Remoting and Spring Remoting both add header information, whereas Thrift's transport implementation carries no header information. It is also worth mentioning that the existing Thrift service implementation is not thread-safe. Perhaps because some languages do not support threads well, especially PHP, the language most commonly used at Facebook, the existing implementation has no thread-safe client. As a result, a client connection cannot be reused, which is effectively a short connection. (PS: Is a short connection really inferior to a long connection? That is an open question.)
To sum up what I learned from Facebook Thrift:
1) Both synchronous and asynchronous calls are supported, which is very powerful. The usual practice is to develop servers with high performance requirements in asynchronous mode and to call them from clients with high usability requirements in synchronous mode, which is a perfect fit.
2) Given that the existing implementation is not thread-safe, Facebook very likely has a more efficient thread-safe implementation internally. Presumably it was left out because it is either unrelated to Thrift itself or counted as core technology. In any case, it should not be hard to build one ourselves.
3) Thrift has native C performance optimizations for many scripting languages, such as Python, and native C improves performance by a factor of 20. Protocol Buffers has been working on the same optimization and intends to ship it in 2.4. However, that release has been as long in coming as JDK 7: not long ago it was revealed on the forum that the engineer doing this optimization had left Google and was no longer responsible for it. Well, I do wonder where he went. Sigh.
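Point 1) above, an asynchronous core with a synchronous face for callers who want simplicity, can be sketched in a few lines of Python. This is only an illustration under assumptions: call_async and call_sync are hypothetical helpers, a worker thread stands in for a real asynchronous server, and nothing here reflects Thrift's generated code.

```python
import threading
from concurrent.futures import Future


def call_async(handler, args, callback):
    """Asynchronous style: return immediately, deliver (result, error) to callback later."""
    def worker():
        try:
            result = handler(*args)
        except Exception as exc:       # report failures through the callback
            callback(None, exc)
        else:
            callback(result, None)

    threading.Thread(target=worker, daemon=True).start()


def call_sync(handler, args, timeout=5.0):
    """Synchronous facade: block the caller until the asynchronous callback fires."""
    future = Future()

    def on_done(result, error):
        if error is not None:
            future.set_exception(error)
        else:
            future.set_result(result)

    call_async(handler, args, on_done)
    # result() waits for the callback and re-raises any exception it recorded
    return future.result(timeout=timeout)
```

A caller that wants simplicity writes call_sync(add, (2, 3)) and gets the value or an exception, while performance-sensitive code can use call_async directly and never block a thread.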
Apache Hadoop Avro: "Avro is a data serialization system. Avro provides functionality similar to systems such as Thrift, Protocol Buffers, etc." Well, they admit it themselves, so we don't need to argue the point.
A brief introduction: Avro is a framework used to transmit data within the Hadoop project, and it is also a cross-language solution. Avro has its own highlights, though: 1) dynamic typing, 2) untagged data, 3) no manually assigned field IDs.
The real bright spot is dynamic typing. Oh, my God. Yes, Avro puts the metadata in a schema object and then uses that schema to serialize the corresponding POJO and restore it. This is exactly what I want. As for the other features, I have not studied Avro in depth. I feel that it is harder to learn than Thrift and Protocol Buffers. If you are familiar with it, please give me some pointers.
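The idea of keeping the metadata in a schema object, outside the class being serialized, can be sketched as follows. This is not Avro's API or its binary encoding: the list-of-tuples schema, the User class, and the serialize/deserialize helpers are hypothetical, and only "int" and "string" fields are handled. What the sketch does share with the description above is that the data is untagged (no field names or IDs on the wire) and that the plain object contains no serialization code.

```python
import struct


class User:
    """A plain object with no serialization code of its own (a stand-in for a POJO)."""

    def __init__(self, name="", age=0):
        self.name = name
        self.age = age


# The schema lives outside the class: an ordered list of (field name, type).
USER_SCHEMA = [("name", "string"), ("age", "int")]


def serialize(obj, schema) -> bytes:
    """Write field values in schema order, with no field names or IDs on the wire."""
    out = bytearray()
    for field, kind in schema:
        value = getattr(obj, field)
        if kind == "int":
            out += struct.pack(">q", value)            # 8-byte big-endian integer
        elif kind == "string":
            raw = value.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw    # 4-byte length, then bytes
        else:
            raise ValueError("unsupported type: " + kind)
    return bytes(out)


def deserialize(cls, schema, data: bytes):
    """Rebuild a plain object by walking the same schema over the bytes."""
    obj = cls()
    pos = 0
    for field, kind in schema:
        if kind == "int":
            (value,) = struct.unpack_from(">q", data, pos)
            pos += 8
        elif kind == "string":
            (length,) = struct.unpack_from(">I", data, pos)
            pos += 4
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("unsupported type: " + kind)
        setattr(obj, field, value)
    return obj
```

Because reader and writer agree on the schema, the bytes carry only values, and an existing class can be serialized without generating or modifying any code for it.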
Solution:
By now you probably know what we want, and what we don't want, from Protocol Buffers, Thrift, and Avro. To solve our problem, we only need to play to their strengths, avoid their weaknesses, and make the result our own. The solution is as follows:
1) Use Protocol Buffers for the message serialization format and code generation.
2) Use Thrift's service generation format, and implement a Thrift stack that is compatible with JBoss Remoting or Spring Remoting.
3) Serialize and deserialize the original POJOs using Avro schemas.
Okay, everything looks perfect. But don't get carried away: there are still a lot of details to work out. It's getting late, so I'll eat a bowl of instant noodles, wash up, and go to sleep. I'll share the details when I have time.