Avro Entry 1: Serialization and Remote Communication

Source: Internet
Author: User

Avro began as a Hadoop sub-project and is now an independent Apache project. It is a high-performance middleware built on binary data transmission. Other Hadoop-related projects, such as HBase (Ref) and Hive (Ref), also use it to transfer data between client and server. Avro serializes data and is suited to exchanging large volumes of data, remotely or locally.
Binary serialization lets Avro save storage space and network bandwidth when data is transmitted. For example: a 100-square-meter house could originally hold 100 things. Now we want some way to make the same floor area hold 150 things or more. It is like storing data in a cache: cache is expensive, so you need to make full use of its limited space to store more data. Another example: network bandwidth is limited, and we want the same bandwidth to carry more data than before, especially when transmitting and storing structured data. This is the significance and value of Avro.

Avro also supports multiple languages within the same system, much like another Apache product, Thrift (Ref). Unlike Thrift, Avro is more flexible: it supports dynamically loading a defined Schema, which makes system expansion easier.

Avro can be used in the following ways:
1. Binary encoding. The Avro specific method relies on code generated from the schema file to produce concrete classes with the JSON Schema embedded.
2. JSON encoding. The Avro generic method loads the Schema dynamically from a JSON file, so you can process a new data source without compiling and loading classes.
In my superficial opinion, the difference between the two shows up in the size of the same data: where binary encoding produces 100 bytes of Avro data, JSON encoding produces 450 bytes. Although the first method, binary encoding, appears to have the advantage, the biggest problem with binary transmission is that bugs are inconvenient to trace. JSON encoding is more practical for data communication between systems.
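
To make the size difference concrete, here is a minimal hand-rolled sketch in plain Java. It does not use the Avro library. It assumes a hypothetical two-field record (`name`, `age`) and applies the rules from the Avro specification for binary data: integers are written as zig-zag variable-length values, strings as a length followed by UTF-8 bytes, and record fields are concatenated with no field names or tags. The class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: a hand-written encoder, not the Avro library.
class EncodingSizeSketch {

    // Zig-zag + variable-length encoding, as used for Avro int/long values.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);              // zig-zag: small magnitudes stay small
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));   // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                         // final byte, continuation bit clear
    }

    // A string is its UTF-8 byte length followed by the bytes themselves.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical record {name: string, age: int}: fields in schema order, no tags.
    static byte[] encodeUser(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, age);
        return out.toByteArray();
    }

    // The same record as JSON text, where every field name is repeated in the data.
    static String jsonUser(String name, int age) {
        return "{\"name\":\"" + name + "\",\"age\":" + age + "}";
    }

    public static void main(String[] args) {
        byte[] binary = encodeUser("alice", 30);
        byte[] json = jsonUser("alice", 30).getBytes(StandardCharsets.UTF_8);
        System.out.println("binary bytes: " + binary.length);
        System.out.println("JSON bytes: " + json.length);
    }
}
```

For this small record the sketch prints 7 bytes for the binary form and 25 bytes for the JSON form. The exact ratio depends on the data, so the 100 versus 450 bytes quoted above should be read as one observation rather than a fixed rule.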

Some personal thoughts:
1. From XML to JSON, and then from JSON to Avro/Google Protocol Buffers, the technology keeps developing, and the time between each step keeps getting shorter.
2. I personally think Avro is a better way to handle structured data (JSON/XML) than compression (Gzip/7z).
3. You can also use Avro to serialize, store, or transmit data with the following products:
A) HBase, Hive, MySQL
B) Redis and MemCached
C) Local file storage and Solr remote calls
D) MapReduce distributed computing
Reference: https://github.com/spullara/havrobase
4. Don't rely on Avro's SocketServer as your main tool. It is hard to handle, and the API reference states this clearly.
Reference: http://avro.apache.org/docs/1.5.0/api/java/org/apache/avro/ipc/SocketServer.html
5. The specific method requires compiling the schema with the avro-tools.jar package, while the generic method reads the JSON file directly.

Avro supports both local and remote RPC (Ref) calls. Remote RPC calls come in two kinds: HTTP and Netty. Here we mainly introduce HTTP-based Avro remote calls. First, you must define a JSON file as the protocol specification for communication between the two parties, so that each side can parse the data sent by the other.
The protocol has three parts:
1. Description (protocol declaration): the namespace and the protocol name.
2. Data types (types): a set of data formats built from the primitive and complex types in the specification.
3. Messages (messages): the a) request, b) response, and c) exception (optional) data formats, built on the user-defined data types.
For more information, see the user.avpr file in the code example below.
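
As a rough illustration of these three parts, a protocol file could look like the sketch below. This is not the user.avpr from the sample code: the namespace, record, field, and error names are all hypothetical, and only the overall layout (`namespace`/`protocol`, `types`, `messages`) and the two message names, search and update, come from the description in this article.

```json
{
  "namespace": "example.avro",
  "protocol": "User",
  "types": [
    {"type": "record", "name": "UserRecord",
     "fields": [
       {"name": "name", "type": "string"},
       {"name": "age", "type": "int"}
     ]},
    {"type": "error", "name": "UserError",
     "fields": [{"name": "message", "type": "string"}]}
  ],
  "messages": {
    "search": {
      "request": [{"name": "keyword", "type": "string"}],
      "response": {"type": "array", "items": "UserRecord"},
      "errors": ["UserError"]
    },
    "update": {
      "request": [{"name": "user", "type": "UserRecord"}],
      "response": "boolean"
    }
  }
}
```

Each message names its request parameters and its response type, and the optional `errors` list refers to error types declared under `types`.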

A message sent from a client to a server must pass through the transport layer, which sends the message and receives the server's response. The data that reaches the transport layer is binary. HTTP is generally used as the transport, and client data is sent to the server with POST. A message is encapsulated into a Buffer. Avro specifies a standard serialization format: whether for file storage or network transmission, the data's Schema appears before the data, and the data itself carries no metadata (tags). In file storage, the Schema appears in the file header. In network transmission, the Schema appears in the initial handshake phase. The client and server each need to maintain a cache of the protocols they have seen, so once a handshake has completed, later exchanges do not need to transmit the full text of the Protocol.
Here is a packet capture I took while the program was running:

Illustration:
192.168.1.2 is the server and 192.168.1.106 is the client. The data packet returned by the server for the first transmission has size 891, while the data packets returned for the second and third transmissions have size 77. The returned content is clearly serialized (bottom left); it is the response data assembled by the server.
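
The drop in size after the first exchange is consistent with the protocol cache described above. The following plain-Java sketch shows the idea in a much simplified form and does not use the Avro library. The assumptions: each side identifies a protocol by an MD5 hash of its text (as the Avro handshake does), and the full text is sent only when the other side does not yet know that hash. The real handshake carries more fields than this, and the class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a simplified server-side protocol cache.
class ProtocolCacheSketch {

    // Protocol texts this server has already seen, keyed by their hash.
    private final Map<String, String> knownProtocols = new HashMap<>();

    // Hex-encoded MD5 of the protocol text.
    static String hash(String protocolText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(protocolText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true when the server knows the client's protocol.
    // protocolTextOrNull is the full text, sent only when the hash alone was not enough.
    boolean handshake(String clientHash, String protocolTextOrNull) {
        if (protocolTextOrNull != null) {
            knownProtocols.put(hash(protocolTextOrNull), protocolTextOrNull);
        }
        return knownProtocols.containsKey(clientHash);
    }

    public static void main(String[] args) {
        ProtocolCacheSketch server = new ProtocolCacheSketch();
        String protocol = "{\"protocol\":\"User\"}";   // stand-in for the full protocol text
        String h = hash(protocol);

        System.out.println(server.handshake(h, null));      // false: hash unknown, text required
        System.out.println(server.handshake(h, protocol));  // true: text sent once and cached
        System.out.println(server.handshake(h, null));      // true: hash alone is enough from now on
    }
}
```

Only the second call carries the full protocol text; every later call sends just the short hash, which is why subsequent packets can be much smaller.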

Avro messages are divided into multiple frames that together form a buffer list. Framing is a layer between the message layer and the transport layer, and it exists to optimize certain operations. The format of a framed message is as follows:
* A series of buffers, each of which consists of:
o A 4-byte, big-endian buffer length
o The buffer data
* The message ends with a zero-length buffer.
Framing is transparent to the request and response message formats. Any message may be split into one or more buffers.
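
The framing rules above can be sketched in a few lines of plain Java. This is an illustrative implementation of the format as described, not code taken from Avro; the class and method names are hypothetical. `ByteBuffer` writes integers in big-endian order by default, which matches the 4-byte length prefix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: length-prefixed framing with a zero-length terminator.
class FramingSketch {

    // Writes each buffer as [4-byte big-endian length][data], then a zero length.
    static byte[] frame(List<byte[]> buffers) {
        int size = 4;                                  // room for the terminator
        for (byte[] b : buffers) {
            if (b.length > 0) size += 4 + b.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);    // big-endian by default
        for (byte[] b : buffers) {
            if (b.length == 0) continue;               // zero length is reserved for the end marker
            out.putInt(b.length);
            out.put(b);
        }
        out.putInt(0);                                 // the message ends with a zero-length buffer
        return out.array();
    }

    // Reads buffers back until the zero-length terminator is reached.
    static List<byte[]> unframe(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        List<byte[]> buffers = new ArrayList<>();
        while (true) {
            int length = in.getInt();
            if (length == 0) break;
            if (length < 0 || length > in.remaining()) {
                throw new IllegalArgumentException("bad buffer length: " + length);
            }
            byte[] b = new byte[length];
            in.get(b);
            buffers.add(b);
        }
        return buffers;
    }

    public static void main(String[] args) {
        List<byte[]> message = Arrays.asList(
                "hello".getBytes(StandardCharsets.UTF_8),
                "avro".getBytes(StandardCharsets.UTF_8));
        byte[] framed = frame(message);
        System.out.println("framed bytes: " + framed.length);
        for (byte[] b : unframe(framed)) {
            System.out.println(new String(b, StandardCharsets.UTF_8));
        }
    }
}
```

Two buffers of 5 and 4 bytes become 21 framed bytes: two 4-byte length prefixes, 9 bytes of data, and the 4-byte zero terminator.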

Now for the code example. I divided it into three packages and one user.avpr file:
1. client, placed at the top level. ClientHandler assembles the request data according to the protocol and obtains the returned results.
2. server, which uses the built-in Jetty as the server. AvroFactory determines the message type of each request. I defined two types, search and update; that is, different protocol content can be transmitted within the same user.avpr protocol by checking and handling the message type.
AvroHandler applies the business logic, assembles the result, and returns it. I made up some data in the updateRespond/searchRespond2 methods to simulate data retrieval results.
3. tools, the protocol parsing tool (AvroUtils.java).
4. user.avpr, the transfer protocol file.
After importing the code into your IDE, run AvroServer.java to start the server, and then run ClientHandler.java to execute the client call.

Download sample code:
Http://javabloger-mini-books.googlecode.com/files/avro-http-json.rar

The above code example uses Avro version 1.4.1. You can download the jar file that Avro depends on from here:
Http://javabloger-mini-books.googlecode.com/files/avro-lib-1.4.rar

After reading some of Avro's source code and following its design, you can also integrate Avro into Tomcat, GlassFish, Resin, and other servers. You don't have to use the Jetty that comes with Avro as the server; you just need to implement a Servlet of your own. This way you are not limited by Avro's embedded Jetty server and can use additional technical features, for example: persistent connections, long polling, Servlet listeners and filters, and deeper performance tuning of the web container.
Here is a simple code example for reference. After downloading it, you can deploy the war project directly to Tomcat and other web containers. The client code is also in the war project (Client.java).

Download sample code:
Http://javabloger-mini-books.googlecode.com/files/avroweb.war

Next time, let's talk about MapReduce in Avro and Avro's internal structure.

-End-

Original article address: Avro Entry 1: Serialization and Remote Communication. Thanks to the original author for sharing.
