Apache Avro 1

Source: Internet
Author: User


Apache Avro is a data serialization system: high-performance middleware built around a compact binary data format.

1. Features
    • Rich data structures

    • A simple, compact, fast binary data format

    • A container file for persistent data storage

    • Remote Procedure Call (RPC)

    • Simple integration with dynamic languages: reading or writing data files and using or implementing RPC protocols does not require code generation. Code generation is an optional optimization that is only worth implementing for statically typed languages.
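Part of what makes the binary format compact is how Avro encodes integers: according to the Avro specification, int and long values are zig-zag encoded and then written as variable-length integers, so values with a small magnitude take a single byte. The class below is a minimal hand-written sketch of that rule using only the JDK; the class and method names are hypothetical and this is not the Avro library's own code.

```java
import java.io.ByteArrayOutputStream;

/**
 * Sketch of the zig-zag + variable-length encoding that the Avro specification
 * uses for int and long values. Hypothetical helper, not part of the Avro library.
 */
public class AvroVarint {

    /** Encodes a long: zig-zag first, then 7 bits per byte, low-order group first. */
    public static byte[] encodeLong(long value) {
        // Zig-zag maps signed to unsigned so small magnitudes stay small:
        // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
        long n = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The high bit of each byte means "more bytes follow".
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
        return out.toByteArray();
    }

    /** Decodes a value produced by encodeLong, starting at offset 0. */
    public static long decodeLong(byte[] bytes) {
        long n = 0;
        int shift = 0;
        for (byte b : bytes) {
            n |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        // Undo the zig-zag mapping.
        return (n >>> 1) ^ -(n & 1);
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, -1, 1, -64, 64, 7}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeLong(v)) {
                hex.append(String.format("%02x ", b));
            }
            System.out.println(v + " -> " + hex.toString().trim());
        }
    }
}
```

Running main prints 0 -> 00, -1 -> 01, 1 -> 02, -64 -> 7f, 64 -> 80 01 and 7 -> 0e: a Java int always occupies 4 bytes in memory, but the small values here need only one or two bytes on the wire.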

2. Comparison with other systems  

Avro has implementations in many programming languages (C, C++, C#, Java, Python, Ruby, PHP). It provides functionality similar to systems such as Thrift and Protocol Buffers, but differs from them in several fundamental ways:

    • Dynamic typing: Avro does not require code generation. Data is always accompanied by its schema, which allows the data to be fully processed without generated code or static data types. This makes it easier to build generic data-processing systems and languages.

    • Untagged data: because the schema is available when the data is read, very little type information needs to be encoded with the data, so the serialized output is smaller.

    • No manually assigned field IDs: when a schema changes, both the old and the new schema are always available when the data is processed, so differences can be resolved symbolically, using field names.
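To make "untagged data" concrete, the sketch below writes one record of the User schema used later in this article (fields name, favorite_number and favorite_color) following the binary encoding rules in the Avro specification: fields appear in schema order with no names or tags, a string is its byte length followed by UTF-8 bytes, and a union is the zero-based index of the chosen branch followed by the value. The class is a hypothetical illustration built on the JDK only, not the Avro library's implementation, and the JSON string is just a size comparison.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hand-rolled sketch of how one User record is laid out in Avro binary.
 * Field names never appear in the output; the reader must know the schema.
 * Hypothetical helper, not the Avro library's code.
 */
public class UntaggedUser {

    // Avro int/long encoding: zig-zag, then a variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long n = (value << 1) ^ (value >> 63);
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    // Avro string: UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Writes the fields in schema order: name, favorite_number, favorite_color. */
    public static byte[] encode(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        // favorite_number is the union ["int", "null"]: branch index, then the value
        if (favoriteNumber != null) {
            writeLong(out, 0);
            writeLong(out, favoriteNumber);
        } else {
            writeLong(out, 1);
        }
        // favorite_color is the union ["string", "null"]
        if (favoriteColor != null) {
            writeLong(out, 0);
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] avro = encode("Ben", 7, "red");
        // The same record as JSON text, which repeats every field name.
        String json = "{\"name\": \"Ben\", \"favorite_number\": 7, \"favorite_color\": \"red\"}";
        System.out.println("Avro binary: " + avro.length + " bytes");
        System.out.println("JSON text:   " + json.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

For this record main reports 11 bytes for the binary form against 62 bytes for the JSON text; the difference comes almost entirely from not repeating field names and type information in every record.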

3. Where Avro is useful
    • Avro can be used to serialize data for high-volume data exchange, whether remote or local.

    • Because Avro serializes data into a compact binary form, it saves storage space and network bandwidth when the data is stored or transmitted.

    • For example: imagine a 100-square-meter room that holds 100 items, and the goal is to fit 150 or more items into the same space. A cache works the same way: its space is limited, so you want to use it fully and store as much data in it as possible. Likewise, when network bandwidth is limited, you want to transmit more data within the same bandwidth. Shrinking structured data for transmission and storage is the main reason Avro exists.

4. Getting Started (Java)

1) Create a new Maven project

Add the following to pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.7.7</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.7.7</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
                        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
            </configuration>
        </plugin>
    </plugins>
</build>


2) Define the schema


user.avsc (placed in src/main/resources/, the plugin's sourceDirectory):

{"namespace": "org.pq.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}


3) Serializing and deserializing with code generation

Run the following in the Maven project directory:

$ mvn clean compile

This generates the User.java class under src/main/java/org/pq/avro/ (the package comes from the namespace declared in user.avsc).

Then write the test class Test.java:

package org.pq.avro;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import java.io.File;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {
        // 1. Create users
        User u1 = new User();
        u1.setName("Alyssa");
        u1.setFavoriteNumber(256);
        // favorite_color is left unset, so it is null

        User u2 = new User("Ben", 7, "red");

        User u3 = User.newBuilder()
                .setName("Charlie")
                .setFavoriteColor("blue")
                .setFavoriteNumber(null)
                .build();

        // 2. Serialize the users to disk
        DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
        DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
        File file = new File("users.avro");
        dataFileWriter.create(u1.getSchema(), file);
        dataFileWriter.append(u1);
        dataFileWriter.append(u2);
        dataFileWriter.append(u3);
        dataFileWriter.close();

        // 3. Deserialize the users from disk
        DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
        DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
        User user = null;
        while (dataFileReader.hasNext()) {
            // Reuse the user object by passing it to next(). This saves us from
            // allocating and garbage collecting many objects for files with
            // many items.
            user = dataFileReader.next(user);
            System.out.println(user);
        }
        dataFileReader.close();
    }
}

Run Result:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}

{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
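The users.avro file written by Test.java is an Avro object container file. According to the Avro specification, such a file starts with the four magic bytes 'O', 'b', 'j', 1, followed by file metadata (which includes the writer's schema) and then blocks of serialized records; this is how the schema travels with the data. The small JDK-only check below is a hypothetical helper, not part of the Avro API, that tests a file for that magic header.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Sketch: checks whether bytes start with the magic that the Avro
 * specification defines for object container files: 'O', 'b', 'j', 1.
 * Hypothetical helper, not part of the Avro library.
 */
public class AvroMagicCheck {

    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** True if the given bytes begin with the Avro container magic. */
    public static boolean hasAvroMagic(byte[] bytes) {
        return bytes.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(bytes, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the users.avro file produced by the test programs.
        Path path = Paths.get(args.length > 0 ? args[0] : "users.avro");
        byte[] bytes = Files.readAllBytes(path);
        System.out.println(path + (hasAvroMagic(bytes) ? " is" : " is not")
                + " an Avro container file");
    }
}
```

Reading the whole file is fine for a small example like users.avro; for large files only the first four bytes would need to be read.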

4) Serializing and deserializing without code generation

package org.pq.avro;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;

import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;

public class Test2 {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // First, use a Parser to read the schema definition and create a Schema object.
        File schemaFile = new File(Test2.class.getClassLoader().getResource("user.avsc").toURI());
        Schema schema = new Schema.Parser().parse(schemaFile);

        // Using this schema, create some users.
        GenericRecord u1 = new GenericData.Record(schema);
        u1.put("name", "Alyssa");
        u1.put("favorite_number", 256);
        // favorite_color is left unset, so it is null

        GenericRecord u2 = new GenericData.Record(schema);
        u2.put("name", "Ben");
        u2.put("favorite_number", 7);
        u2.put("favorite_color", "red");

        // Serialize u1 and u2 to disk (to users.avro, not to the schema file).
        File usersFile = new File("users.avro");
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
        DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
        dataFileWriter.create(schema, usersFile);
        dataFileWriter.append(u1);
        dataFileWriter.append(u2);
        dataFileWriter.close();

        // Deserialize the users from disk.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
        DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(usersFile, datumReader);
        GenericRecord user = null;
        while (dataFileReader.hasNext()) {
            // Reuse the user object by passing it to next(). This saves us from
            // allocating and garbage collecting many objects for files with
            // many items.
            user = dataFileReader.next(user);
            System.out.println(user);
        }
        dataFileReader.close();
    }
}


Run Result:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
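The generic API reads records without any generated class: the schema parsed at runtime drives the decoder, and each record is returned as a GenericRecord whose values are looked up by field name. The toy class below imitates that idea with plain JDK types. It is a hypothetical sketch: its hard-coded field list stands in for user.avsc, it decodes the Avro binary record encoding into a LinkedHashMap, and it is not how GenericDatumReader is actually implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy sketch of schema-driven decoding in the spirit of the generic API.
 * The field list is a simplified, hypothetical stand-in for user.avsc.
 */
public class ToyGenericReader {

    // Hypothetical, simplified type model covering only the User schema.
    enum Type { STRING, INT_OR_NULL, STRING_OR_NULL }

    static final String[] FIELD_NAMES = {"name", "favorite_number", "favorite_color"};
    static final Type[] FIELD_TYPES = {Type.STRING, Type.INT_OR_NULL, Type.STRING_OR_NULL};

    // Reads an Avro int/long: variable-length integer, then undo zig-zag.
    static long readLong(ByteBuffer buf) {
        long n = 0;
        int shift = 0;
        byte b;
        do {
            b = buf.get();
            n |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1);
    }

    // Reads an Avro string: byte length, then UTF-8 bytes.
    static String readString(ByteBuffer buf) {
        byte[] utf8 = new byte[(int) readLong(buf)];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Decodes one record; the schema, not the data, says what comes next. */
    public static Map<String, Object> decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            Object value;
            switch (FIELD_TYPES[i]) {
                case STRING:
                    value = readString(buf);
                    break;
                case INT_OR_NULL: // union ["int", "null"]: branch 0 = int, 1 = null
                    value = readLong(buf) == 0 ? (Object) (int) readLong(buf) : null;
                    break;
                default: // STRING_OR_NULL, union ["string", "null"]
                    value = readLong(buf) == 0 ? readString(buf) : null;
                    break;
            }
            record.put(FIELD_NAMES[i], value);
        }
        return record;
    }

    public static void main(String[] args) {
        // Avro binary form of {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
        byte[] ben = {0x06, 'B', 'e', 'n', 0x00, 0x0e, 0x00, 0x06, 'r', 'e', 'd'};
        System.out.println(decode(ben)); // {name=Ben, favorite_number=7, favorite_color=red}
    }
}
```

Without the field list the eleven bytes cannot be interpreted at all, which is why Avro stores the writer's schema in the container file header.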



References:

https://avro.apache.org/docs/current/gettingstartedjava.html

http://www.javabloger.com/article/hadoop-avro-rpc-java.html
