Apache Avro 1.8.1 Starter Guide (Java)

Source: Internet
Author: User
Tags: compact serialization

Before we get started, let's take a look at what Apache Avro actually is and what it can be used for.

Apache Avro is a data serialization system. Serialization is the conversion of objects into a binary stream, and deserialization is the reverse: converting a binary stream back into the corresponding objects. Avro is therefore used to convert objects into a binary stream before the data is transmitted; the binary stream is then sent to the target address, where Avro converts it back into objects.
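As a minimal illustration of that round trip, here is a sketch using only plain JDK I/O (not Avro itself; the field names are borrowed from the User example later in this guide):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTrip {
    // Serialization: turn an object's fields into a binary stream.
    static byte[] serialize(String name, int favoriteNumber) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(name);
        out.writeInt(favoriteNumber);
        out.close();
        return bos.toByteArray();
    }

    // Deserialization: turn the binary stream back into the fields.
    static String deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        String name = in.readUTF();
        int favoriteNumber = in.readInt();
        return name + ":" + favoriteNumber;
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = serialize("Alyssa", 256);  // object -> binary stream
        System.out.println(deserialize(stream));   // binary stream -> object
    }
}
```

Avro does the same job, but driven by a schema instead of hand-written read/write code, which is what the rest of this guide shows.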

Next, let's look at what the official website says.

Apache Avro is a data serialization system.

Avro provides:

  • Rich data structures

  • A compact, fast, binary data format

  • A container file to store persistent data

  • Remote procedure call (RPC)

  • Simple integration with dynamic languages

Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

As you know, JSON is a lightweight data-interchange format, but it shows its weaknesses on large datasets. JSON data consists of key:value pairs, so every record must carry the names of its keys; sometimes the keys alone consume more space than the values. For large datasets this is a serious waste: the format is not compact, and the repeated key information both wastes storage space and increases the data-transmission load, adding to the burden on the cluster and hurting overall cluster throughput. The Avro data serialization system solves this problem well. An Avro-serialized file consists of a schema plus the actual content. The schema is the metadata of the data, the equivalent of the key information in JSON, and it is stored as a single JSON description, so the metadata is stored only once. Compared to files in JSON format, this reduces storage considerably and lets an Avro file organize its data much more tightly.
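To put a rough number on that key overhead, here is a back-of-the-envelope sketch in plain Java (no Avro involved; the record layout and record count are made up purely for illustration):

```java
public class KeyOverhead {
    // Bytes a JSON dataset spends on repeated key names alone, for n records.
    static long keyBytes(String keysPerRecord, long n) {
        return keysPerRecord.length() * n;
    }

    public static void main(String[] args) {
        // One JSON record; every record repeats the same three key names.
        String record = "{\"name\":\"Alyssa\",\"favorite_number\":256,\"favorite_color\":null}";
        // Just the quoted key names and colons that every record carries:
        String keys = "\"name\":\"favorite_number\":\"favorite_color\":";
        long n = 1_000_000L;
        long total = (long) record.length() * n;
        long overhead = keyBytes(keys, n);
        System.out.println("Total JSON bytes:    " + total);
        System.out.println("Bytes spent on keys: " + overhead
                + " (" + (100 * overhead / total) + "%)");
        // An Avro container file stores these field names once, in the schema
        // header, instead of once per record.
    }
}
```

For this toy record, well over half the bytes are key names, which is exactly the repeated metadata Avro factors out into the schema.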

Next, we start using Avro.

Download

Take Maven as an example: add the Avro dependency and plugin. The advantage of the plugin is that it can automatically generate classes directly from .avsc files.

<dependencies>
    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.8.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.8.1</version>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                    <configuration>
                        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
            </configuration>
        </plugin>
    </plugins>
</build>

It is worth noting that the POM file above configures the paths used for automatic class generation, namely ${project.basedir}/src/main/avro/ and ${project.basedir}/src/main/java/. Once this is configured, running the mvn command makes the plugin automatically generate class files for the .avsc schemas under the first directory and put them in the second.

Define schema

Avro schemas are defined in JSON. Schemas are built from primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). For example, the following defines a user schema: create an avro directory under the main directory, then create a new file user.avsc in that directory:

{"namespace": "lancoo.ecbdc.pre",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
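One detail worth knowing about unions like these (this comes from the Avro specification rather than from this guide): if you want a field to have a default value of null, the "null" branch must come first in the union, because a default value is validated against the first branch. An illustrative variant of the favorite_number field:

```json
{"name": "favorite_number", "type": ["null", "int"], "default": null}
```

With the ["int", "null"] ordering used above, the field can still hold null at runtime; only the default declaration is affected.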

Serializing and deserializing with code generation

Compiling the schema

Because the Avro plugin is used here, the Maven plugin automatically generates the class file for us when we enter the following command:

mvn clean install

The corresponding classes are then generated in the directory that you just configured.

If you do not use Plug-ins, you can also use Avro-tools to build:

java -jar /path/to/avro-tools-1.8.1.jar compile schema <schema file> <destination>

Create a user

Now that the class file has been generated, you can use it to create users:

User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null

// Alternate constructor
User user2 = new User("Ben", 7, "red");

// Construct via builder
User user3 = User.newBuilder()
             .setName("Charlie")
             .setFavoriteColor("blue")
             .setFavoriteNumber(null)
             .build();
Serialization

Serializes and stores the previously created user to a disk file:

// Serialize user1, user2 and user3 to disk
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
dataFileWriter.create(user1.getSchema(), new File("users.avro"));
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();

Here we serialize the users to the file users.avro.

Deserialization

Next, we deserialize the serialized data:

// Deserialize users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<User>(new File("users.avro"), userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
}

The complete code for the whole flow (creating the Avro schema, generating code, creating users, serializing the user objects, deserializing, and printing the final output) can be organized as follows (here I use JUnit):

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import org.junit.Test;

import java.io.File;
import java.io.IOException;

/**
 * Created by yang on 12/23/16.
 */
public class TestUser {
    @Test
    public void testCreateUserClass() throws IOException {
        User user1 = new User();
        user1.setName("Alyssa");
        user1.setFavoriteNumber(256);
        // Leave favorite color null

        // Alternate constructor
        User user2 = new User("Ben", 7, "red");

        // Construct via builder
        User user3 = User.newBuilder()
                .setName("Charlie")
                .setFavoriteColor("blue")
                .setFavoriteNumber(null)
                .build();

        // Serialize user1, user2 and user3 to disk
        DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
        DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
        dataFileWriter.create(user1.getSchema(), new File("users.avro"));
        dataFileWriter.append(user1);
        dataFileWriter.append(user2);
        dataFileWriter.append(user3);
        dataFileWriter.close();

        // Deserialize users from disk
        DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
        DataFileReader<User> dataFileReader = new DataFileReader<User>(new File("users.avro"), userDatumReader);
        User user = null;
        while (dataFileReader.hasNext()) {
            // Reuse user object by passing it to next(). This saves us from
            // allocating and garbage collecting many objects for files with
            // many items.
            user = dataFileReader.next(user);
            System.out.println(user);
        }
    }
}

After the code executes, you can see that the file users.avro has been created.

The output results are:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}

Okay, isn't that simple?
