Apache Avro is a data serialization system: high-performance middleware based on binary data transmission.
1. Features
Rich data structures
A simple, compact, fast binary data format
A container file for storing persistent data
Remote procedure call (RPC)
Simple integration with dynamic languages: Avro does not require code generation in order to read or write data files or to use RPC protocols. Code generation is an optional optimization, worthwhile mainly for statically typed languages.
2. Comparison with other systems
Avro has implementations in many programming languages (C, C++, C#, Java, Python, Ruby, PHP). It provides features similar to systems such as Thrift and Protocol Buffers, but with some fundamental differences:
Dynamic typing: Avro does not require code generation. The schema is stored together with the data, so the whole data-processing pipeline can work without generated code or static data types. This makes it easier to build generic data-processing systems and languages.
Untagged data: because the schema is known when the data is read, very little type information needs to be encoded alongside the data, so the serialized output is smaller.
No manually assigned field IDs: when a schema changes, the old schema is still available when processing old data, so differences between schemas can be resolved symbolically, by field name.
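To make the last two points concrete, here is a small sketch (the class name SchemaEvolutionDemo and the two inline schemas are illustrative, not from the original article): a record is written with an "old" schema and read back with a "new" schema that adds an age field with a default value. Avro resolves the two schemas by field name, with no field numbers involved.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SchemaEvolutionDemo {
    // Writer schema: the "old" shape of the data.
    static final String OLD = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}";
    // Reader schema: a "new" shape that adds a field with a default.
    static final String NEW = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\",\"default\":0}]}";

    public static void main(String[] args) throws IOException {
        Schema oldSchema = new Schema.Parser().parse(OLD);
        Schema newSchema = new Schema.Parser().parse(NEW);

        // Write a record using the old schema.
        GenericRecord rec = new GenericData.Record(oldSchema);
        rec.put("name", "Alyssa");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(oldSchema).write(rec, enc);
        enc.flush();

        // Read it back with the new schema: Avro matches fields by name
        // and fills the missing "age" from its declared default.
        Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord read = new GenericDatumReader<GenericRecord>(oldSchema, newSchema)
                .read(null, dec);
        System.out.println(read.get("name") + " " + read.get("age"));
    }
}
```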
3. Where Avro is valuable
Avro can be used to serialize data for high-volume data exchange, whether remote or local.
Because Avro serializes data into a compact binary form, it saves both storage space and network bandwidth during transmission.
An analogy: a 100-square-meter house that used to hold 100 items can, with better packing, hold 150 or more. The same idea applies to a cache, whose limited space should hold as much data as possible, and to a network link, whose fixed bandwidth should carry as much payload as possible. Especially for the transmission and storage of structured data, this is the significance and value of Avro.
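The space saving is easy to measure. The sketch below (the class name SizeDemo and the inline schema are illustrative) binary-encodes a small record and compares the byte count with the length of the record's JSON text rendering; the binary form carries no field names or type tags, only values.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SizeDemo {
    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                + "{\"name\":\"name\",\"type\":\"string\"},"
                + "{\"name\":\"favorite_number\",\"type\":\"int\"}]}");

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("name", "Ben");
        rec.put("favorite_number", 7);

        // Binary encoding: "Ben" costs 1 length byte + 3 UTF-8 bytes,
        // and the int 7 costs 1 zig-zag varint byte -- 5 bytes in total.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(rec, enc);
        enc.flush();

        System.out.println("avro=" + out.size());
        System.out.println("json=" + rec.toString().length());
    }
}
```

The same record rendered as JSON text takes dozens of characters, because every field name travels with every record.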
4. Getting Started (Java)
1) Create a new Maven project
pom.xml
<dependencies>
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.7.7</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>1.7.7</version>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema</goal>
          </goals>
          <configuration>
            <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>
2) Define the schema
user.avsc file:
{"namespace": "org.pq.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
3) Serializing and deserializing with code generation
Execute in the Maven project directory: $ mvn clean compile
The generated User.java class appears under the org.pq.avro package (it follows the namespace declared in user.avsc).
Then write the test class Test.java
package org.pq.avro;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import java.io.File;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {
        // 1. Create users
        User u1 = new User();
        u1.setName("Alyssa");
        u1.setFavoriteNumber(256);

        User u2 = new User("Ben", 7, "Red");

        User u3 = User.newBuilder()
                .setName("Charlie")
                .setFavoriteColor("Blue")
                .setFavoriteNumber(null)
                .build();

        // 2. Now let's serialize our users to disk
        DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
        DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
        File file = new File("users.avro");
        dataFileWriter.create(u1.getSchema(), file);
        dataFileWriter.append(u1);
        dataFileWriter.append(u2);
        dataFileWriter.append(u3);
        dataFileWriter.close();

        // 3. Deserialize users from disk
        DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
        DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
        User user = null;
        while (dataFileReader.hasNext()) {
            // Reuse the user object by passing it to next(). This saves us from
            // allocating and garbage collecting many objects for files with
            // many items.
            user = dataFileReader.next(user);
            System.out.println(user);
        }
    }
}
Run result:
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "Red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "Blue"}
4) Serializing and deserializing without code generation
package org.pq.avro;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;

import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;

public class Test2 {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // First, we use a Parser to read our schema definition and create a Schema object.
        File schemaFile = new File(Test2.class.getClassLoader().getResource("user.avsc").toURI());
        Schema schema = new Schema.Parser().parse(schemaFile);

        // Using this schema, let's create some users
        GenericRecord u1 = new GenericData.Record(schema);
        u1.put("name", "Alyssa");
        u1.put("favorite_number", 256);

        GenericRecord u2 = new GenericData.Record(schema);
        u2.put("name", "Ben");
        u2.put("favorite_number", 7);
        u2.put("favorite_color", "Red");

        // Serialize u1 and u2 to disk
        File usersFile = new File("users.avro");
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
        DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
        dataFileWriter.create(schema, usersFile);
        dataFileWriter.append(u1);
        dataFileWriter.append(u2);
        dataFileWriter.close();

        // Deserialize users from disk
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
        DataFileReader<GenericRecord> dataFileReader =
                new DataFileReader<GenericRecord>(usersFile, datumReader);
        GenericRecord user = null;
        while (dataFileReader.hasNext()) {
            // Reuse the user object by passing it to next(). This saves us from
            // allocating and garbage collecting many objects for files with
            // many items.
            user = dataFileReader.next(user);
            System.out.println(user);
        }
    }
}
Run result:
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "Red"}
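Because the container file stores the schema in its header (the "schemas and data are stored together" point from section 2), a reader can even open such a file without supplying any schema at all. A minimal self-contained sketch (the class name EmbeddedSchemaDemo, the inline schema, and the temp file are illustrative):

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

public class EmbeddedSchemaDemo {
    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
                + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // Write one record to a container file.
        File f = File.createTempFile("users", ".avro");
        GenericRecord r = new GenericData.Record(schema);
        r.put("name", "Ben");
        DataFileWriter<GenericRecord> w = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(schema));
        w.create(schema, f);
        w.append(r);
        w.close();

        // No schema is passed to the reader: it recovers the schema
        // from the file header.
        DataFileReader<GenericRecord> in = new DataFileReader<GenericRecord>(
                f, new GenericDatumReader<GenericRecord>());
        System.out.println(in.getSchema().getFullName());
        System.out.println(in.next());
        in.close();
    }
}
```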
Reference:
https://avro.apache.org/docs/current/gettingstartedjava.html
http://www.javabloger.com/article/hadoop-avro-rpc-java.html