When reprinting, please credit the source: http://blog.csdn.net/lastsweetop/article/details/9448961
All source code is on GitHub: https://github.com/lastsweetop/styhadoop
Avro is a language-neutral data serialization framework with support for C, C++, C#, Java, PHP, Python, and Ruby.
It was created largely to address a shortcoming of Hadoop's Writable serialization, which supports only Java.
A common question is why Avro was built from scratch when similar frameworks such as Thrift and Protocol Buffers already exist, and what sets Avro apart. Like those frameworks, Avro describes data with a language-independent schema. Unlike them, code generation is optional in Avro, and the schema is stored together with the data, so the entire processing pipeline can work without generated code or static data types. This relies on the assumption that the schema is always known when data is read, which avoids both generating tightly coupled code and tagging each field with a numeric ID.
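As an illustration, here is a minimal Avro schema for a hypothetical `User` record (the record and field names are my own, not from the original article). Note that fields are identified by name only, with no numeric tags of the kind Thrift and Protocol Buffers require:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
```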
Avro schemas are written in JSON, while the encoded data is binary (other encodings are also available). This makes Avro easy to implement in any language that already has a JSON library.
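To make the "compact binary" claim concrete, here is a small sketch of how the Avro specification encodes `long` values on the wire: zigzag mapping followed by a little-endian base-128 varint. The function name `encode_long` is my own, not part of any Avro library:

```python
def encode_long(n: int) -> bytes:
    """Encode an integer the way the Avro spec encodes longs:
    zigzag-map it to a non-negative value, then emit it as a
    little-endian base-128 varint."""
    # Zigzag: values of small magnitude (positive or negative)
    # map to small non-negative numbers.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Small values take a single byte, with no field tag attached.
print(encode_long(1).hex())    # "02"
print(encode_long(-64).hex())  # "7f"
```

Because the schema travels with the data, the reader already knows which field comes next, so the bytes above are all that is written for a long field.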
Avro also supports schema evolution: the schema used to write data need not be identical to the schema used to read it, as long as the two are compatible. For example, if a field is added in a new version of the schema, both old and new clients can still read data written with the old schema, and an old client reading data written with the new schema simply ignores the new field.
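As a sketch of that scenario (using the same hypothetical `User` record), a new schema version might add an `email` field with a default value:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name",  "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

An old client reading data written with this schema skips `email`, while a new client reading old data that lacks the field substitutes the default `null` — the default is what makes the two versions compatible in both directions.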
Avro also defines a container file format (the datafile), in which the schema is written into the metadata header at the beginning of the file. Avro datafiles support compression and are splittable, which makes them well suited as MapReduce input.
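Per the Avro specification, the datafile layout looks roughly like this; the schema lives in the header metadata, and the per-block sync markers are what make the file splittable:

```text
header:
  magic bytes        "Obj" 0x01
  file metadata map  { "avro.schema": <schema JSON>, "avro.codec": "null" | "deflate" | ... }
  sync marker        16 random bytes chosen for this file

data block (repeated):
  object count       long
  block size         long (byte length of the serialized objects)
  serialized objects (compressed according to avro.codec)
  sync marker        the same 16 bytes as in the header
```

Because a reader can seek to an arbitrary offset and scan forward to the next sync marker, each input split can be processed independently.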
This introduction is deliberately brief; the following chapters fill in the details.