In-depth Hadoop Research (14): Introduction to Avro

Source: Internet
Author: User

If you reprint this article, please cite the source address:

All source code is on GitHub,

Avro is a multi-language data serialization framework that supports C, C++, C#, Python, Java, PHP, and Ruby.

It was created mainly to address a limitation of Hadoop's Writable, which only supports the Java language.

Many people will ask: similar frameworks such as Thrift and Protocol Buffers already exist, so why build a new one instead of using them? In other words, how is Avro different? First, like those frameworks, Avro uses a language-independent schema to describe data.

The difference is that code generation is optional in Avro. The schema is stored together with the data, so data can be processed without generating code or static data types. To achieve this, Avro assumes that the schema is known whenever data is read. The result is a compact encoding, and there is no need to specify field IDs.
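The point about field IDs can be illustrated with a small sketch. It hand-rolls the two primitive encodings from the Avro specification (zigzag variable-length integers, and length-prefixed UTF-8 strings) and writes a record's fields in schema order. The `User` schema and the helper names are assumptions made for this illustration; this is not the Avro library API, only a minimal approximation covering `int`, `long`, and `string`.

```python
def encode_long(n: int) -> bytes:
    """Encode an int/long as zigzag + base-128 varint (64-bit range assumed)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: values near zero become small unsigned numbers
    out = bytearray()
    while z & ~0x7F:          # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field values in schema order. No names or field IDs are written."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            out += encode_long(value)
        elif field["type"] == "string":
            out += encode_string(value)
        else:
            raise NotImplementedError(field["type"])
    return bytes(out)


# Hypothetical schema and record, used only for illustration.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}

payload = encode_record(USER_SCHEMA, {"name": "Ann", "age": 27})
print(payload.hex())  # 06416e6e36
```

The five bytes contain only the values. Without the schema a reader could not tell where one field ends and the next begins, which is why the schema has to be available at read time.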

Avro schemas are written in JSON, while the encoded data is in binary format (other options are available as well). This makes Avro easy to implement in languages that already have a JSON library.
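Because a schema is plain JSON, any JSON library can load it. The sketch below parses a record schema with Python's standard `json` module and lists its fields. The `User` record, its namespace, and its fields are hypothetical and exist only for this example.

```python
import json

# Hypothetical schema used only for illustration.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

schema = json.loads(SCHEMA_JSON)


def describe_fields(schema: dict) -> list:
    """Return (field name, declared type) pairs in schema order."""
    return [(f["name"], f["type"]) for f in schema["fields"]]


for name, ftype in describe_fields(schema):
    print(name, ftype)
```

The `email` field is declared as a union of `null` and `string` with a default, which is the usual way to mark a field as optional in an Avro schema.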

Avro also supports schema evolution. The schema used to write data and the schema used to read it do not have to be identical, so old and new schemas stay compatible with each other. For example, if a field is added in the new schema, both old and new clients can still read the old data, and new clients write data using the new schema. When an old client reads new data, it can simply ignore the new field.
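The compatibility rules for an added field can be sketched as follows. This is a deliberate simplification: real Avro implementations resolve the writer's and reader's schemas while decoding the binary data, whereas the hypothetical `resolve` function here works on already-decoded dictionaries and handles only flat records. The two schema versions are made up for the example.

```python
# Hypothetical old and new versions of the same record schema.
SCHEMA_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}

SCHEMA_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        # The added field carries a default so that old data remains readable.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record: dict, writer_schema: dict, reader_schema: dict) -> dict:
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = record[name]        # present in both schemas: copy
        elif "default" in field:
            result[name] = field["default"]    # missing in old data: use default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result  # fields known only to the writer are dropped


old_data = {"name": "Ann", "age": 27}
new_data = {"name": "Bob", "age": 31, "email": "bob@example.com"}

# New client reading old data: the added field is filled from its default.
print(resolve(old_data, SCHEMA_V1, SCHEMA_V2))
# Old client reading new data: the added field is ignored.
print(resolve(new_data, SCHEMA_V2, SCHEMA_V1))
```

Note that the added field needs a default value. A reader whose schema has a field with no default, and no matching field in the written data, cannot resolve the record.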

Avro also provides a data file format, in which the schema is written in the metadata section at the beginning of the file. Avro data files support compression and are splittable, which means they can be used as MapReduce input.
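To make the file layout more concrete, here is a rough sketch of the header of an Avro data file as described in the Avro specification: a four-byte magic value, a metadata map holding the schema (`avro.schema`) and the compression codec (`avro.codec`), and a 16-byte sync marker. The function names are invented for this example, data blocks are not written at all, and a real program would normally use the Avro library instead.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # first four bytes of every Avro data file


def write_long(buf, n: int) -> None:
    """Write a long as zigzag + base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        buf.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    buf.write(bytes([z]))


def read_long(buf) -> int:
    """Read a zigzag varint long."""
    z, shift = 0, 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def write_bytes(buf, data: bytes) -> None:
    """Write a length-prefixed byte string (also used for UTF-8 strings)."""
    write_long(buf, len(data))
    buf.write(data)


def read_bytes(buf) -> bytes:
    return buf.read(read_long(buf))


def write_header(buf, schema: dict, codec: str = "null") -> bytes:
    """Write magic, metadata map, and a random sync marker; return the marker."""
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"),
            "avro.codec": codec.encode("utf-8")}
    sync = os.urandom(16)
    buf.write(MAGIC)
    write_long(buf, len(meta))            # one map block holding all entries
    for key, value in meta.items():
        write_bytes(buf, key.encode("utf-8"))
        write_bytes(buf, value)
    write_long(buf, 0)                    # a zero count terminates the map
    buf.write(sync)
    return sync


def read_header(buf):
    """Return (schema, codec, sync marker) parsed from the start of a file."""
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro data file")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:
            break
        if count < 0:                     # negative count: a byte size follows
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    sync = buf.read(16)
    return json.loads(meta["avro.schema"]), meta["avro.codec"].decode("utf-8"), sync


# Hypothetical schema, written to and read back from an in-memory buffer.
user_schema = {"type": "record", "name": "User",
               "fields": [{"name": "name", "type": "string"}]}
buffer = io.BytesIO()
written_sync = write_header(buffer, user_schema)
buffer.seek(0)
read_schema, read_codec, read_sync = read_header(buffer)
print(read_schema["name"], read_codec)
```

In the full format each data block is followed by the same sync marker. A reader that starts at an arbitrary offset can scan forward to the next marker and begin at a block boundary, which is what makes the files splittable.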

I realize this introduction is rather brief. The details will be covered in the following chapters.
