When reprinting, please cite the source: http://blog.csdn.net/lastsweetop/article/details/9664233
All source code is on GitHub: https://github.com/lastsweetop/styhadoop
Schema
An Avro schema is defined in JSON and takes one of three forms:
1. A JSON string, mainly used to name a primitive type.
2. A JSON array, which represents a union.
3. A JSON object of the form {"type": "typeName" ...attributes...}
Here typeName can be a primitive type or a complex type. The attributes may include ones that Avro does not define; these are kept as metadata and do not affect how data is serialized.
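The three forms can be told apart by the JSON type of the top-level value. The following is a minimal sketch using only the standard json module, not the Avro library; the function name schema_form and its return labels are made up for illustration.

```python
import json

def schema_form(schema_json: str) -> str:
    """Classify a schema document by the JSON type of its top-level value."""
    node = json.loads(schema_json)
    if isinstance(node, str):
        return "name"      # e.g. "string"
    if isinstance(node, list):
        return "union"     # e.g. ["string", "null"]
    if isinstance(node, dict) and "type" in node:
        return "object"    # e.g. {"type": "map", "values": "long"}
    raise ValueError("not one of the three schema forms")

samples = ['"string"', '["string", "null"]', '{"type": "map", "values": "long"}']
print([schema_form(s) for s in samples])
```

A real parser would go on to check that the named type exists and that the attributes are valid; this sketch only looks at the outer shape.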
Primitive types
There are eight primitive types: null, boolean, int, long, float, double, bytes, and string.
1. Primitive types take no attributes.
2. A primitive type name is itself a schema, so "string" is equivalent to {"type": "string"}.
3. Each language implementation maps the types differently. For example, double maps to double in C, C++, and Java, but to float in Python and Ruby.
Complex types
There are six complex types: records, enums, arrays, maps, unions, and fixed.
Records: a record is usually the top-level unit of serialized data, and records can be nested, including inside themselves.
{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],
  "fields": [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["LongList", "null"]}
  ]
}
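To make the self-nesting of LongList concrete, here is a small sketch that builds a datum shaped like that record and walks it. It assumes records are represented as plain dicts and null as None, which is how a dynamic mapping might represent them; the helper names build_long_list and walk are hypothetical and the Avro library is not used.

```python
def build_long_list(values):
    """Build nested dicts shaped like LongList: {"value": long, "next": LongList or None}."""
    node = None
    for v in reversed(values):
        node = {"value": v, "next": node}
    return node

def walk(node):
    """Follow the "next" field until it reaches the null branch of the union."""
    out = []
    while node is not None:
        out.append(node["value"])
        node = node["next"]
    return out

datum = build_long_list([1, 2, 3])
print(walk(datum))
```

Each "next" field holds either another LongList or null, which is exactly what the union ["LongList", "null"] in the schema allows.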
Enums: an enumeration, which is easy to understand.
{"type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}
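As a rough illustration of how the symbols list is used: a value is valid only if it appears in symbols, and Avro's binary encoding writes an enum as the zero-based position of its symbol. The sketch below covers just the lookup, with hypothetical helper names, and leaves out the actual integer encoding.

```python
import json

SUIT = json.loads(
    '{"type": "enum", "name": "Suit",'
    ' "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}'
)

def symbol_index(schema, symbol):
    """Return the zero-based position of a symbol, or raise if it is not declared."""
    try:
        return schema["symbols"].index(symbol)
    except ValueError:
        raise ValueError(f"{symbol!r} is not a symbol of {schema['name']}")

def symbol_at(schema, index):
    """Inverse lookup: position back to symbol name."""
    return schema["symbols"][index]

print(symbol_index(SUIT, "DIAMONDS"), symbol_at(SUIT, 0))
```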
Arrays: items specifies the type of the elements.
{"type": "array", "items": "string"}
Maps: keys must be strings, so only the type of the values is specified.
{"type": "map", "values": "long"}
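A quick way to see what this map schema accepts is to check a Python dict against it by hand. This is a sketch under the assumption that a map datum is a dict and a long is a 64-bit signed integer; fits_map_of_long is a hypothetical helper, not an Avro API.

```python
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1

def is_long(value):
    """True for integers in the 64-bit signed range (bool is excluded on purpose)."""
    return (isinstance(value, int)
            and not isinstance(value, bool)
            and LONG_MIN <= value <= LONG_MAX)

def fits_map_of_long(datum):
    """Check a datum against {"type": "map", "values": "long"}."""
    return isinstance(datum, dict) and all(
        isinstance(k, str) and is_long(v) for k, v in datum.items()
    )

print(fits_map_of_long({"apples": 3, "pears": 7}))
print(fits_map_of_long({1: 3}))  # key is not a string
```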
Unions: a union cannot contain more than one schema of the same type, except for types that carry a name attribute (record, enum, and fixed), which are told apart by name.
["string", "null"]
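The duplicate rule for unions can be sketched as a small check. It assumes every branch is either a type name or a JSON object, ignores namespaces, and checks only this one rule; branch_key and union_is_valid are hypothetical helpers rather than part of any Avro library.

```python
NAMED_TYPES = {"record", "enum", "fixed"}

def branch_key(branch):
    """Identity of a union branch: the name for named types, otherwise the type."""
    if isinstance(branch, str):
        return branch                      # "string", "null", or a named-type reference
    if branch["type"] in NAMED_TYPES:
        return branch["name"]              # named types are distinguished by name
    return branch["type"]                  # {"type": "string"} counts as "string"

def union_is_valid(union):
    """False if two branches share the same identity."""
    seen = set()
    for branch in union:
        key = branch_key(branch)
        if key in seen:
            return False
        seen.add(key)
    return True

print(union_is_valid(["string", "null"]))
print(union_is_valid(["string", {"type": "string"}]))
```

Two fixed types with different names may share a union, while two arrays may not, since arrays have no name to tell them apart.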
Fixed: size specifies the number of bytes each value occupies.
{"type": "fixed", "size": 16, "name": "md5"}
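The size of 16 matches the length of an MD5 digest, which is always 16 bytes. A minimal sketch with the standard hashlib module shows a value fitting the schema; fits_fixed is a hypothetical helper and the input bytes are arbitrary.

```python
import hashlib

MD5_SCHEMA = {"type": "fixed", "size": 16, "name": "md5"}

def fits_fixed(schema, value):
    """A fixed value must be bytes of exactly the declared size."""
    return isinstance(value, (bytes, bytearray)) and len(value) == schema["size"]

digest = hashlib.md5(b"hello avro").digest()   # raw 16-byte digest, not the hex string
print(len(digest), fits_fixed(MD5_SCHEMA, digest))
```

Note that the hex form returned by hexdigest() is 32 characters long and would not fit this schema.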
Three mappings
A single language may have more than one mapping for Avro types.
Generic mapping: every language must support this dynamic mapping, which works even when the schema is not known before the data is processed.
Specific mapping: Java and C++ can both generate source code from a schema in advance. The generated API is more domain-oriented than the generic mapping.
Reflect mapping: converts Avro types to Java types using reflection. It is slower than the other two, so it is generally not used.