ZooKeeper Source Code Analysis - Jute - Part 1 (zookeeper-jute)
Hadoop Record I/O provides class files and a record description language translator to simplify the serialization and deserialization of records.
Introduction
Any software system of significant complexity needs to exchange data with the outside world. Data exchange typically involves packing and unpacking logical units of data into and out of input and output streams such as files, network connections, and memory buffers. An application usually contains hand-written code for serializing and deserializing the data types it operates on. Several features of serialization make automatic code generation attractive: given a particular output encoding (such as binary or XML), serializing basic types and compositions of basic types is a mechanical task; hand-written serialization code is prone to bugs, especially when a record has many fields or when a record is defined differently across versions; and for data exchange between applications written in different programming languages, it is easier to describe the records an application operates on in a language-independent manner and to use that description to generate implementations in the different target languages. This document describes Hadoop Record I/O, a mechanism that:
1) provides a simple specification of serializable data types,
2) provides code generation in different target languages for packing and unpacking these types, and
3) provides target-language-specific runtime support so that application programmers can incorporate the generated code into their applications.
Hadoop Record I/O addresses the same problem space as mechanisms such as XDR, ASN.1, PADS, and ICE. While all of these systems include a DDL that can describe most record types, they differ considerably in other respects. Hadoop Record I/O focuses on data serialization and multi-language support: users describe their data in a simple data description language, the Hadoop DDL translator rcc generates the corresponding code, and that code reads and writes data through a simple stream abstraction. The following lists some goals and non-goals of Hadoop Record I/O.
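In ZooKeeper's copy of this mechanism (the org.apache.jute package), records produced by the DDL translator implement a Record interface and read or write their fields through archive abstractions. The following is a hand-written sketch of that pattern, not actual generated output; the SampleRecord class and its fields are invented for illustration.

```java
import java.io.IOException;

import org.apache.jute.InputArchive;
import org.apache.jute.OutputArchive;
import org.apache.jute.Record;

/**
 * A hand-written sketch of the kind of class the DDL translator generates.
 * The record name and its fields are hypothetical.
 */
public class SampleRecord implements Record {
    private long sessionId;   // a fixed-width numeric field
    private String path;      // a variable-length string field

    public SampleRecord() {
    }

    public SampleRecord(long sessionId, String path) {
        this.sessionId = sessionId;
        this.path = path;
    }

    @Override
    public void serialize(OutputArchive archive, String tag) throws IOException {
        // Each field is written with a tag; the archive decides the actual encoding.
        archive.startRecord(this, tag);
        archive.writeLong(sessionId, "sessionId");
        archive.writeString(path, "path");
        archive.endRecord(this, tag);
    }

    @Override
    public void deserialize(InputArchive archive, String tag) throws IOException {
        // Fields are read back in the same order in which they were written.
        archive.startRecord(tag);
        sessionId = archive.readLong("sessionId");
        path = archive.readString("path");
        archive.endRecord(tag);
    }
}
```

The key design point is that the record class itself knows nothing about the wire format; the OutputArchive/InputArchive implementation chosen by the caller determines the encoding.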
Objectives:
1) Support commonly used primitive types. Hadoop should include the built-in types that we expect to be commonly needed.
2) Support composite types, including recursive composition. Hadoop should support composite types such as structs and vectors.
3) Code generation in different target languages. Hadoop should be able to generate serialization code in different target languages and should be easily extensible to new ones. The initial targets are C++ and Java.
4) Support for the target languages. Hadoop should provide header files, libraries, or packages for each target language so that the generated code can be incorporated into applications with little effort.
5) Support multiple output encodings, such as packed binary, comma-separated text, and XML (a binary-encoding sketch follows this list).
6) Support backward- and forward-compatible evolution of record types.
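As a concrete illustration of the packed binary encoding mentioned in goal 5, the sketch below serializes the hypothetical SampleRecord from the earlier example to a byte array and reads it back. It assumes the BinaryOutputArchive/BinaryInputArchive classes and their getArchive factory methods as found in ZooKeeper's jute package; the record itself remains invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.jute.BinaryInputArchive;
import org.apache.jute.BinaryOutputArchive;

public class BinaryEncodingExample {
    public static void main(String[] args) throws IOException {
        // Serialize the hypothetical record into the packed binary encoding.
        SampleRecord original = new SampleRecord(0x1234L, "/zookeeper/quota");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryOutputArchive boa = BinaryOutputArchive.getArchive(out);
        original.serialize(boa, "record");
        byte[] bytes = out.toByteArray();

        // Deserialize from the same bytes. A different archive implementation
        // (e.g. a text or XML one) could be substituted without touching
        // the record class itself.
        BinaryInputArchive bia =
                BinaryInputArchive.getArchive(new ByteArrayInputStream(bytes));
        SampleRecord copy = new SampleRecord();
        copy.deserialize(bia, "record");
    }
}
```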
Non-objectives:
1) Serialization of arbitrary C++ objects.
2) Serialization of complex data structures such as trees and linked lists.
3) Built-in indexing, compression, or check-summing.
4) Dynamic construction of objects from XML.
The rest of this document describes the features of Hadoop Record I/O in detail. The first part describes the data types supported by the system; the second describes the DDL syntax, with simple record examples; the third describes the code-generation process using rcc; the next describes the mapping to the target language and the runtime support for Hadoop types (the C++ mapping is already described fairly completely, and upcoming document updates will cover Java and other languages); the last part describes the supported output encodings.