I. Preface
This article describes BSON, the data format used by MongoDB; the Mongo Wire Protocol, the transport protocol it uses; and the internal structure of MongoDB data files.
II. BSON
BSON (pronounced "bee-sahn"), short for Binary JSON, is a binary-encoded serialization of JSON-like documents.
Most readers are familiar with JSON; a page of a few hundred words on its official website lays out the rules. JSON has only six data types: null, object, Boolean, number, string, and array. BSON supports additional data types, such as dates.
Taking the example {"hello": "world"} from the BSON website, the storage structure of a BSON document is as follows:
TotalSize (4) | {BSONType (1) | FieldName | Data}* | EOO (1)
TotalSize: the total length in bytes of the document after conversion to BSON, stored in 4 bytes
BSONType: the data type of the value, stored in 1 byte
FieldName: the field name, that is, the key of the key/value pair ("hello" in the example); a UTF-8 string ending with the terminator '\0'
Data: the value of the key/value pair ("world" in the example). For a string, Data begins with a 4-byte length, followed by the string bytes and a '\0' terminator. For other types, see the official website.
*: the group in braces repeats once per key/value pair; for example, {"hello": "world", "hello1": "world1"} contains two pairs
EOO: the end-of-object terminator, \x00
The length of the example {"hello": "world"} works out as follows:
TotalSize (4) + BSONType (1) + FieldName (5 + 1) + Data (4 + 5 + 1) + EOO (1) = 22 bytes
In the mongo shell, you can check the BSON size of a document with Object.bsonsize.
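The layout and the 22-byte total can also be checked outside the shell. The following is a minimal sketch in Python, assuming a flat document whose values are all strings; the function name encode_string_document is made up for illustration and is not part of any MongoDB driver.

```python
import struct

def encode_string_document(doc):
    """Encode a flat dict of str -> str as BSON (string values only)."""
    body = b""
    for key, value in doc.items():
        data = value.encode("utf-8") + b"\x00"       # value bytes plus '\0' terminator
        body += b"\x02"                               # BSONType: 0x02 means UTF-8 string
        body += key.encode("utf-8") + b"\x00"         # FieldName, null-terminated
        body += struct.pack("<i", len(data)) + data   # 4-byte length, then the data
    body += b"\x00"                                   # EOO
    total_size = 4 + len(body)                        # TotalSize counts its own 4 bytes
    return struct.pack("<i", total_size) + body

encoded = encode_string_document({"hello": "world"})
print(len(encoded))    # 22
print(encoded.hex())
```

The 4-byte length stored before the string counts the trailing '\0', which is why Data contributes 4 + 5 + 1 bytes.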
In addition, BSON trades some space for a format that is easier to traverse. Because a string value stores its length up front, a reader can skip over it without checking every byte for '\0'. Fast traversal is a very important property of BSON.
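To illustrate the traversal point, here is a rough Python sketch that lists the field names of a document while jumping over each string value using its length prefix. It assumes every value is a string, and the helper name field_names is hypothetical.

```python
import struct

def field_names(bson_bytes):
    """List the field names of a BSON document whose values are all strings."""
    (total_size,) = struct.unpack_from("<i", bson_bytes, 0)
    if total_size != len(bson_bytes):
        raise ValueError("TotalSize does not match the buffer length")
    pos, names = 4, []
    while bson_bytes[pos] != 0x00:                   # a 0x00 type byte is EOO
        if bson_bytes[pos] != 0x02:
            raise ValueError("this sketch only handles string values")
        pos += 1
        end = bson_bytes.index(b"\x00", pos)         # field names are null-terminated
        names.append(bson_bytes[pos:end].decode("utf-8"))
        pos = end + 1
        (length,) = struct.unpack_from("<i", bson_bytes, pos)
        pos += 4 + length                            # skip the value without scanning it
    return names

# The 22-byte encoding of {"hello": "world"}.
sample = bytes.fromhex("1600000002" "68656c6c6f00" "06000000" "776f726c6400" "00")
print(field_names(sample))   # ['hello']
```

Only the field name is scanned for its terminator; the value itself is never inspected byte by byte.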
III. Mongo Wire Protocol
Like HTTP and FTP, the Mongo Wire Protocol is an application-layer protocol, but it is currently used only by MongoDB-related applications. Each Mongo Wire Protocol message consists of a standard message header followed by request-specific data.
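The standard message header consists of four 32-bit integers: messageLength, requestID, responseTo, and opCode.
The request-specific data follows the header; one example is a request that updates a collection.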
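As a rough illustration of the header, the Python sketch below packs and unpacks the four little-endian int32 fields that the MongoDB wire protocol documentation lists (messageLength, requestID, responseTo, opCode). The helper names are made up for this example, the opcode value 2001 for OP_UPDATE is taken from that documentation, and no request body is built here.

```python
import struct

# messageLength, requestID, responseTo, opCode: four little-endian int32 values.
HEADER = struct.Struct("<iiii")
OP_UPDATE = 2001   # opcode for an update request, per the wire protocol documentation

def pack_header(body_length, request_id, response_to, op_code):
    """Build a header; messageLength covers the header plus the body."""
    return HEADER.pack(HEADER.size + body_length, request_id, response_to, op_code)

def unpack_header(data):
    """Read the header fields from the start of a message."""
    message_length, request_id, response_to, op_code = HEADER.unpack_from(data, 0)
    return {
        "messageLength": message_length,
        "requestID": request_id,
        "responseTo": response_to,
        "opCode": op_code,
    }

header = pack_header(body_length=0, request_id=1, response_to=0, op_code=OP_UPDATE)
print(len(header))            # 16
print(unpack_header(header))
```

A receiver can read the first 16 bytes, learn the total message length, and then read the rest of the message in one step.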
For more information, see http://www.mongodb.org/display/docs/?+wire=protocol.
IV. Internal Structure of MongoDB Data Files
Each database has one .ns file and several data files (.0, .1, .2, ...). The .ns file is 16 MB. The .0 file is 16 MB, the .1 file is 32 MB, and each subsequent file is double the size of the previous one, up to a maximum of 2 GB. This way, small databases do not waste too much space, while large databases can use contiguous disk space.
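The doubling rule can be sketched in a few lines of Python. This simply reproduces the sizes stated above (16 MB for the first file, doubling up to a 2 GB cap); the function name and defaults are illustrative, not MongoDB settings.

```python
def data_file_sizes_mb(count, first_mb=16, cap_mb=2048):
    """Sizes in MB of data files .0, .1, ..., doubling until the cap is reached."""
    sizes, size = [], first_mb
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap_mb)   # double, but never exceed the cap
    return sizes

print(data_file_sizes_mb(9))
# [16, 32, 64, 128, 256, 512, 1024, 2048, 2048]
```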
Each collection and each index in a database corresponds to a namespace.
In the local database, for example, collections, capped collections, and indexes each have their own namespace.
The .ns file records these collection namespaces and index namespaces.
A collection namespace has multiple extents. The collection namespace stores the collection's metadata, such as the collection name and the locations of its first and last extents. An extent consists of several documents. Each extent has a header that records the locations of its first and last documents, along with some metadata about the extent. Extents are connected to one another through a doubly linked list, as are documents.
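The relationship between namespace, extents, and documents can be modeled in memory as follows. This is only a conceptual Python sketch: the class and field names are hypothetical, and it says nothing about the actual on-disk byte layout.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass(eq=False)
class Record:
    document: dict
    prev: Optional["Record"] = None
    next: Optional["Record"] = None

@dataclass(eq=False)
class Extent:
    first_record: Optional[Record] = None   # header: first document in the extent
    last_record: Optional[Record] = None    # header: last document in the extent
    prev: Optional["Extent"] = None
    next: Optional["Extent"] = None

    def append(self, document: dict) -> None:
        record = Record(document, prev=self.last_record)
        if self.last_record is None:
            self.first_record = record
        else:
            self.last_record.next = record
        self.last_record = record

@dataclass(eq=False)
class CollectionNamespace:
    name: str
    first_extent: Optional[Extent] = None
    last_extent: Optional[Extent] = None

    def add_extent(self) -> Extent:
        extent = Extent(prev=self.last_extent)
        if self.last_extent is None:
            self.first_extent = extent
        else:
            self.last_extent.next = extent
        self.last_extent = extent
        return extent

    def scan(self) -> Iterator[dict]:
        """Walk every extent, and every document inside it, in order."""
        extent = self.first_extent
        while extent is not None:
            record = extent.first_record
            while record is not None:
                yield record.document
                record = record.next
            extent = extent.next

ns = CollectionNamespace("test.users")
first = ns.add_extent()
first.append({"_id": 1})
first.append({"_id": 2})
ns.add_extent().append({"_id": 3})
print(list(ns.scan()))   # [{'_id': 1}, {'_id': 2}, {'_id': 3}]
```

A full scan of a collection starts from the namespace, follows the extent list, and walks the documents inside each extent.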
Indexes are stored as B-trees, and the index namespace stores a pointer to the root node of the B-tree. The internal structure of MongoDB data is shown below (picture from NoSQLFan).