Use BSON to convert data records to binary

Source: Internet
Author: User
By convention, I will first introduce what BSON is.

BSON is short for Binary JSON, a binary storage format for JSON-like data. Currently, BSON is mainly used by MongoDB (a popular open-source non-relational database) to store data and to exchange data over the network.

Next we turn to the main topic: how BSON converts a MongoDB document into binary form for storage. Before that, readers should get a rough idea of the rules on the official BSON website; the link is provided in the references.

As for the format, I will walk through one example and explain each conversion step.

Before introducing the conversion steps, you need to understand the four basic types of BSON.

The following is an example.

Example 1:

{
    "Name": "DataResearchLab",
    "IsGreat": true,
    "Fields": ["CloudComputing", "NoSQL", "BigData"]
}

This example is data to be stored in a non-relational database. In BSON terms it is a Document. In relational-database terms it corresponds to a record, except that this record carries its own column names. To continue with the record above: the outermost "{}" plays the same role as "{}" in a programming language, marking a scope, which means that the data inside must be stored as a whole. The left side of ":" can be understood as a key, comparable to a unique key in a database. Note that BSON treats every key as a cstring (one of the basic storage types of BSON). The right side of ":" is the value that corresponds to the key on the left. This value can take many forms, such as a Boolean, a string, a 32-bit integer, a floating-point number, or even an array, JavaScript code, a regular expression, or a nested Document. For details, refer to the specification on the official BSON website.

Let's look back at the example. Following the explanation above, the user wants to store one record in the database, and the record contains three data items. The key "Name" stores the string "DataResearchLab", the key "IsGreat" stores the Boolean value true, and the key "Fields" stores the array ["CloudComputing", "NoSQL", "BigData"].

Next, let's get to the main topic and see how BSON converts this data to binary for storage in a non-relational database. The data is treated as a Document, and under the BSON rules (the specific rules are not reproduced here; please refer to the rule description on the official BSON website) a Document is divided into three parts: int32, e_list, and "\x00".

First, int32. An int32 occupies four bytes, and two points need special attention.

The first point is that the int32 of a Document holds the length of the Document, and this length includes the four bytes of the int32 itself. In this example the Document needs 104 bytes in total: 4 bytes for the int32, 99 bytes for the three encoded elements, and 1 byte for the terminator. Of course, this number cannot be known at the start; it can be calculated only after the whole Document has been interpreted.

The second point is that the four bytes holding the length are stored in little-endian order: the least significant byte is stored at the lowest memory address and the most significant byte at the highest. In this example the length 104 is therefore stored as "\x68\x00\x00\x00" rather than as "\x00\x00\x00\x68" (which would be big-endian order).
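The two points about the Document's int32 can be sketched in Python. This is only a minimal illustration under stated assumptions, not a full encoder: `encode_int32` and `wrap_document` are hypothetical helper names, and the standard `struct` module with the `<i` format is used to produce the byte order described above.

```python
import struct

def encode_int32(value: int) -> bytes:
    # "<i" packs a signed 32-bit integer with the least significant byte first.
    return struct.pack("<i", value)

def wrap_document(e_list: bytes) -> bytes:
    # Document = int32 + e_list + terminator.
    # The length counts its own 4 bytes and the 1-byte terminator.
    total = 4 + len(e_list) + 1
    return encode_int32(total) + e_list + b"\x00"

# An empty e_list gives the smallest possible Document: 5 bytes in total.
empty_doc = wrap_document(b"")
print(empty_doc.hex(" "))  # 05 00 00 00 00
```

Because the length can only be computed once the e_list bytes exist, the sketch builds the body first and adds the length prefix last.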

Next, "\x00". This byte acts as a terminator, much like the "\0" at the end of a string in programming languages such as C. That is all for this part.

The key part, e_list, is described next. e_list is only an intermediate form in the BSON rules, so it needs further interpretation. An e_list is interpreted as two parts: the first part is an element, and the second part is either another e_list or "" (the empty string).

Take the second part first. In this example, the second part of the e_list from the first step must itself be interpreted as an e_list. Why? Because so far only "Name" has been handled, and "IsGreat" and "Fields" still need to be interpreted. In other words, the e_list step is recursive and is applied repeatedly. So when is the second part interpreted as ""? After the last item, "Fields", has been interpreted and no data items remain to be stored. Note that "" occupies no storage space, that is, it has zero length; it simply does not appear in the final BSON output.
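The recursion in the e_list rule can be written down almost literally. The sketch below is an assumption-laden illustration: `encode_e_list` is a hypothetical name, and the two already-encoded elements are stand-ins whose byte values follow the BSON specification rather than anything computed here.

```python
def encode_e_list(elements: list[bytes]) -> bytes:
    # e_list ::= element e_list | ""
    if not elements:
        return b""  # the "" alternative: contributes no bytes at all
    first, rest = elements[0], elements[1:]
    return first + encode_e_list(rest)  # one element, then a shorter e_list

# Stand-in elements (assumed to be encoded already).
name_element = b"\x02Name\x00\x10\x00\x00\x00DataResearchLab\x00"
is_great_element = b"\x08IsGreat\x00\x01"

encoded = encode_e_list([name_element, is_great_element])
print(len(encoded))  # 36
```

A real encoder would normally use a loop instead; the recursive form is kept here only because it mirrors the grammar rule one to one.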

Now return to the first part, the element. An element is also an intermediate form that needs further interpretation. It is also the richest rule in BSON, in that it can be interpreted in many different forms. Which form applies depends on the type of the data being stored. Note that this type does not refer to the key, because, as already mentioned, BSON interprets every key as a cstring; it refers to the type of the value that corresponds to the key. Here, "Name" stores a value of the BSON type "UTF-8 string". "IsGreat" and "Fields" correspond to the BSON types Boolean "true" and "Array", respectively.
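Choosing the form of an element from the type of its value might look like the following sketch. `type_byte` is a hypothetical helper that covers only the three value types in this example; the type codes "\x02", "\x08", and "\x04" are the ones listed in the BSON specification for UTF-8 string, Boolean, and Array.

```python
def type_byte(value) -> bytes:
    # Only the value is inspected; keys are always stored as cstring.
    if isinstance(value, bool):
        return b"\x08"  # Boolean
    if isinstance(value, str):
        return b"\x02"  # UTF-8 string
    if isinstance(value, list):
        return b"\x04"  # Array
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")

record = {
    "Name": "DataResearchLab",
    "IsGreat": True,
    "Fields": ["CloudComputing", "NoSQL", "BigData"],
}
for key, value in record.items():
    print(key, type_byte(value).hex())
```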

Continue with the "Name" part. As said above, its value corresponds to the type "UTF-8 string". Under the BSON rules this element is interpreted as three parts: "\x02", e_name, and string.

The first part, "\x02", is already in its final stored form and identifies the selected type. The second part, e_name, is interpreted in exactly one way, as a cstring, which is the type of the key. A cstring in turn is interpreted in exactly one way, as (byte*) followed by "\x00". (byte*) holds the key itself, that is, Name (without the double quotation marks), and "\x00" again marks the end of this part.

The third part, string, is interpreted in exactly one way, as int32, (byte*), and "\x00". The second and third of these have the same meaning as in a cstring: (byte*) holds DataResearchLab (this time the data value rather than the key), and "\x00" marks the end. The first is again an int32, but it differs from the int32 of a Document. Two points were noted above for the int32 of a Document; here is how the int32 of a string compares. It is also stored in little-endian order (least significant byte first), but its value does not include the four bytes of the int32 itself. In this example the int32 is stored as "\x10\x00\x00\x00", that is, 16: the 15 bytes of DataResearchLab plus the terminator "\x00".
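The breakdown of the "Name" element can be checked with a short Python sketch. The helper names `encode_cstring`, `encode_string`, and `encode_string_element` are hypothetical and exist only for this illustration; the byte layout follows the rules described above.

```python
import struct

def encode_cstring(text: str) -> bytes:
    # cstring = (byte*) + terminator
    return text.encode("utf-8") + b"\x00"

def encode_string(text: str) -> bytes:
    # string = int32 + (byte*) + terminator
    # The int32 counts the bytes and the terminator, but not itself.
    data = text.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(data)) + data

def encode_string_element(key: str, value: str) -> bytes:
    # element = type byte 0x02 + e_name (a cstring) + string
    return b"\x02" + encode_cstring(key) + encode_string(value)

element = encode_string_element("Name", "DataResearchLab")
print(element[6:10].hex(" "))  # 10 00 00 00
print(len(element))            # 26
```

The 26 bytes are 1 for the type, 5 for the key with its terminator, 4 for the length, and 16 for the value with its terminator.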

That completes the storage of the "Name" key and its value, but the storage of the whole example is far from finished.
