For most MongoDB users, MongoDB is a black box. Understanding a few of its internal structures, however, will help you understand and use it better.
BSON
In MongoDB, the document is the basic abstraction of data, and it is what the client and the server exchange. All clients (drivers for various languages) use this abstraction, which is represented as BSON (Binary JSON).
BSON is a lightweight binary data format. MongoDB uses BSON not only on the wire but also as the format in which it stores data on disk.
When the client performs a write or a query, the document must first be encoded as BSON and then sent to the server. Likewise, the server encodes its results as BSON before returning them to the client.
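To make the encoding concrete, here is a minimal Python sketch of a BSON encoder. It is not a driver API: `encode_document` and `encode_cstring` are hypothetical helpers that handle only strings and 32-bit integers, following the element layout of the BSON specification (int32 total length, then typed elements, then a trailing zero byte).

```python
import struct


def encode_cstring(s: str) -> bytes:
    # BSON "cstring": UTF-8 bytes followed by a NUL terminator (used for field names)
    return s.encode("utf-8") + b"\x00"


def encode_document(doc: dict) -> bytes:
    """Encode a flat dict of str / int32 values as a BSON document.

    Only element types 0x02 (UTF-8 string) and 0x10 (int32) are handled;
    real drivers support many more.
    """
    body = b""
    for key, value in doc.items():
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type byte, field name, int32 byte length (including NUL), bytes
            body += b"\x02" + encode_cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool) and -2**31 <= value < 2**31:
            # type byte, field name, 4-byte little-endian integer
            body += b"\x10" + encode_cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
    # document = int32 total length + elements + trailing 0x00
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(len(encoded), encoded)  # 22 bytes; the first byte, 0x16, is that length
```

Real applications would use the BSON library bundled with their driver instead; the point here is only that every document on the wire is this kind of length-prefixed byte string.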
BSON was designed with the following goals:
1) Efficiency
BSON is designed to represent data efficiently, without using much extra space. In the worst case BSON is slightly less compact than JSON, but in the best case (for example, when storing binary data or large numbers) it is much more efficient.
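A rough way to check this claim is to encode the same data both ways. The sketch below hand-encodes two tiny documents with hypothetical helpers (`bson_int32_doc`, `bson_binary_doc`) and assumes that, since JSON has no binary type, binary data would be carried in JSON as a base64 string.

```python
import base64
import json
import struct


def bson_int32_doc(key: str, value: int) -> bytes:
    # {key: int32}: type 0x10, cstring key, 4-byte little-endian value
    body = b"\x10" + key.encode() + b"\x00" + struct.pack("<i", value)
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def bson_binary_doc(key: str, payload: bytes) -> bytes:
    # {key: binary}: type 0x05, cstring key, int32 length, subtype 0x00 (generic), raw bytes
    body = b"\x05" + key.encode() + b"\x00" + struct.pack("<i", len(payload)) + b"\x00" + payload
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


compact = (",", ":")  # no whitespace, to give JSON its best case

small_json = json.dumps({"a": 1}, separators=compact).encode()
small_bson = bson_int32_doc("a", 1)

payload = bytes(range(256)) * 4  # 1024 bytes of binary data
blob_json = json.dumps({"b": base64.b64encode(payload).decode()}, separators=compact).encode()
blob_bson = bson_binary_doc("b", payload)

print("small:", len(small_json), "bytes JSON vs", len(small_bson), "bytes BSON")
print("binary:", len(blob_json), "bytes JSON vs", len(blob_bson), "bytes BSON")
```

With these inputs the tiny integer document is 7 bytes as JSON but 12 as BSON (the length prefix and fixed-width integer cost space), while the 1 KiB binary payload is 1376 bytes as JSON versus 1037 as BSON.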
2) Traversability
In some cases BSON deliberately spends extra space to make documents easier to traverse. For example, each string value is prefixed with its length instead of relying only on a terminator to mark where it ends. This lets MongoDB skip over fields, or inspect the ones it needs, without scanning every byte of a document.
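The benefit of the length prefix shows up when reading. The hypothetical generator below walks the top-level fields of a BSON document and jumps over each string value using its prefix instead of scanning for its terminator. Only a handful of element types are handled, and field names, which are NUL-terminated cstrings, still have to be scanned.

```python
import struct


def top_level_fields(doc: bytes):
    """Yield (name, type_byte, value_offset) for each top-level element.

    Values are skipped by size rather than scanned: strings and embedded
    documents via their int32 length prefix, int32/double via fixed widths.
    Other element types are rejected in this sketch.
    """
    total = struct.unpack_from("<i", doc, 0)[0]
    pos = 4
    while pos < total - 1:  # the final byte is the document's 0x00 terminator
        type_byte = doc[pos]
        name_end = doc.index(b"\x00", pos + 1)  # field names are still NUL-terminated
        name = doc[pos + 1:name_end].decode("utf-8")
        value_pos = name_end + 1
        yield name, type_byte, value_pos
        if type_byte == 0x02:    # string: int32 length (counts the trailing NUL) + bytes
            pos = value_pos + 4 + struct.unpack_from("<i", doc, value_pos)[0]
        elif type_byte == 0x10:  # int32
            pos = value_pos + 4
        elif type_byte == 0x01:  # double
            pos = value_pos + 8
        elif type_byte == 0x03:  # embedded document: its length includes itself
            pos = value_pos + struct.unpack_from("<i", doc, value_pos)[0]
        else:
            raise ValueError(f"element type 0x{type_byte:02x} not handled here")


# Hand-built document {"s": <1000-character string>, "n": 7}
text = b"x" * 1000 + b"\x00"
body = b"\x02s\x00" + struct.pack("<i", len(text)) + text + b"\x10n\x00" + struct.pack("<i", 7)
doc = struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

for name, type_byte, offset in top_level_fields(doc):
    print(name, hex(type_byte), offset)
```

Reaching the field "n" costs one 4-byte read to skip the 1000-character string, rather than a byte-by-byte search for its end.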
3) Performance
Finally, BSON is fast to encode and decode. It uses C-style representations for its types (for example, fixed-width little-endian integers), which are fast to work with in most programming languages.
Wire Protocol
The client talks to the server over a lightweight TCP/IP wire protocol, which is documented in detail on the MongoDB wiki. It is essentially a thin wrapper around BSON data. For example, an insert message consists of a 20-byte header (which includes the message length and an operation code telling the server to perform an insert), the name of the collection to insert into, and the documents to insert.
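As an illustration, the sketch below frames one BSON document as a legacy OP_INSERT message (opcode 2002). The 20 bytes before the collection name are the standard 16-byte message header (messageLength, requestID, responseTo, opCode) plus a 4-byte flags field. `build_op_insert` is a hypothetical helper, nothing is sent over the network, and current servers have replaced OP_INSERT with OP_MSG.

```python
import struct

OP_INSERT = 2002  # legacy insert opcode (superseded by OP_MSG in current servers)


def build_op_insert(request_id: int, collection: str, documents: list) -> bytes:
    """Frame BSON documents as a legacy OP_INSERT message."""
    # after the header: int32 flags, cstring full collection name, then the BSON documents
    body = struct.pack("<i", 0) + collection.encode("utf-8") + b"\x00" + b"".join(documents)
    # 16-byte MsgHeader: messageLength, requestID, responseTo, opCode (little-endian int32s)
    header = struct.pack("<iiii", 16 + len(body), request_id, 0, OP_INSERT)
    return header + body


# {"hello": "world"} already encoded as BSON (22 bytes)
doc = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
msg = build_op_insert(1, "foo.bar", [doc])
length, request_id, response_to, opcode = struct.unpack_from("<iiii", msg, 0)
print(length, opcode)  # 50 2002
```

The 50 bytes break down as 20 bytes of header and flags, 8 bytes for "foo.bar" plus its terminator, and the 22-byte document.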
Data Files
MongoDB's data directory (/data/db by default) holds all the files that make up its databases. Each database consists of one .ns file plus a set of data files, and the number of data files grows as the amount of data grows. For a database named foo, the files are foo.ns, foo.0, foo.1, foo.2, and so on.
Each new data file is twice the size of the previous one, up to a maximum of 2 GB per file. This keeps databases with little data from wasting much disk space, while still giving large databases room to grow.
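The naming and doubling rules can be sketched in a few lines. The starting size of foo.0 is not stated above; the 64 MB used here is an assumption, and `data_files` is a hypothetical helper that leaves out the separate foo.ns file.

```python
MB = 1024 * 1024
MAX_FILE = 2048 * MB  # the 2 GB per-file cap described above


def data_files(db: str, count: int, first_size: int = 64 * MB):
    """Return (filename, size) pairs for the first `count` data files.

    first_size is an assumption; each later file doubles until the cap.
    """
    files = []
    size = first_size
    for i in range(count):
        files.append((f"{db}.{i}", size))
        size = min(size * 2, MAX_FILE)
    return files


for name, size in data_files("foo", 7):
    print(name, size // MB, "MB")
```

Under that assumption the sizes run 64, 128, 256, 512 and 1024 MB, after which every file stays at 2048 MB.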
To keep write performance stable, MongoDB preallocates data files (this can be disabled with --noprealloc). Preallocation happens in the background, and each preallocated file is filled with zeros. As a result, MongoDB always has a spare data file ready, so a burst of data growth does not block writes while disk space is being allocated.
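A minimal sketch of background preallocation, assuming a hypothetical `preallocate` helper: a worker thread writes explicit zeros into the next data file while the main thread carries on. Writing real zeros, rather than merely extending the file with truncate, makes the filesystem allocate the blocks up front; the 4 MB size is only for the demo.

```python
import os
import tempfile
import threading

CHUNK = 1024 * 1024  # write zeros 1 MB at a time


def preallocate(path: str, size: int) -> None:
    """Create `path` and fill it with `size` zero bytes."""
    zeros = bytes(CHUNK)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(remaining, CHUNK)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the zeros to disk before the file is handed out


# Prepare the next file (foo.2) in the background while foo.1 is still in use.
tmpdir = tempfile.mkdtemp()
next_file = os.path.join(tmpdir, "foo.2")
worker = threading.Thread(target=preallocate, args=(next_file, 4 * CHUNK))
worker.start()
# ... writes to the current data file would continue here ...
worker.join()
print(os.path.getsize(next_file))
```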
Namespaces and Extents
Each database consists of multiple namespaces, each of which stores one kind of data. Every collection in the database has its own namespace, and so does every index. Metadata for all namespaces is stored in the .ns file.
The data of a namespace is split across multiple regions on disk called extents. For example, suppose the database foo has three data files, the third of which is an empty preallocated file. The first two files are divided into extents, each belonging to a particular namespace.
Namespaces and extents have the following characteristics. A namespace can own many extents, and they need not be contiguous on disk. As the data files grow, each new extent allocated to a namespace is larger than the one before it, which balances wasted space against keeping a namespace's data contiguous. One special namespace deserves mention: $freelist, which records extents that are no longer in use (for example, those of a dropped collection or index). Whenever a namespace needs a new extent, MongoDB first checks whether $freelist has a suitable one.
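The allocation policy can be modeled as a toy allocator. Everything here is hypothetical: `ExtentAllocator`, the 4 KB first extent and the doubling growth factor are illustrative choices rather than MongoDB's actual sizing rules, and freed extents are reused whole (first fit) without being split.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    file_no: int  # which data file: foo.0, foo.1, ...
    offset: int   # byte offset inside that file
    size: int     # bytes


@dataclass
class Namespace:
    name: str  # e.g. "foo.users" for a collection
    extents: list = field(default_factory=list)


class ExtentAllocator:
    """Toy extent allocator; sizes and growth factor are illustrative."""

    def __init__(self, first_size: int = 4096, growth: int = 2):
        self.first_size = first_size
        self.growth = growth
        self.freelist: list = []  # plays the role of $freelist
        self.next_offset = 0      # simplification: everything lives in file 0

    def allocate(self, ns: Namespace) -> Extent:
        # each new extent for a namespace is larger than its previous one
        want = self.first_size * self.growth ** len(ns.extents)
        # check the freelist first (first fit, no splitting)
        for i, ext in enumerate(self.freelist):
            if ext.size >= want:
                ns.extents.append(self.freelist.pop(i))
                return ns.extents[-1]
        # otherwise carve a new extent from the end of the data file
        ext = Extent(file_no=0, offset=self.next_offset, size=want)
        self.next_offset += want
        ns.extents.append(ext)
        return ext

    def drop(self, ns: Namespace) -> None:
        # dropping a collection or index returns its extents to the freelist
        self.freelist.extend(ns.extents)
        ns.extents.clear()


alloc = ExtentAllocator()
users, logs = Namespace("foo.users"), Namespace("foo.logs")
for _ in range(3):
    alloc.allocate(users)  # 4096, 8192, 16384 bytes
alloc.allocate(logs)       # 4096 bytes, placed after the users extents
alloc.drop(users)
reused = alloc.allocate(logs)  # wants 8192: taken from the freelist
print(reused, alloc.next_offset)
```

After foo.users is dropped, foo.logs gets its second, larger extent from the freelist instead of from the end of the file, so its two extents end up non-contiguous.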
Memory-Mapped Storage Engine
MongoDB currently uses a memory-mapped storage engine. When MongoDB starts, it maps all of its data files into memory, and the operating system takes care of all disk I/O. This storage engine has the following characteristics:
* MongoDB's memory-management code is very small, because the operating system does most of that work.
* The MongoDB server uses a great deal of virtual memory, often more than the total size of its data files. This is normal; the operating system handles it. Note, however, that MongoDB does not manage memory itself, so you cannot cap how much memory it uses; the operating system decides. Because memory usage is therefore hard to control, it must be monitored at the OS level in production.
* MongoDB cannot control the order in which data is written to disk, so it cannot implement a write-ahead log. If MongoDB is to provide durability (for background on this feature, see my articles on Cassandra: http://www.cnblogs.com/gpcuster/tag/Cassandra/), it needs a different storage engine.
* A 32-bit MongoDB server can use only about 2 GB of data files per mongod instance, because every data file must be mapped into a 32-bit address space.
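The mechanism behind this engine is ordinary memory mapping. The Python sketch below uses the standard `mmap` module on a throwaway file (the name foo.0 is just for show): the program changes the file through a plain memory assignment, and the operating system is responsible for moving the dirty page back to disk, here forced with an explicit flush. It also hints at the 32-bit limit: every mapped byte consumes address space, and a 32-bit process has only 2^32 bytes (4 GiB) of addresses in total, part of which the OS and the program itself already occupy.

```python
import mmap
import os
import tempfile

# A small zero-filled file standing in for a data file such as foo.0
path = os.path.join(tempfile.mkdtemp(), "foo.0")
with open(path, "wb") as f:
    f.write(bytes(4096))

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
    mm[0:5] = b"hello"             # an ordinary memory write, no write() call
    mm.flush()                     # ask the OS to write dirty pages back to the file
    mm.close()

with open(path, "rb") as f:
    print(f.read(5))  # b'hello'
```

Without the flush the OS would still write the page back eventually, but at a time of its choosing, which is exactly why the engine cannot order its disk writes.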