MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the one most like a relational database. The data structure it supports is very loose, a JSON-like format called BSON, so it can store fairly complex data types. MongoDB's biggest feature is its powerful query language, whose syntax resembles an object-oriented query language. It can do almost everything a single-table query in a relational database can do, and it also supports indexing.
For most users, MongoDB is a big black box. Understanding some of its internal structure, however, will help you understand and use it better.
BSON
In MongoDB, a document is the abstraction of data used in the interaction between client and server. All clients (the drivers for the various languages) use this abstraction, and its representation is what we usually call BSON (Binary JSON).
BSON is a lightweight binary data format. MongoDB works with BSON directly and also uses it as the format for storing data on disk.
When a client wants to insert a document, use one as a query, and so on, the document must first be encoded as BSON and then sent to the server. Likewise, the server's results are encoded as BSON and returned to the client.
BSON is used for three main reasons:
Efficiency. BSON is designed to represent data efficiently and without much extra space. In the worst case it is slightly less efficient than JSON; in the best case (for example, binary data or large numbers) it is much more efficient.
Traversability. In some cases BSON sacrifices space to make the data easier to traverse. For example, a string is prefixed with its length rather than relying only on a terminator at the end. This makes it easier for MongoDB to inspect and modify the data it receives.
Performance. Finally, BSON is very fast to encode and decode. It uses C-style data representations, which are efficient to handle in most languages.
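To make the length-prefix idea concrete, here is a minimal sketch of a hand-written encoder for a tiny subset of BSON (string and 32-bit integer values only). The function names are hypothetical and exist only for illustration; real applications would rely on the BSON library that ships with a driver.

```python
import struct


def encode_cstring(text):
    """Encode a key name as a NUL-terminated UTF-8 string."""
    return text.encode("utf-8") + b"\x00"


def encode_element(key, value):
    """Encode one key/value pair; only str and 32-bit int values are handled."""
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # Type byte 0x02, key, int32 length of the string data, then the data.
        return b"\x02" + encode_cstring(key) + struct.pack("<i", len(data)) + data
    if isinstance(value, int) and not isinstance(value, bool):
        # Type byte 0x10, key, then the value as a little-endian int32.
        return b"\x10" + encode_cstring(key) + struct.pack("<i", value)
    raise TypeError("unsupported type: " + type(value).__name__)


def encode_document(doc):
    """Encode a flat dict as: int32 total length, elements, trailing NUL."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    # The total length counts its own 4 bytes and the trailing NUL byte.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(len(encoded), encoded.hex())
```

Because both the document and each string carry their length up front, a reader can skip over a value without scanning for its end, which is the space-for-convenience trade described above.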
Wire Protocol
Clients access the server using a lightweight TCP/IP wire protocol. The protocol is documented in detail on the MongoDB wiki and is essentially a thin wrapper around BSON data. For example, an insert message consists of a 20-byte message header (which includes the message length and a code identifying the insert operation), the name of the collection to insert into, and the documents to insert.
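The insert message described above can be sketched as follows. This assumes the legacy layout in which the 20 bytes are five little-endian 32-bit integers (message length, request ID, response-to, operation code, flags) and the insert operation code is 2002; the text itself only states the 20-byte size and that the header carries the length and the command identifier, so treat the field names as assumptions. The function name is hypothetical.

```python
import struct

OP_INSERT = 2002  # assumed operation code of the legacy insert message


def build_insert_message(request_id, collection, bson_docs):
    """Wrap already-encoded BSON documents in a 20-byte header."""
    # Body: NUL-terminated collection name followed by the BSON documents.
    body = collection.encode("utf-8") + b"\x00" + b"".join(bson_docs)
    total_length = 20 + len(body)
    # messageLength, requestID, responseTo, opCode, flags
    header = struct.pack("<iiiii", total_length, request_id, 0, OP_INSERT, 0)
    return header + body


# A pre-encoded BSON document equivalent to {"hello": "world"}.
doc = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
message = build_insert_message(1, "foo.bar", [doc])
print(len(message))
```

The sketch only builds the bytes; it does not open a socket or talk to a server.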
Data Files
The MongoDB data directory (/data/db by default) holds all the files that make up the databases. Each database consists of one .ns file and several data files, and the number of data files grows with the amount of data. So a database named foo is made up of the files foo.ns, foo.0, foo.1, foo.2, and so on.
Each new data file is twice the size of the previous one, up to a maximum of 2 GB per file. This design keeps small databases from wasting too much disk space while giving large databases correspondingly large files.
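The doubling rule can be illustrated with a few lines of arithmetic. The 64 MB size of the first file is an assumption made for this sketch (the text does not state a starting size), and the function name is hypothetical.

```python
def data_file_sizes(first_size_mb=64, max_size_mb=2048, count=8):
    """Return the sizes in MB of the first `count` data files."""
    sizes = []
    size = first_size_mb
    for _ in range(count):
        sizes.append(size)
        # Each new file is twice as large as the last, capped at the maximum.
        size = min(size * 2, max_size_mb)
    return sizes


for index, size_mb in enumerate(data_file_sizes()):
    print(f"foo.{index}: {size_mb} MB")
```

Under these assumptions the cap is reached at the sixth file, after which every additional file is 2 GB.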
MongoDB pre-allocates data files to keep write performance stable (this can be turned off with --noprealloc). Pre-allocation happens in the background, and each pre-allocated file is filled with zeros. As a result MongoDB always keeps an extra, empty data file on hand, which avoids the stalls that disk allocation would otherwise cause when data grows quickly.
Namespaces and Extents
Each database is organized into namespaces, each of which stores one particular type of data. Every collection in a database has its own namespace, and so does every index. The metadata for all namespaces is stored in the .ns file.
The data in a namespace is grouped on disk into regions called extents. Suppose, for example, that the foo database has three data files, the third of which is an empty pre-allocated file. The first two data files are divided into extents belonging to different namespaces.
Each namespace can have several extents, and they need not be contiguous on disk. Just as data files grow, the extents allocated to a namespace grow in size with each new allocation. This balances the space wasted by a namespace against the need to keep the data in a namespace mostly contiguous. One special namespace is worth knowing about: $freelist, which keeps track of extents that are no longer in use (because a collection or index was dropped). Whenever a namespace needs a new extent, it first checks whether $freelist contains one of a suitable size.
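The free-list behaviour can be modelled with a toy allocator. This is only a sketch: the class, its methods, and the first-fit policy are assumptions made for illustration, and extents are reduced to their sizes rather than real disk locations.

```python
class ExtentAllocator:
    """Toy model of namespaces, extents, and a free list."""

    def __init__(self):
        self.freelist = []    # sizes of extents that are no longer in use
        self.namespaces = {}  # namespace name -> list of extent sizes

    def allocate(self, namespace, size):
        """Reuse the first free extent that is large enough, else make a new one."""
        for i, free_size in enumerate(self.freelist):
            if free_size >= size:
                extent = self.freelist.pop(i)
                break
        else:
            extent = size
        self.namespaces.setdefault(namespace, []).append(extent)
        return extent

    def drop(self, namespace):
        """Dropping a collection or index returns its extents to the free list."""
        self.freelist.extend(self.namespaces.pop(namespace, []))


allocator = ExtentAllocator()
allocator.allocate("foo.users", 4096)
allocator.allocate("foo.users", 8192)
allocator.drop("foo.users")
reused = allocator.allocate("foo.orders", 6000)
print(reused, allocator.freelist)
```

In this run the request for 6000 bytes is served by the released 8192-byte extent, and the 4096-byte extent stays on the free list.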
Memory-Mapped Storage Engine
MongoDB's current storage engine is a memory-mapped engine. When the server starts, it maps all of its data files into memory, and the operating system takes care of all disk operations. This storage engine has the following characteristics:
MongoDB's memory-management code is very lean, because most of that work is left to the operating system.
The virtual memory size of the MongoDB server process is usually very large and can exceed the size of the entire data set. This is fine; the operating system handles it. Note that MongoDB does not manage memory itself and you cannot specify how much memory it uses; that is left entirely to the operating system, so memory use is sometimes hard to control. In production, memory usage must be monitored at the OS level.
MongoDB cannot control the order in which data is written to disk, so it cannot use write-ahead logging. To provide a durability feature, MongoDB would need another storage engine.
On 32-bit systems, each mongod instance is limited to about 2 GB of data files, because address pointers are only 32 bits wide.
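The general idea of memory-mapped file access can be tried with Python's standard mmap module. This is not MongoDB's code; it is a small sketch in which the file name foo.0 is borrowed from the earlier example and created in a temporary directory.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a data file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo.0")

# Create a zero-filled file, similar in spirit to a pre-allocated data file.
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mapped:
        # A plain memory write; no explicit file write call is made here.
        mapped[0:5] = b"hello"
        # Ask the operating system to write the modified pages back to disk.
        mapped.flush()

with open(path, "rb") as f:
    first_bytes = f.read(5)
print(first_bytes)
```

The program only touches memory; when and in what order the pages reach the disk is decided by the operating system unless a flush is requested, which is why the engine has so little control over write ordering.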
Features
MongoDB is high-performance, easy to deploy, easy to use, and convenient for storing data. Its main features are:
Collection-oriented storage, well suited to storing object-type data.
Schema-free.
Supports dynamic queries.
Supports full indexing, including on embedded objects.
Supports queries.
Supports replication and failure recovery.
Uses efficient binary data storage, including for large objects such as video.
Automatic sharding, to support cloud-level scalability.
Supports many languages, such as Ruby, Python, Java, C++, and PHP.
Files are stored in BSON format (an extension of JSON).
Accessible over the network.
Collection-oriented means that data is stored in groups, each of which is called a collection. Every collection has a unique name within the database and can contain an unlimited number of documents. A collection is similar to a table in a relational database (RDBMS), except that it does not require any schema to be defined.
Schema-free means that we do not need to know the structure of the documents stored in MongoDB in advance. If necessary, documents with completely different structures can be stored in the same database.
Documents in a collection are stored as key-value pairs. Each key is a string that identifies a value, and the value can be any of a variety of complex types. This storage format is called BSON (Binary Serialized Document Format).
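As a rough illustration of schema-free, key-value documents, the sketch below uses plain Python dictionaries in a list as a stand-in for a collection. The find helper is hypothetical and is not a driver API; it only shows that documents with different structures can live side by side and still be queried by field.

```python
# Two documents with completely different structures in one "collection".
collection = [
    {"name": "alice", "tags": ["admin", "dev"]},
    {"title": "intro video", "meta": {"codec": "h264", "size_mb": 120}},
]


def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


matches = find(collection, name="alice")
print(matches)

# Every key is a string, while values may be strings, lists, or nested documents.
print(all(isinstance(key, str) for doc in collection for key in doc))
```

No schema is declared anywhere: the second document simply has no name field, so it does not match the query.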
Other
MongoDB: The Definitive Guide covers only this much of MongoDB's internals; explaining everything properly would probably take another book, covering topics such as internal JavaScript parsing, query optimization, and indexing. Interested readers can go straight to the source code :)
Basic characteristics and internal structure of MongoDB #repost #mongodb