Contents
1 Basic concepts and features: rich data types; easy to scale; rich features; speed without sacrifice; easy management
2 Application scenarios
3 Storage model
4 Concurrency control
5 Memory management
6 Disaster recovery: backup; repair
7 MongoDB wire protocol
8 MongoDB file internal structure
9 MongoDB data synchronization
10 Sharding mechanism
1 Basic Concepts and Features
MongoDB is a powerful, flexible, and scalable way to store data. It retains many useful features of relational databases, such as secondary indexes, range queries, and sorting, and adds rich functionality of its own, such as built-in MapReduce-style aggregation and geospatial indexing.
The MongoDB data model is very friendly to developers, configuration options are easy for administrators, and the drivers and database shell provide natural, language-native APIs. MongoDB gets out of your way so you can focus on programming rather than worrying about how to store your data.
MongoDB supports atomic modification only of a single document; it does not provide atomic operations across multiple documents.
1.1 Rich data types
MongoDB is a document-oriented database, not a relational one. The main reason for abandoning the relational model is to make scaling out easier, though there are other advantages as well.
The basic idea is to replace the concept of a "row" with the more flexible "document" model. Because documents can embed other documents and arrays, a single record can represent quite complex hierarchical relationships.
MongoDB is schema-free: a document's keys are not defined in advance and are not fixed. Because there is no schema to change, migrating large amounts of data is usually unnecessary. You do not have to force all your data into one mold, and the application layer can handle new or missing keys.
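As a sketch of the document model described above, here is what such a nested record might look like, written as a Python dict (BSON documents map naturally onto dicts; the field names below are hypothetical):

```python
# A single "document" can capture a hierarchy that would need several
# joined tables in a relational model. (Illustrative example; the field
# names are made up.)
blog_post = {
    "_id": 1,
    "title": "Why documents?",
    "author": {"name": "alice", "email": "alice@example.com"},  # embedded document
    "tags": ["mongodb", "nosql"],                               # embedded array
    "comments": [                                               # array of documents
        {"user": "bob", "text": "Nice post"},
        {"user": "carol", "text": "+1"},
    ],
}

# Schema-free: another document in the same collection may add or omit keys.
draft_post = {"_id": 2, "title": "Untitled", "draft": True}

print(len(blog_post["comments"]))  # -> 2
```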
1.2 Easy to scale
MongoDB was designed from the start with scaling in mind. Its document-oriented data model allows it to split data across multiple servers automatically. It can also balance data and load across a cluster, redistributing documents automatically. When you need more capacity, you just add new machines to the cluster and let the database work out the rest.
1.3 Rich features
(1) Indexing
MongoDB supports generic secondary indexes for many kinds of fast queries, and provides unique, compound, and geospatial indexing capabilities.
(2) Stored JavaScript
Instead of stored procedures, developers can store and use JavaScript functions and values directly on the server side.
(3) Aggregation
MongoDB supports MapReduce and other aggregation tools.
(4) Capped collections
These collections have a fixed maximum size, which is especially useful for certain kinds of data, such as logs.
(5) File storage
MongoDB supports an easy-to-use protocol for storing large files and file metadata.
Some features common to relational databases are missing from MongoDB, notably joins and complex multi-document transactions. These are architectural decisions intended to improve scalability, because both features are hard to provide efficiently on a distributed system.
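A capped collection behaves roughly like a fixed-size ring buffer: once the cap is reached, the oldest entries are aged out as new ones arrive. As a loose analogy (not MongoDB code), Python's bounded deque shows the behavior:

```python
from collections import deque

# A capped collection preserves insertion order and discards the oldest
# documents once the cap is reached -- much like a bounded deque.
log = deque(maxlen=3)

for i in range(5):
    log.append({"seq": i, "msg": f"event {i}"})

# Only the 3 most recent entries survive, in insertion order.
print([doc["seq"] for doc in log])  # -> [2, 3, 4]
```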
1.4 Without sacrificing speed
MongoDB uses its own wire protocol as the primary way to interact with the server (alternatives such as HTTP/REST would add more overhead). It pads documents dynamically and pre-allocates data files, trading extra space for consistent performance. The default storage engine uses memory-mapped files, handing memory-management work over to the operating system. A dynamic query optimizer "remembers" the most efficient way to execute each query.
1.5 Easy to manage
MongoDB tries to let the server run itself as much as possible, to simplify database administration. Beyond starting the database server, very little administrative intervention is required. If the primary server goes down, MongoDB automatically fails over to a backup server and promotes that backup to active. In a distributed environment, the cluster only needs to be told about a new node to integrate and configure it automatically.
2 Application Scenarios
The main goal of MongoDB is to bridge the gap between key/value stores (high performance and scalability) and traditional RDBMSs (rich functionality), combining the advantages of both. MongoDB fits the following scenarios:
(1) Website data: MongoDB is well suited to real-time inserts, updates, and queries, and offers the replication and high scalability that a website's real-time data store needs.
(2) Caching: Because of its high performance, MongoDB also works well as a caching layer in an infrastructure. After a system restart, a persistent cache built on MongoDB can keep the underlying data source from being overloaded.
(3) Large volumes of low-value data: Storing such data in a traditional relational database can be expensive; before options like MongoDB, programmers often fell back on plain files.
(4) Highly scalable scenarios: MongoDB is built for databases spanning dozens or hundreds of servers.
(5) Storage of objects and JSON data: MongoDB's BSON format is ideal for storing and querying document-shaped data.
Scenarios MongoDB is not suited to:
(1) Highly transactional systems, such as banking or accounting applications. Traditional relational databases remain better suited to applications that require many complex atomic transactions.
(2) Traditional business-intelligence applications: a BI database tuned for a specific problem can be highly optimized for its queries. For this kind of application, a data warehouse may be the more appropriate choice.
(3) Problems that require SQL.
3 Storage Model
The MongoDB document is an abstract concept; its concrete representation depends on the driver and programming language in use. Because communication in MongoDB relies heavily on documents, there must be a document representation shared by all drivers, tools, and processes. That representation is called Binary JSON, or BSON.
BSON is a lightweight binary format capable of representing any MongoDB document as a byte string. The database understands BSON, and it is also the format in which documents are stored on disk.
When a driver inserts a document, or uses a document as a query condition, it encodes the document to BSON before sending it to the server. Likewise, documents returned to the client come back as BSON strings; the driver decodes them into its native document representation before handing them to the application.
The BSON format has three main goals:
(1) Efficiency
BSON is designed to represent data efficiently, using little space. In the worst case it is slightly less efficient than JSON; in the best case, such as when storing binary data or large numbers, it is considerably more efficient.
(2) Traversability
BSON sometimes sacrifices space efficiency for a format that is easier to traverse. For example, strings are prefixed with their length rather than relying only on a terminator at the end. This is useful when MongoDB needs to introspect documents.
(3) Performance
Finally, BSON is fast to encode and decode. It uses C-style representations for its types, which most programming languages can process quickly.
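To make the format concrete, here is a minimal hand-encoding of a BSON document in Python, covering only int32 and string elements (real drivers support many more types). Note the length prefixes on both the document and the string value, which are what make traversal cheap:

```python
import struct

def bson_int32(name: str, value: int) -> bytes:
    # type byte 0x10 = int32, then the key as a cstring, then a
    # little-endian 32-bit integer
    return b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)

def bson_string(name: str, value: str) -> bytes:
    # type byte 0x02 = string; the value carries a length prefix (length
    # includes the trailing NUL), so a parser can skip it without scanning
    # for a terminator -- the traversability trade-off described above.
    data = value.encode() + b"\x00"
    return b"\x02" + name.encode() + b"\x00" + struct.pack("<i", len(data)) + data

def bson_document(*elements: bytes) -> bytes:
    body = b"".join(elements)
    total = 4 + len(body) + 1          # length prefix + elements + trailing NUL
    return struct.pack("<i", total) + body + b"\x00"

doc = bson_document(bson_int32("a", 1))
print(doc.hex())   # -> 0c0000001061000100000000
print(len(doc))    # -> 12 bytes for the document {"a": 1}
```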
4 concurrency control
MongoDB does not support transactions, and there is no database-level pessimistic lock, so concurrency control has to rely on optimistic locking.
(1) Pessimistic locking: assume concurrent conflicts will occur, and block any operation that could violate data integrity.
(2) Optimistic locking: assume conflicts will not occur, and check for violations of data integrity only when the operation is submitted. Optimistic locking does not by itself solve problems such as dirty reads and non-repeatable reads.
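A minimal sketch of the optimistic-locking pattern, using a plain dict in place of a collection (in MongoDB itself this is usually expressed as a conditional update that matches on a version field and increments it):

```python
# Optimistic locking sketch: each document carries a version number, and
# an update succeeds only if the version is unchanged since it was read.
# (Illustrative in-memory stand-in, not MongoDB API calls.)
store = {"doc1": {"balance": 100, "version": 1}}

def update_if_unchanged(doc_id, expected_version, new_balance):
    doc = store[doc_id]
    if doc["version"] != expected_version:
        return False                      # conflict: someone wrote first
    doc["balance"] = new_balance
    doc["version"] += 1
    return True

# Two clients both read version 1, then both try to write.
assert update_if_unchanged("doc1", 1, 150) is True    # first writer wins
assert update_if_unchanged("doc1", 1, 120) is False   # second must re-read and retry
print(store["doc1"])  # -> {'balance': 150, 'version': 2}
```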
5 Memory Management
MongoDB's default storage engine is a memory-mapped engine. When the server starts, it maps all of its data files into memory, and the operating system then takes responsibility for flushing dirty data to disk and paging data in and out. This engine has several important characteristics:
(1) MongoDB's memory-management code is very compact, because most of the work is pushed down to the operating system.
(2) The virtual size of the MongoDB server process is usually very large, exceeding the size of the entire data set.
(3) MongoDB cannot control the order in which data is written to disk, which makes it impossible to use a write-ahead log to provide single-server durability.
(4) A 32-bit MongoDB server is limited to about 2 GB of data per mongod, because all of the data must be addressable within a 32-bit address space.
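The memory-mapped idea can be demonstrated with Python's standard mmap module: the file's bytes appear as ordinary memory, and the operating system decides when dirty pages actually reach the disk (file name and size here are arbitrary):

```python
import mmap
import os
import tempfile

# Sketch of memory-mapped storage: map a file, then treat it as memory.
path = os.path.join(tempfile.mkdtemp(), "datafile.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)          # pre-allocate, as MongoDB pre-allocates data files

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)    # map the whole file into the address space
    mm[0:5] = b"hello"               # writing to memory mutates the file
    mm.flush()                       # ask the OS to write dirty pages out
    mm.close()

with open(path, "rb") as f:
    print(f.read(5))  # -> b'hello'
```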
6 Disaster Recovery
6.1 Backup
(1) mongodump and mongorestore
mongodump is a way to back up a running server: it queries the live MongoDB instance and writes every document it finds to disk. Because mongodump uses the normal query mechanism, the backup it produces is not necessarily a point-in-time snapshot of the server's data, and running it can hurt performance for other clients. Alongside mongodump, MongoDB also provides mongorestore, a tool for restoring data from such backups.
(2) fsync and lock
mongodump and mongorestore allow backups without stopping the server, but at the cost of a consistent real-time view of the data. MongoDB's fsync command makes it possible to copy the data directory of a running server without corrupting the data: it forces the server to flush all pending writes to disk, and can optionally take a lock that blocks further writes until the lock is released.
(3) Backing up from a slave
Although the methods above are flexible, it is better still to back up from a slave server: a slave's performance and its availability for reads and writes matter less, so any of the three approaches above can be applied to it (shutting it down and copying files, the dump and restore tools, or the fsync command).
6.2 Repair
MongoDB's built-in repair feature tries to salvage corrupted data files. The simplest way to repair all databases is to start the server with mongod --repair. The actual repair process is conceptually simple: export all documents, skipping any that are invalid, and immediately re-import them; when that finishes, the indexes are rebuilt.
7 MongoDB Wire Protocol
Drivers interact with MongoDB using a wire protocol that is a thin layer over TCP/IP, essentially a simple envelope around BSON data. For example, an insert message consists of about 20 bytes of header (including a code telling the server to perform an insert, and the message length), the name of the collection to insert into, and the list of BSON documents to insert.
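A rough sketch of building such an insert message in Python, assuming the legacy OP_INSERT layout (a 16-byte standard header of messageLength/requestID/responseTo/opCode, then a 4-byte flags field, the collection name as a cstring, and the BSON documents):

```python
import struct

OP_INSERT = 2002  # opcode for insert in the legacy wire protocol

def build_insert_message(request_id: int, collection: str, bson_docs: bytes) -> bytes:
    # Body: int32 flags, cstring full collection name, then BSON documents.
    body = struct.pack("<i", 0) + collection.encode() + b"\x00" + bson_docs
    # Standard 16-byte header: messageLength, requestID, responseTo, opCode.
    # Together with the 4-byte flags field, this is roughly the "20 bytes of
    # header" mentioned above, preceding the collection name and documents.
    length = 16 + len(body)
    header = struct.pack("<iiii", length, request_id, 0, OP_INSERT)
    return header + body

# A pre-encoded BSON document would normally follow; empty here for brevity.
msg = build_insert_message(1, "test.users", b"")
print(len(msg))  # -> 31: header(16) + flags(4) + "test.users\0"(11)
```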
8 MongoDB File internal structure
MongoDB organizes stored data by namespace: each collection is a namespace, and each index is a namespace.
Data within a namespace is divided into extents, which are connected in a doubly linked list.
Within each extent, the individual records are likewise connected by a doubly linked list.
Each stored record includes not only the document's data but also some extra padding, so that when an update makes a document larger it can often stay in place instead of being moved.
Indexes are implemented as B-trees.
9 MongoDB Data Synchronization
MongoDB synchronizes data using replica sets.
The process can be described briefly as follows:
(1) Write operations go to the primary and are then synchronized asynchronously to the secondaries.
(2) Read operations can be served by either the primary or a secondary.
(3) The primary and each secondary maintain heartbeat checks with one another, which are used to judge the health of the replica set.
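The flow above can be modeled as a toy in-process simulation, with an oplog standing in for MongoDB's real replication machinery (the class and field names here are illustrative, not MongoDB internals):

```python
# Toy model of replica-set data flow: writes hit the primary and are
# replayed on secondaries from the primary's operation log.
class Node:
    def __init__(self):
        self.data = {}
        self.applied = 0   # how many oplog entries this node has applied

class ReplicaSet:
    def __init__(self, n_secondaries=2):
        self.primary = Node()
        self.secondaries = [Node() for _ in range(n_secondaries)]
        self.oplog = []    # ordered log of write operations

    def write(self, key, value):
        # All writes go to the primary, which records them in the oplog.
        self.primary.data[key] = value
        self.primary.applied += 1
        self.oplog.append((key, value))

    def replicate(self):
        # In MongoDB this happens asynchronously and continuously;
        # here it is an explicit step so the lag is visible.
        for sec in self.secondaries:
            for key, value in self.oplog[sec.applied:]:
                sec.data[key] = value
                sec.applied += 1

rs = ReplicaSet()
rs.write("x", 1)
print(rs.secondaries[0].data)  # -> {} (secondary lags until replication runs)
rs.replicate()
print(rs.secondaries[0].data)  # -> {'x': 1}
```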
10 Sharding Mechanism
MongoDB sharding works by designating a shard key: data is partitioned into chunks by ranges of that key, and each chunk has a maximum size.
Multiple shard nodes hold these chunks, with each node holding a portion of them.
Each shard node is itself a replica set, which keeps the data safe.
When a chunk grows past its size limit, it is split into two smaller chunks.
When chunks become unevenly distributed across the shard nodes, chunk migration is triggered.
That is the sharding mechanism itself; the roles the various nodes play during sharding are as follows:
Clients read and write data through the mongos routing nodes.
The config servers hold two mappings: one from shard-key ranges to chunks, and one from chunks to shard nodes.
A routing node obtains this information from the config servers and uses it to locate the shard node that actually holds the data, then carries out the operation there.
On writes, the routing node also checks whether the current chunk has exceeded its size limit; if it has, the chunk is split in two.
For queries and updates that use the shard key, the routing node finds the specific chunk and does the work there.
For queries and updates that do not use the shard key, mongos sends the request to all shard nodes and merges the returned results.
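The chunk mechanics above can be sketched as a small simulation: chunks cover disjoint shard-key ranges, inserts are routed by key, and an oversized chunk splits at its median key (the size limit and key space here are made up, and real chunks live on separate shard nodes):

```python
import bisect

# Toy model of range-based sharding. Each chunk covers the shard-key
# range [low, high); a chunk that grows past the limit splits in two.
CHUNK_LIMIT = 4

class Chunk:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.keys = []                       # shard keys stored in this chunk

class ShardedCollection:
    def __init__(self, key_space=(0, 100)):
        self.chunks = [Chunk(*key_space)]    # one chunk covers everything at first

    def _find_chunk(self, key):
        # Routing step: locate the chunk whose range contains the key.
        for c in self.chunks:
            if c.low <= key < c.high:
                return c
        raise KeyError(key)

    def insert(self, key):
        c = self._find_chunk(key)
        bisect.insort(c.keys, key)
        if len(c.keys) > CHUNK_LIMIT:        # oversized chunk: split at median
            mid = c.keys[len(c.keys) // 2]
            left, right = Chunk(c.low, mid), Chunk(mid, c.high)
            left.keys = [k for k in c.keys if k < mid]
            right.keys = [k for k in c.keys if k >= mid]
            i = self.chunks.index(c)
            self.chunks[i:i + 1] = [left, right]

coll = ShardedCollection()
for k in [5, 40, 12, 77, 23]:
    coll.insert(k)
print([(c.low, c.high) for c in coll.chunks])  # -> [(0, 23), (23, 100)]
```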