MongoDB (i)

Source: Internet
Author: User
Tags: value store

I. Introduction to NoSQL

NoSQL refers to non-relational databases. With the rise of Web 2.0 websites, traditional relational databases have struggled to cope, especially with very large-scale, high-concurrency, purely dynamic Web 2.0 sites such as social networking services (SNS), and have exposed many problems that are hard to overcome. Non-relational databases, by contrast, have developed rapidly because of their particular characteristics. NoSQL databases were created to address the challenges posed by large-scale data sets with many data types, especially big data applications. NoSQL website: http://www.nosql-database.org

1. Four categories of NoSQL databases

    • Key-value store database: This type of database primarily uses a hash table in which a specific key holds a pointer to specific data. For IT systems, the advantage of the key-value model is simplicity and ease of deployment. However, if a DBA needs to query or update only part of a value, the key-value model becomes inefficient. Examples include Tokyo Cabinet/Tyrant, Redis, Voldemort, and Oracle BDB.

    • Column store database:

    • Graph database: Unlike row- and column-oriented stores and the rigid structure of SQL databases, a graph database uses a flexible graph model that can be extended across multiple servers. NoSQL databases have no standard query language comparable to SQL, so queries are written against each database's own data model. Many NoSQL databases offer REST-style data interfaces or query APIs. Examples include Neo4j, InfoGrid, and Infinite Graph.

In summary, NoSQL databases are a good fit in the following cases: 1. the data model is relatively simple; 2. the IT system needs more flexibility; 3. high database performance is required; 4. a high degree of data consistency is not required; 5. for a given key, complex values are easy to map.

2. Analysis of the four categories of NoSQL databases

[Image: 2.png, http://s3.51cto.com/wyfs02/M01/74/1E/wKioL1YVKYDTcjXHAALjdtnURmM540.jpg]

3. Common features

NoSQL has no precise scope or definition, but NoSQL databases share some common features:

  • No predefined schema: You do not need to define the data schema or table structure in advance. Each record may have different properties and formats, and data can be inserted without declaring its schema first.

  • Shared-nothing architecture: In contrast to a shared-everything architecture that keeps all data in a storage area network, NoSQL often partitions the data and stores it on each server's local disk. Because reading from a local disk tends to be faster than reading over a network, this improves system performance.

  • Elastic scalability: Nodes can be added or removed dynamically while the system is running. No downtime for maintenance is required, and data can be migrated automatically.

  • Partitioning: Rather than storing all data on one node, a NoSQL database partitions the data and spreads the records across multiple nodes, usually replicating each partition as well. This improves parallel performance and avoids a single point of failure.

  • Asynchronous replication: Unlike a RAID storage system, replication in NoSQL is often log-based and asynchronous. Data can therefore be written to a node as quickly as possible, without being delayed by network transmission. The disadvantage is that consistency is not always guaranteed, and a small amount of data may be lost if a failure occurs.

  • BASE: In place of the strict ACID properties of transactions, NoSQL databases guarantee the BASE properties. BASE stands for eventual consistency and soft transactions.
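The partitioning feature above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, the `owners` helper, and the "primary plus next nodes" replica placement are all hypothetical, and real NoSQL systems use more elaborate schemes.

```python
import hashlib

def owners(key, nodes, replicas=2):
    """Return the nodes that store `key`: a primary picked by hash, plus its successors.

    Hypothetical placement rule used only for illustration.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    count = min(replicas, len(nodes))  # cannot place more copies than there are nodes
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

nodes = ["node-a", "node-b", "node-c"]  # assumed node names
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", owners(key, nodes))
```

Because each key lives on more than one node, losing a single node does not lose the record, which is the "no single point of failure" property the list describes.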

NoSQL databases have no unified architecture, and the difference between two NoSQL databases can be far greater than the difference between two relational databases. Each NoSQL database has its own strengths, and a successful one must be especially well suited to certain situations or applications, where it far outperforms relational databases and other NoSQL databases.

4. CAP theory

The theory was proposed by Eric Brewer, a well-known American computer scientist and founder of the internet company Inktomi, at the 2000 PODC (Symposium on Principles of Distributed Computing) conference. Seth Gilbert and Nancy Lynch later proved that the CAP theory is correct. Although many people have raised objections to it in the decade since, it remains very useful in the NoSQL world. It states that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; it can satisfy at most two of the three at the same time.

[Image: 3.png, http://s3.51cto.com/wyfs02/M00/74/1E/wKioL1YVKbSBLn_-AABvrrtGzTc142.jpg]

C: Consistency

A: Availability

P: Partition tolerance

The core of the CAP theory is the trade-off among these three requirements: a distributed system can satisfy at most two of them at the same time.

    • C: Consistency

Consistency can be viewed from two perspectives: the client and the server. From the client side, consistency is mainly about how updated data is obtained under multiple concurrent accesses. From the server side, it is about how updates are replicated across the system so that the data eventually becomes consistent. Consistency problems arise from concurrent reads and writes, so concurrent read and write scenarios must be kept in mind when reasoning about consistency.

From the client's perspective, when multiple processes access the data concurrently, the policy that governs how each process obtains updated data determines the consistency level. Requiring that updated data be visible to all subsequent accesses, as relational databases do, is strong consistency. Tolerating some or all subsequent accesses not seeing the update is weak consistency. Requiring that the updated data become visible after a period of time is eventual consistency.

    • A: Availability

In an available distributed system, every non-failed node must respond to every request. In other words, any algorithm the system uses must eventually terminate. When partition tolerance is also required, this is a strong definition: even under severe network failures, every request must terminate.

Good availability mainly means that the system serves users well, without bad user experiences such as failed operations or access timeouts. Availability is usually closely tied to distributed data redundancy, load balancing, and similar techniques.

    • P: Partition tolerance

Partition tolerance is closely related to fault tolerance and scalability. In a distributed application, the system may stop functioning properly for reasons that stem from distribution itself. Good partition tolerance means that although the application is a distributed system, it appears to behave as one functioning whole. For example, if one or several machines in the system go down, the remaining machines can still meet the system's needs; or if a network anomaly splits the system into several parts, each part can still keep the system running. Such a system has good partition tolerance.

Systems that satisfy consistency and availability (CA) usually scale poorly, for example traditional relational databases.

Systems that satisfy consistency and partition tolerance (CP) usually respond less reliably to user requests, for example Redis, MongoDB, and HBase.

Systems that satisfy availability and partition tolerance (AP) usually have weaker consistency requirements, for example DNS.

5. Comparison of ACID and BASE

Transactions in traditional relational database systems have the ACID properties: atomicity, consistency, isolation (also known as independence), and durability (also translated as persistence).

    • Atomicity: Either all operations in a transaction complete or none of them do; the transaction cannot stop partway through. If an error occurs during execution, the transaction is rolled back to the state before it began, as if it had never been executed.

    • Consistency: The integrity constraints of the database hold both before the transaction begins and after it ends.

    • Isolation: Two transactions do not interfere with each other, and a transaction cannot see the intermediate data of other running transactions.

    • Durability: After a transaction completes, the changes it made are persisted in the database and are not rolled back.
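Atomicity and consistency can be observed with Python's built-in `sqlite3` module. The following is a minimal sketch, not taken from the original article: the `account` table, the `transfer` helper, and the balances are made up for illustration. The transfer credits the receiver first, so a failed debit would leave money created from nothing unless the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the integrity rule that every transaction must preserve.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))

print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

In the second call the credit to `bob` has already been applied when the debit violates the constraint, yet after the rollback the balances are the same as before the call.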

Because of the CAP theory, BASE emerged as an alternative to ACID in order to improve performance:

    • Basically available

    • Soft state: Also described as a flexible transaction. It can be understood as "connectionless", whereas "hard" state is "connection-oriented".

    • Eventual consistency: The data seen across the entire system eventually becomes consistent; how long this takes depends on time and system requirements.

Interestingly, an acid is sour and a base is alkaline, so the two names are chemical opposites. In essence, ACID emphasizes consistency (the C in CAP), while BASE emphasizes availability (the A in CAP).

[Image: 4.png, http://s3.51cto.com/wyfs02/M02/74/21/wKiom1YVKbeDABLGAAKcYiPMU40310.jpg]

6. Consistency model

    • Weak consistency (Weak): After a new value is written, a read is not guaranteed to return the latest value from every copy of the data. Examples: some cache systems, and data about other players in an online game that has nothing to do with you.

    • Eventual consistency (Eventually): Eventual consistency is a special case of weak consistency. After a new value is written, a read may not return it immediately, but it is guaranteed to be readable after a certain time window. Examples: DNS, e-mail, Amazon S3, and the Google search engine.

    • Strong consistency (Strong): Once new data is written, the new value can be read at any moment from any copy. Examples: file systems, RDBMS, and Azure Table.

From these three consistency models we can see that Weak and Eventually generally rely on asynchronous redundancy, while Strong generally relies on synchronous redundancy. Asynchronous usually means better performance but more complex state control. Synchronous means simplicity but lower performance.
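The difference between synchronous and asynchronous redundancy can be shown with a toy model. This sketch assumes a single primary and a single secondary; the `Cluster` class and its method names are hypothetical, and the explicit `replicate()` call stands in for whatever background process a real system would run.

```python
class Replica:
    """One copy of the data."""
    def __init__(self):
        self.data = {}

class Cluster:
    """Toy primary/secondary pair; `sync` selects the replication mode."""
    def __init__(self, sync):
        self.sync = sync
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary.data[key] = value
        if self.sync:
            self.secondary.data[key] = value   # copy before the write returns
        else:
            self.pending.append((key, value))  # return at once, copy later

    def replicate(self):
        """Apply queued writes; stands in for background log shipping."""
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()

    def read_secondary(self, key):
        return self.secondary.data.get(key)

strong = Cluster(sync=True)
strong.write("x", 1)
print(strong.read_secondary("x"))    # the new value is visible immediately

eventual = Cluster(sync=False)
eventual.write("x", 1)
print(eventual.read_secondary("x"))  # stale read: the secondary has not caught up
eventual.replicate()
print(eventual.read_secondary("x"))  # visible once replication has run
```

The asynchronous cluster returns from `write` sooner but has to track the pending queue, which is the extra state control the paragraph above mentions.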

7. Techniques for implementing data consistency

① Quorum system NWR policy:

NWR is a strategy for controlling the consistency level in distributed storage systems and is used in Amazon Dynamo. The NWR model hands the CAP trade-off to the user, who chooses which two of the three CAP properties to favor. N is the number of replicas, W is the minimum number of replicas that must be written for a write to count as successful, and R is the minimum number of replicas that must be read for a read to count as successful.

    • If W+R>N, strong consistency can be guaranteed. W+R>N implies R>N-W, which means the number of replicas read is greater than the number of replicas the write did not reach, so at least one replica that is read holds the new value.

    • If W+R<=N, only eventual consistency can be guaranteed.

    • For a highly writable environment, configure W=1 and R=N. A write succeeds as soon as any one node accepts it, but a read must fetch the data from all nodes.

    • For efficient reads, configure W=N and R=1. A read succeeds as soon as any one node answers, but a write succeeds only after all N nodes have been written.
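The W+R>N rule can be checked by brute force. The sketch below is an illustration, not Dynamo's implementation: it assumes each replica stores a (version, value) pair and that a reader keeps the value with the highest version, and the helper names are made up. It tries every possible set of W written replicas against every possible set of R read replicas and reports whether a stale read can occur.

```python
from itertools import combinations

def write(replicas, indexes, version, value):
    """Store (version, value) on the replicas at the given indexes."""
    for i in indexes:
        replicas[i] = (version, value)

def read(replicas, indexes):
    """Return the value with the highest version among the replicas read."""
    return max(replicas[i] for i in indexes)[1]

def always_reads_latest(n, w, r):
    """Check every choice of W written and R read replicas for a stale read."""
    for written in combinations(range(n), w):
        replicas = [(1, "old")] * n
        write(replicas, written, 2, "new")
        for chosen in combinations(range(n), r):
            if read(replicas, chosen) != "new":
                return False
    return True

for w, r in [(2, 2), (1, 3), (3, 1), (1, 1), (1, 2)]:
    print(f"N=3 W={w} R={r} always reads latest:", always_reads_latest(3, w, r))
```

Under these assumptions the result matches the rule exactly: the check passes whenever W+R>N and fails otherwise, because only then must every read set overlap every write set.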

② Two-phase commit:

Two-phase commit is also called 2PC. It is often used for distributed transactions and is a strong-consistency algorithm. In brief, it has two phases:

    • In the first phase, the master node (the coordinator) asks all nodes (the participants) whether they can commit the operation, and each participant answers yes or no.

    • In the second phase, the coordinator acts on the responses it received. If all participants answered yes, it sends a "commit" command to all of them. After the participants finish and reply "complete", the coordinator collects the responses from every node and ends the global transaction. If any participant answered no, it sends a rollback command to all participants. After the participants roll back and reply "rollback complete", the coordinator collects the responses from every node and cancels the global transaction.

Put plainly, 2PC is an algorithm that votes in the first phase and decides in the second; compared with a single-database transaction, it adds a prepare phase before the commit. It has problems, though. One is that it is a synchronous blocking operation, which inevitably hurts performance badly. Another major problem is timeouts. This led to 3PC, which mainly splits the commit process into two steps; more details are on Wikipedia.
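The two phases can be sketched as an in-process simulation. This is a simplified illustration with hypothetical class and function names; it has no networking, no persistence, and no timeout handling, which is exactly the part the text identifies as a major problem in practice.

```python
class Participant:
    """A node taking part in the global transaction."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes (True) or no (False)."""
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    """Coordinator logic: return True if the transaction committed, False if rolled back."""
    votes = [p.prepare() for p in participants]  # phase 1: collect every vote
    if all(votes):
        for p in participants:                   # phase 2: everyone said yes
            p.commit()
        return True
    for p in participants:                       # phase 2: at least one said no
        p.rollback()
    return False

ok = [Participant("db1"), Participant("db2")]
print(two_phase_commit(ok), [p.state for p in ok])

mixed = [Participant("db1"), Participant("db2", can_commit=False)]
print(two_phase_commit(mixed), [p.state for p in mixed])
```

The coordinator waits for every vote before deciding, which is the synchronous blocking behavior described above.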

③ Paxos algorithm

④ Timestamp policy

⑤ Vector clock


II. Introduction to MongoDB

1. Introduction

    • MongoDB is an open source, non-relational database based on distributed, document-oriented storage. Among non-relational databases, it is the most feature-rich and the most like a relational database.

    • It runs on Windows, UNIX, OS X, and Solaris, supports 32-bit and 64-bit applications, and provides drivers for multiple programming languages.

    • The data structures it supports are very loose: it uses BSON, a JSON-like format that stores data as key-value pairs and can hold complex data types.

    • Supported data types include null, Boolean, string, ObjectId, 32-bit integer, 64-bit integer, 64-bit floating-point number, date, regular expression, JavaScript code, binary data, array, embedded document, maximum value, minimum value, and undefined.

Note that a document here, as I understand it, does not mean a file such as doc.txt. A document is MongoDB's unit of storage (equivalent to a record in a relational database) and has the form {key1:value1,key2:value2}. An embedded document has the form {key1:value1,key2:{key2.1:value2.1,key2.2:value2.2}}.
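An embedded document maps naturally onto a nested dictionary. The sketch below uses plain Python dicts rather than a MongoDB driver; the field names and the `get_path` helper are made up for illustration, and the helper only imitates the idea of addressing a nested field with a dotted path.

```python
import json

# A document with one embedded document under "address" (illustrative data).
document = {
    "name": "alice",
    "address": {"city": "Hangzhou", "zip": "310000"},
}

def get_path(doc, path):
    """Follow a dotted path such as 'address.city' through embedded documents.

    Returns None if any part of the path is missing.
    """
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

print(json.dumps(document, indent=2))
print(get_path(document, "address.city"))
```

Because the embedded document travels with its parent, reading the address needs no join against a second table.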

MongoDB's most notable feature is its powerful query language. Its syntax somewhat resembles an object-oriented query language, it can do almost everything a single-table query in a relational database can do, and it also supports indexes on the data.
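To give a feel for queries written as documents, here is a small in-memory matcher in Python. It is not MongoDB's implementation and covers only a handful of cases: exact equality plus the comparison operators `$gt`, `$gte`, `$lt`, `$lte`, and `$in`, applied to top-level fields. The `matches` and `find` helpers and the sample data are assumptions made for this sketch.

```python
import operator

# Subset of comparison operators handled by this sketch.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}

def matches(doc, query):
    """Return True if doc satisfies every condition in the query document."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30}}; a missing field never matches here.
            for op, operand in condition.items():
                if value is None or not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # Plain form, e.g. {"name": "bob"}, means equality.
            return False
    return True

def find(collection, query):
    """Return the documents in a list-based collection that match the query."""
    return [doc for doc in collection if matches(doc, query)]

people = [
    {"name": "ann", "age": 25},
    {"name": "bob", "age": 35},
    {"name": "cid"},  # no "age" key: documents need not share a schema
]
print(find(people, {"age": {"$gt": 30}}))
print(find(people, {"name": {"$in": ["ann", "cid"]}}))
```

The query is itself a data structure, so a program can build it up piece by piece instead of concatenating query strings.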

2. Features of MongoDB

  • Collection-oriented storage. Data is grouped into collections, each of which can contain an unlimited number of documents. A collection can be thought of as an RDBMS table, except that it requires no schema definition.

  • Schema-free. A collection has no concept of rows and columns, each document can have different keys, and the values of a key do not need to share a data type.

  • Supports dynamic queries. MongoDB supports rich query expressions, and queries are written as JSON-style expressions.

  • Full index support. MongoDB's query optimizer analyzes the query expression and generates an efficient query plan.

  • Efficient data storage, including binary data and large objects (pictures, videos, etc.).

  • Supports replication and recovery.

  • Automatic sharding for cloud-level scalability: horizontal database clusters are supported, and additional servers can be added dynamically.

  • Supports spatial indexes.

  • Query performance profiling.

3. Scenarios where MongoDB applies

    • Website data: Mongo is well suited to real-time inserts, updates, and queries, and provides the replication and high scalability that real-time website data storage requires.

    • Caching: Because of its high performance, Mongo also works well as a caching layer in the information infrastructure. After a system restart, a persistent cache layer built on Mongo keeps the underlying data sources from being overloaded.

    • Large, low-value data: Storing such data in a traditional relational database can be expensive, so programmers have often fallen back on plain files for storage.

    • Highly scalable scenarios: Mongo is well suited to databases made up of dozens or hundreds of servers. Built-in support for a MapReduce engine is already on the Mongo roadmap.

    • Storage for objects and JSON data: Mongo's BSON data format is well suited to storing and querying data in document form.

4. Scenarios where MongoDB does not apply

    • Highly transactional systems: For example, banking or accounting applications that need a large number of complex atomic operations still call for a relational database.

    • Traditional business intelligence applications

    • Complex multi-table join queries

5. Comparison with related databases

[Image: 5.png, http://s3.51cto.com/wyfs02/M02/74/1E/wKioL1YVKenRCU68AAJBJ_zWq7Y940.jpg]

This article is from the "Bread" blog. Please keep this source attribution: http://cuchadanfan.blog.51cto.com/9940284/1700711
