1. Summary
NoSQL databases emerged as a challenge to traditional SQL databases. As the data volumes of enterprises and Internet applications grew, traditional SQL databases struggled to support distributed storage and high-speed reads and writes at that scale, and NoSQL came into being. NoSQL improves database performance by using simple and efficient data storage models such as key-value.
2. Theory
CAP, BASE, and eventual consistency are the three theoretical cornerstones of NoSQL databases. Each is described below.
2.1 CAP Theory
C: Consistency (changes written by one user are seen consistently by all users reading the data)
A: Availability (data can be obtained quickly)
P: Partition tolerance (the distributed system keeps working despite network partitions)
The CAP theory was proposed by Professor Eric Brewer. Its core claim is that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; it can satisfy at most two of the three.
See: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
2.2 BASE Theory
Basically Available: basic availability (partition failures are tolerated)
Soft state: soft state / flexible transactions (stateless connections, asynchronous operation supported)
Eventual consistency: strong consistency is not required, only that the data becomes consistent in the end
The core of the BASE theory is to sacrifice strong consistency in exchange for availability or reliability.
See: http://www.jdon.com/jivejdon/thread/37625
2.3 Eventual Consistency Theory
(1) Strong Consistency
Strong consistency (immediate consistency): if A first writes a value to the storage system, the storage system guarantees that all subsequent reads by A, B, and C return the latest value.
(2) Weak Consistency
If A writes a value to the storage system, the storage system cannot guarantee that subsequent reads by A, B, and C return the latest value. This introduces the concept of the "inconsistency window": the period from when A writes the value until A, B, and C can all read the latest value.
(3) Eventual Consistency
Eventual consistency is a special case of weak consistency. If A writes a value to the storage system, the storage system guarantees that, provided no further updates are made to that value, all reads by A, B, and C will eventually return the latest written value. If no failure occurs, the size of the "inconsistency window" depends on the following factors: communication latency, system load, and the number of replicas used by the replication scheme (this can be understood as the number of slaves in master/slave mode). The best-known eventually consistent system is DNS: after the IP address of a domain name is updated, all clients eventually see the latest value, with the delay determined by the configuration policy and the cache control policy.
See: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
3. Technology
3.1 Distributed Storage
(1) Master/Slave
Advantage: mature and stable
Disadvantages: single point of failure for write operations, and replication lag on the slaves
(2) Multi-Master
Advantage: multiple masters remove the single point of failure (SPOF)
Disadvantage: consistency is hard to achieve
(3) Two-phase commit
Advantage: a simple consistency algorithm
Disadvantage: no fault tolerance
(4) Three-phase commit
Advantage: agreement can still be reached after a single point of failure occurs.
See: http://sebug.net/paper/databases/nosql/Nosql.html#_08464202471077442_91161458194
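The two-phase commit listed under (3) can be sketched roughly as follows. This is only a minimal in-memory illustration; the Participant class, its will_vote_yes flag, and the two_phase_commit function are hypothetical names introduced here, and timeouts, logging, and coordinator failure (the reason the protocol has no fault tolerance) are deliberately left out.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys in phase 2."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self):
        # Phase 1: the participant promises to commit, or refuses.
        self.state = "prepared" if self.will_vote_yes else "refused"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2 (completion): commit on a unanimous yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


if __name__ == "__main__":
    nodes = [Participant("a"), Participant("b"), Participant("c", will_vote_yes=False)]
    print(two_phase_commit(nodes), [p.state for p in nodes])
```

A single "no" vote aborts the transaction on every participant, which is what keeps the algorithm simple.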
3.2 Consistent Hashing
Consistent hashing is a clever hashing scheme that is effective for load balancing in distributed systems: when a node joins or leaves, only a small share of the keys has to move.
See: http://www.cnblogs.com/leoo2sk/archive/2011/08/11/consistent-hashing-intro.html
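As a rough illustration of the idea, the sketch below places nodes on a hash ring and maps each key to the first node point at or after the key's hash. The HashRing class, the vnodes parameter (virtual nodes per physical node), and the choice of MD5 are assumptions made for this example, not something prescribed by the article linked above.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used here only to spread keys evenly, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node is placed on the ring at `vnodes` positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"key{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.remove("node-c")
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after removing node-c")
```

Only the keys that lived on the removed node are reassigned; keys owned by the remaining nodes stay where they were.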
3.3 Quorum NRW
N: number of replica nodes
R: minimum number of nodes that must respond for a read to succeed
W: minimum number of nodes that must respond for a write to succeed
W + R > N: strong consistency
W + R <= N: eventual consistency
See: http://sebug.net/paper/databases/nosql/Nosql.html#NRW_012323816604251636_2127662_10272764961707637
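The reasoning behind the W + R > N rule is that every read set must then share at least one node with every write set, so a read always touches a replica holding the latest write. The sketch below checks this by brute force for small N; the function names quorum_consistency and always_overlap are made up for this example.

```python
from itertools import combinations


def quorum_consistency(n, r, w):
    """Classify an (N, R, W) setting using the W + R > N rule."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return "strong" if w + r > n else "eventual"


def always_overlap(n, r, w):
    """Brute-force check: does every write set of size W share a node
    with every read set of size R among N replicas?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


if __name__ == "__main__":
    for n, r, w in [(3, 2, 2), (3, 1, 2), (3, 1, 3), (5, 2, 3)]:
        print(n, r, w, quorum_consistency(n, r, w), always_overlap(n, r, w))
```

For N = 3, the setting R = 2, W = 2 always overlaps, while R = 1, W = 2 can read from the one replica the write did not reach.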
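3.4 Vector Clock
If W = 1 and R = N, a write succeeds as soon as a single node accepts it, so replicas can diverge and a complicated merge problem may arise at read time. Vector clocks can be used to handle this case. If the system does not require great flexibility, setting W = N simplifies the design.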
See: http://en.wikipedia.org/wiki/Vector_clock
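A vector clock can be sketched as a mapping from node name to a counter. Comparing two clocks shows whether one version descends from the other or whether the two are concurrent and need merging. The helper names increment, merge, and compare are chosen for this example only.

```python
def increment(clock, node):
    """Return a copy of `clock` with the counter for `node` advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def merge(a, b):
    """Element-wise maximum of two clocks (used after reconciling versions)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for clock a vs. b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither descends from the other: a real conflict


if __name__ == "__main__":
    v1 = increment({}, "x")   # written via node x
    v2 = increment(v1, "y")   # updated via node y
    v3 = increment(v1, "z")   # updated independently via node z
    print(compare(v1, v2), compare(v2, v3), merge(v2, v3))
```

Two versions that compare as "concurrent" are the ones that require the merge step mentioned above.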
3.5 Gossip
Gossip spreads information the way a virus spreads (epidemic propagation). Each node maintains a vector clock and a state version tree. Cassandra currently uses this approach.
See: http://sebug.net/paper/databases/nosql/Nosql.html#gossip_34187653195112944_16061_08507828080528557
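The epidemic style of propagation can be illustrated with a small simulation. As a simplifying assumption, each node here holds a single integer version instead of a vector clock and version tree, and nodes exchange state with one random peer per round. The functions gossip_round and rounds_until_converged are hypothetical and do not reflect Cassandra's actual implementation.

```python
import random


def gossip_round(states, rng):
    """Each node contacts one random peer; both keep the newer version."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        newest = max(states[node], states[peer])
        states[node] = states[peer] = newest  # push-pull exchange


def rounds_until_converged(num_nodes, rng, max_rounds=1000):
    """Count the gossip rounds until every node holds the same version."""
    states = {i: 0 for i in range(num_nodes)}
    states[0] = 1  # node 0 receives an update that must spread to the rest
    rounds = 0
    while len(set(states.values())) > 1:
        if rounds >= max_rounds:
            raise RuntimeError("gossip did not converge")
        gossip_round(states, rng)
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print("rounds needed for 50 nodes:", rounds_until_converged(50, rng))
```

Because the number of informed nodes grows quickly each round, the update typically reaches all nodes in far fewer rounds than there are nodes.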
4. Mainstream NoSQL Products
(1) Bigtable (Google)
(2) Dynamo (Amazon)
(3) HBase (Apache)
(4) Cassandra (Facebook)
(5) CouchDB (Apache)
(6) MongoDB
(7) Redis
(8) Riak