1. A Brief History of NoSQL
The term NoSQL first appeared in 1998, as the name of a lightweight, open-source relational database developed by Carlo Strozzi that did not provide a SQL interface.
In 2009, Johan Oskarsson of Last.fm organized a discussion of distributed, open-source databases [2], and Eric Evans of Rackspace reintroduced the term NoSQL. At that point NoSQL referred mainly to non-relational, distributed database designs that do not provide ACID guarantees.
The No:sql (East) conference, held in Atlanta in 2009, was a milestone; its slogan was "Select Fun, Profit from Real_world where Relational=false;". As a result, the most common interpretation of NoSQL is "non-relational", emphasizing the strengths of key-value stores and document databases rather than simply opposing the RDBMS.
2. What is NoSQL?
NoSQL refers to non-relational databases. NoSQL, sometimes read as an abbreviation of "Not Only SQL", is a generic term for database management systems that differ from traditional relational databases.
NoSQL is used to store data at very large scale (for example, Google or Facebook collect trillions of bits of data about their users every day). These data stores do not require a fixed schema and can be scaled horizontally without extra work.
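Horizontal scaling means spreading data across more machines rather than buying a bigger one. One common way to do this (an assumption here, not something the text specifies) is hash partitioning: hash each key and use the result to pick a node. The minimal sketch below models each node as an in-memory dict; `shard_for` and `ShardedStore` are hypothetical names for illustration only.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index by hashing it (simple modulo partitioning)."""
    # sha256 is used instead of built-in hash(), which is randomized per process for str.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store whose data is split across several 'nodes'."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one storage node.
        self.shards: list[dict[str, object]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Keys end up spread across the four shards.
print([len(shard) for shard in store.shards])
```

Plain modulo partitioning moves most keys whenever the number of shards changes, which is why real systems often use consistent hashing or range partitioning instead.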
3. Why use NoSQL?
Today we can easily access and crawl data through third-party platforms (e.g., Google, Facebook). Users' personal information, social networks, geographic locations, user-generated content, and user activity logs have multiplied. If we want to mine this user data, SQL databases are not well suited to such applications, whereas NoSQL databases were developed to handle this kind of big data well.
4. RDBMS vs NoSQL
RDBMS
-Highly organized structured data
-Structured Query Language (SQL)
-Data and relationships are stored in separate tables
-Data manipulation language (DML) and data definition language (DDL)
-Strict consistency
-Transaction support
NoSQL
-Stands for "Not Only SQL"
-No declarative query language
-No predefined schema
-Key-value stores, column stores, document stores, graph databases
-Eventual consistency rather than ACID properties
-Unstructured and unpredictable data
-CAP theorem
-High performance, high availability, and scalability
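The "predefined schema" versus "no predefined schema" contrast can be sketched with Python's built-in `sqlite3` module on the relational side and a plain list of dicts standing in for a document collection. The list of dicts is only an assumption for illustration, not a real NoSQL client, and `try_insert_with_city` is a hypothetical helper.

```python
import sqlite3

# Relational side: the table's columns are declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))


def try_insert_with_city(name: str, city: str) -> bool:
    """Return True if the insert succeeded, False if the schema rejected it."""
    try:
        conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", (name, city))
        return True
    except sqlite3.OperationalError:
        # 'city' was never declared, so SQLite refuses the statement.
        return False


# Document side: a schema-less collection modelled as a list of dicts.
# Each document may carry different fields, including nested ones.
documents = [
    {"name": "alice"},
    {"name": "bob", "city": "Paris"},
    {"name": "carol", "tags": ["admin"], "address": {"zip": "10001"}},
]
in_paris = [doc["name"] for doc in documents if doc.get("city") == "Paris"]

print(try_insert_with_city("bob", "Paris"))  # False
print(in_paris)  # ['bob']
```

Adding a field to the relational table requires an explicit `ALTER TABLE`, while the document collection simply accepts the new shape; the flip side is that nothing stops documents from drifting into inconsistent shapes.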
5. Relational Databases Follow the ACID Rules
A database transaction is similar to a real-world transaction, and it has the following four properties:
(1) A: Atomicity
Atomicity is easy to understand: either all operations in a transaction are completed or none of them are. A transaction succeeds only if every operation in it succeeds; if any single operation fails, the entire transaction fails and must be rolled back.
For example, a bank transfer of 100 yuan from account A to account B consists of two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. The two steps must either both complete or both not happen. If only the first step completed and the second failed, 100 yuan would simply vanish.
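The transfer example can be sketched with Python's built-in `sqlite3` module. Using the connection as a context manager (`with conn:`) commits if the block finishes and rolls back if it raises. The `accounts` table, the `transfer` helper, and the `fail_midway` flag that simulates a crash between the two steps are all hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             fail_midway: bool = False) -> None:
    # `with conn:` commits on success and rolls back if the block raises.
    with conn:
        # Step 1: withdraw from the source account.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            # Simulated failure between step 1 and step 2.
            raise RuntimeError("simulated crash before the deposit")
        # Step 2: deposit into the destination account.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


bank = sqlite3.connect(":memory:")
bank.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
bank.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 0)])
bank.commit()

try:
    transfer(bank, "A", "B", 100, fail_midway=True)
except RuntimeError:
    pass
print(balances(bank))  # {'A': 500, 'B': 0}: the withdrawal was rolled back

transfer(bank, "A", "B", 100)
print(balances(bank))  # {'A': 400, 'B': 100}
```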
(2) C: Consistency
Consistency is also relatively easy to understand: the database must always be in a consistent state, and executing a transaction must not violate the database's integrity constraints.
For example, given an integrity constraint a + b = 10, if a transaction changes a, it must also change b so that a + b = 10 still holds; otherwise the transaction fails.
(3) I: Isolation
Isolation means that concurrent transactions do not affect each other. If a transaction accesses data that another transaction is modifying, then as long as the other transaction has not committed, the data it reads is not affected by the uncommitted changes.
For example, while a transaction transferring 100 yuan from account A to account B is still in progress, if B checks their balance at that moment, they will not see the additional 100 yuan.
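The balance-check example can be reproduced with two `sqlite3` connections to the same database file: one performs the transfer without committing, the other plays account B querying its balance. The sketch assumes SQLite's WAL journal mode (so the reader is not blocked by the writer) and a temporary file; `demo_isolation` and the `accounts` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def demo_isolation() -> tuple[int, int]:
    """Return B's balance as seen by a second connection before and after the commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            # WAL mode lets readers keep reading the last committed state during a write.
            writer.execute("PRAGMA journal_mode=WAL")
            writer.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
            writer.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 0)])
            writer.commit()

            # These UPDATEs implicitly open a transaction on `writer` that stays uncommitted.
            writer.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'A'")
            writer.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'B'")

            # B queries through a separate connection: the uncommitted +100 is not visible.
            query = "SELECT balance FROM accounts WHERE name = 'B'"
            before = reader.execute(query).fetchall()[0][0]

            writer.commit()
            after = reader.execute(query).fetchall()[0][0]
            return before, after
        finally:
            writer.close()
            reader.close()


print(demo_isolation())  # (0, 100)
```

Databases offer several isolation levels; this sketch only shows that uncommitted changes are invisible to other connections, not the stronger guarantees some levels provide.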
(4) D: Durability
Durability means that once a transaction commits, its modifications are permanently saved in the database, even if an outage occurs afterward.
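A rough way to see durability from application code is to commit one change, leave a second change uncommitted, close the connection, and reopen the file. Closing without committing only stands in for a crash here; it does not simulate real power loss, which SQLite handles through its journal and disk syncs. `durability_demo` and the `log` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def durability_demo() -> tuple[str, ...]:
    """Return the rows that survive after the connection is closed and reopened."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE log (entry TEXT)")
        conn.execute("INSERT INTO log VALUES ('committed')")
        conn.commit()
        conn.execute("INSERT INTO log VALUES ('uncommitted')")
        # Stand-in for a crash: close() does not commit the pending transaction.
        conn.close()

        reopened = sqlite3.connect(path)
        try:
            return tuple(row[0] for row in reopened.execute("SELECT entry FROM log"))
        finally:
            reopened.close()


print(durability_demo())  # ('committed',)
```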
6. CAP Theorem
In computer science, the CAP theorem, also known as Brewer's theorem, states that a distributed computing system cannot simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (every request receives a response, whether it succeeded or failed)
- Partition tolerance (the system keeps operating even if messages are lost or part of the system fails)
The core of the CAP theorem is that a distributed system cannot satisfy consistency, availability, and partition tolerance all at the same time; it can satisfy at most two of them.
Accordingly, NoSQL databases are divided by the CAP principle into three categories: those that satisfy CA, those that satisfy CP, and those that satisfy AP.
- CA: a single-node cluster; a system that satisfies consistency and availability, and is usually less scalable.
- CP: a system that satisfies consistency and partition tolerance; its performance is usually not especially high.
- AP: a system that satisfies availability and partition tolerance; its consistency requirements are usually weaker.
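The difference between the CP and AP categories shows up only when the network splits. The toy model below is a hypothetical two-replica cluster (not any real database): in CP mode it refuses requests during a partition rather than let replicas disagree, and in AP mode it keeps answering but returns stale data. With only two replicas neither side holds a majority, so this simplified CP mode rejects everything during the partition. CA is not modelled because it assumes partitions never happen.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the CP cluster refuses a request it cannot serve consistently."""


@dataclass
class ToyCluster:
    """Hypothetical two-replica cluster used only to illustrate the CP/AP trade-off."""
    mode: str  # "CP" or "AP"
    replicas: list = field(default_factory=lambda: [{}, {}])
    partitioned: bool = False  # True while the two replicas cannot talk to each other

    def write(self, node: int, key: str, value) -> None:
        if not self.partitioned:
            # Healthy network: replicate the write to every node.
            for replica in self.replicas:
                replica[key] = value
        elif self.mode == "CP":
            # Give up availability: refuse rather than let the replicas diverge.
            raise Unavailable("write rejected during partition")
        else:
            # AP: accept locally; the replicas disagree until the partition heals.
            self.replicas[node][key] = value

    def read(self, node: int, key: str):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("read rejected during partition")
        return self.replicas[node].get(key)


ap = ToyCluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
print(ap.read(0, "x"), ap.read(1, "x"))  # 2 1: available but inconsistent

cp = ToyCluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
except Unavailable as exc:
    print("CP:", exc)  # consistent but unavailable
```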
7. NoSQL Database Classification
Type | Representative products | Characteristics
--- | --- | ---
Column store | HBase, Cassandra, Hypertable | Stores data by column. Its main strengths are convenient storage of structured and semi-structured data and easy data compression; queries on one or a few columns have a large I/O advantage.
Document store | MongoDB, CouchDB | Documents are typically stored in a JSON-like format, and the stored content is document-based. This also makes it possible to index certain fields and implement some features of relational databases.
Key-value store | Tokyo Cabinet/Tyrant, Redis, Berkeley DB, MemcacheDB | A value can be looked up quickly by its key. In general, the store accepts values of any format as-is. (Redis also provides additional features.)
Graph store | Neo4j, FlockDB | Best suited for storing graph relationships. Handling them with a traditional relational database performs poorly and is inconvenient to design and use.
Object store | db4o, Versant | The database is manipulated with object-oriented syntax, and data is accessed as objects.
XML database | Berkeley DB XML, BaseX | Stores XML data efficiently and supports XML query languages such as XQuery and XPath.
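The column-store row of the table claims two advantages: single-column queries read less data, and columns compress well. Both can be sketched in plain Python by pivoting row-oriented records into one list per column and run-length encoding a column. This is a hypothetical illustration of the layout idea, not how HBase or Cassandra store data internally; the sample rows are made up.

```python
from itertools import groupby


def to_columns(rows: list[tuple], names: list[str]) -> dict[str, list]:
    """Pivot row-oriented records into a column-oriented layout (one list per column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values: list) -> list[tuple]:
    """Compress a column as (value, run_length) pairs; effective when neighbours repeat."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs: list[tuple]) -> list:
    return [value for value, count in pairs for _ in range(count)]


rows = [
    ("2024-01-01", "beijing", 30),
    ("2024-01-01", "beijing", 12),
    ("2024-01-01", "shanghai", 7),
    ("2024-01-02", "shanghai", 25),
]
columns = to_columns(rows, ["day", "city", "amount"])

# A query on one column only needs to read that column's values.
total_amount = sum(columns["amount"])
print(total_amount)  # 74

# Values within a single column are similar, so they compress well.
print(run_length_encode(columns["day"]))  # [('2024-01-01', 3), ('2024-01-02', 1)]
```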
Source document:
MongoDB Getting Started (1): Understanding NoSQL