"IT168" I have always felt that nosql in fact very easy to understand, I have also had a very in-depth study of NoSQL, but in the recent preparation of yuntable chart, found that NoSQL not only very profound, And my personal understanding of NoSQL is only superficial, but I am still a "shame and then Yong" people, so after a period of study, from the sixth part of this series, will talk to you nosql, and this article will be the main to do a NoSQL database review.
First of all, I will talk to you about why NoSQL in a relational database has become very popular in the case of the emergence of?
Why NoSQL emerged
As the Internet keeps developing, new kinds of applications keep appearing, and in this era of cloud computing they place greater demands on technology, mainly in the following four areas:
1. Low-latency reads and writes: an application that responds quickly greatly improves user satisfaction;
2. Support for massive data and traffic: large-scale applications such as search need to work with petabyte-scale data and handle traffic in the millions;
3. Management of large-scale clusters: system administrators want distributed applications to be simpler to deploy and manage;
4. Low operating costs: IT managers want to significantly reduce hardware, software and labor costs;
Although relational databases hold an unshakable position in the industry's data storage, their inherent limitations make it difficult for them to meet these requirements:
1. Difficult to scale out: mechanisms such as multi-table join queries make it hard to spread a relational database across machines;
2. Slow reads and writes: this mainly happens once the data reaches a certain scale. Because the internal logic of a relational database is very complex, it is prone to concurrency problems such as deadlocks, which cause read and write speed to drop sharply;
3. High cost: enterprise database licenses are alarmingly expensive, and the price keeps rising as the system grows;
4. Limited capacity: existing relational solutions cannot support data storage at the scale of a company like Google;
To address the requirements above, the industry has introduced a variety of new databases. Because their design differs greatly from that of traditional relational (SQL) databases, they are collectively known as "NoSQL" databases. On the whole, their design focuses on highly concurrent reads and writes and on storing massive amounts of data. Compared with relational databases, they do "subtraction" on architecture and data model, and "addition" on scalability and concurrency. The current mainstream NoSQL databases include BigTable, HBase, Cassandra, SimpleDB, CouchDB, MongoDB and Redis. Next, let us look at the pros and cons of NoSQL databases.
The advantages are mainly the following three:
1. Easy to scale out: a typical example is Cassandra, whose architecture resembles a classic peer-to-peer system, so it can be expanded simply by adding new nodes;
2. Fast reads and writes: the main example is Redis. Because its logic is simple and it operates purely in memory, its performance is excellent, and a single node can handle more than 100,000 read and write operations per second;
3. Low cost: this is a feature shared by most distributed databases. Because they are mostly open source software, there are no expensive license fees;
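To make the first advantage concrete, one common way for a peer-to-peer store to stay easy to expand is consistent hashing. The sketch below is only an illustration under that assumption and is not Cassandra's actual implementation; the class name HashRing, the node names and the vnodes parameter are all hypothetical. It shows that adding a node relocates only a portion of the keys, and that every relocated key lands on the new node.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (MD5 is used only to spread keys evenly)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")                      # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved after adding one node")
```

With a naive "hash(key) mod number_of_nodes" placement, by contrast, most keys would change owner whenever the node count changes.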
NoSQL databases still have many shortcomings, however. The most common are these:
1. No SQL support: without support for an industry standard such as SQL, users face learning costs and application migration costs;
2. Limited feature set: existing products offer relatively little functionality. Most NoSQL databases do not support transactions, and they do not provide extras such as BI and reporting the way MS SQL Server and Oracle do;
3. Immature products: most products are still at an early stage, in sharp contrast to relational databases, which have been refined over decades;
The advantages and disadvantages above are the common ones. In practice, each product has its own strengths and weaknesses depending on the data model it adopts and the CAP trade-off it makes. Next, we introduce the two most important concepts in NoSQL, the data model and the CAP theory, and at the end of this article we classify the mainstream NoSQL databases.
Data model
Traditional databases mainly use the relational data model, which is characterized by join operations and support for ACID transactions. In the NoSQL field, there are three main types of data models:
Column-oriented
A column-oriented database also uses the table as its model, but it does not support multi-table operations such as joins. Its main feature is that data is stored by column rather than by row, as it is in a traditional relational database. In other words, values that belong to the same column are stored together, as far as possible in the same disk page, instead of values that belong to the same row. The benefit shows up in workloads such as data warehousing, where each query processes a large amount of data but touches only a few columns, so a column-oriented database saves a great deal of I/O. Most column-oriented databases also support column families, which group several columns together so that similar columns are stored side by side, improving the storage and query efficiency of those columns. Overall, this data model is well suited to applications such as aggregation and data warehousing.
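The I/O saving described above can be illustrated with a small in-memory sketch. It is a simplified model rather than how any particular column store lays out disk pages; the sample records, field names and the "values read" counters are assumptions made for illustration.

```python
# Three hypothetical sales records with four columns each.
rows = [
    {"order_id": 1, "customer": "alice", "region": "east", "amount": 120.0},
    {"order_id": 2, "customer": "bob",   "region": "west", "amount": 80.0},
    {"order_id": 3, "customer": "carol", "region": "east", "amount": 200.0},
]

# Row layout: all values of one record sit next to each other.
row_store = [tuple(r.values()) for r in rows]

# Column layout: all values of one column sit next to each other.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row_store(store):
    """Sum 'amount' from the row layout; every field of every row is read."""
    values_read = 0
    total = 0.0
    for record in store:
        values_read += len(record)
        total += record[3]          # 'amount' is the fourth field
    return total, values_read


def total_amount_column_store(store):
    """Sum 'amount' from the column layout; only one column is read."""
    column = store["amount"]
    return sum(column), len(column)


row_total, row_reads = total_amount_row_store(row_store)
col_total, col_reads = total_amount_column_store(column_store)
print(row_total, row_reads)   # same total, 12 values touched
print(col_total, col_reads)   # same total, 3 values touched
```

Both layouts return the same sum, but the aggregate over the column layout touches only the one column it needs.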
Key-value
Compared with the traditional relational model, the key-value model is simple. It resembles the familiar hash table, in which one key corresponds to one value. Even so, it provides very fast queries, large storage capacity and highly concurrent operation, and it is well suited to querying and modifying data by primary key. It does not support complex operations, but this shortcoming can be made up for in the application layer built on top of it.
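As a minimal sketch of the key-value model, the example below wraps a Python dict in a hypothetical KeyValueStore class (not the API of any real product). Lookups and updates by primary key are direct, while a query by value, here the illustrative helper find_by_city, has to be written in the upper layer as a scan.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store: one key maps to one opaque value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def keys(self):
        return list(self._data)


def find_by_city(store, city):
    """Upper-layer query: the store cannot filter by value, so the application scans every key."""
    return sorted(k for k in store.keys() if store.get(k)["city"] == city)


store = KeyValueStore()
store.put("user:1", {"name": "alice", "city": "Beijing"})
store.put("user:2", {"name": "bob", "city": "Shanghai"})
store.put("user:3", {"name": "carol", "city": "Beijing"})

store.put("user:2", {"name": "bob", "city": "Beijing"})   # modify by primary key
beijing_users = find_by_city(store, "Beijing")
print(beijing_users)
```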
Document
Structurally, the document model is very similar to key-value: one key corresponds to one value. The value, however, is stored as a document in a format such as JSON or XML and therefore carries semantics. A document database can generally create secondary indexes on fields of the value to make things easier for the applications above it, which an ordinary key-value database does not support.
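The difference from a plain key-value store can be sketched as follows. The DocumentStore class and its methods are hypothetical and kept in memory for illustration; real document databases maintain their indexes quite differently. The point is only that, because the value is a structured document, the store itself can index a field and answer queries on it without a full scan.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Hypothetical document store: key -> JSON document, plus optional secondary indexes."""

    def __init__(self):
        self._docs = {}       # key -> JSON text
        self._indexes = {}    # field -> {field value -> set of keys}

    def create_index(self, field):
        index = defaultdict(set)
        for key, text in self._docs.items():
            doc = json.loads(text)
            if field in doc:
                index[doc[field]].add(key)
        self._indexes[field] = index

    def put(self, key, doc):
        self.delete(key)                      # drop stale index entries first
        self._docs[key] = json.dumps(doc)
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(key)

    def delete(self, key):
        text = self._docs.pop(key, None)
        if text is None:
            return
        old = json.loads(text)
        for field, index in self._indexes.items():
            if field in old:
                index[old[field]].discard(key)

    def get(self, key):
        text = self._docs.get(key)
        return None if text is None else json.loads(text)

    def find(self, field, value):
        """Look up keys through the secondary index instead of scanning all documents."""
        return sorted(self._indexes[field].get(value, set()))


db = DocumentStore()
db.create_index("author")
db.put("post:1", {"title": "CAP notes", "author": "wu"})
db.put("post:2", {"title": "Column stores", "author": "li"})
db.put("post:3", {"title": "Key-value stores", "author": "wu"})
db.put("post:2", {"title": "Column stores", "author": "wu"})   # index follows the update
posts_by_wu = db.find("author", "wu")
posts_by_li = db.find("author", "li")
```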
CAP theory
The CAP theory was proposed by Eric Brewer, a well-known American computer scientist and co-founder of the Internet company Inktomi, at PODC (Symposium on Principles of Distributed Computing) in 2000. Seth Gilbert and Nancy Lynch later proved it. Although many people have raised objections to the CAP theory over the past ten years, it remains a very valuable reference in the NoSQL world. It states that a distributed system cannot satisfy consistency, availability and partition tolerance all at once; at most two of the three can be met at the same time.
1. Consistency: any read operation always sees the result of the previously completed write operation, that is, the data is consistent across the distributed environment;
2. Availability: every operation always returns within a certain time, that is, the system is available at all times;
3. Partition tolerance: when a network partition occurs (for example, a broken network link), the separated parts of the system can still function properly.
Since only two of consistency, availability and partition tolerance can be chosen, most NoSQL systems choose according to their own design philosophy. Because many NoSQL databases are known for horizontal scaling, they tend to insist on partition tolerance in the CAP choice and give up either consistency or availability. Their main approach is to cut back on relational and transactional features.
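The trade-off can be illustrated with a toy simulation. The sketch below is a deliberately simplified model, not a real replication protocol; the classes Replica and TwoNodeCluster, the "CP" and "AP" mode strings and the run helper are all hypothetical. Once the link between two replicas breaks, the CP variant refuses the write to stay consistent, while the AP variant accepts it and lets the replicas diverge.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeCluster:
    """Hypothetical two-replica cluster used only to illustrate the CP/AP choice."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            replica.data[key] = value
            other.data[key] = value       # synchronous replication
            return True
        if self.mode == "CP":
            return False                  # refuse: replicas cannot be kept consistent
        replica.data[key] = value         # AP: accept locally, replicas diverge
        return True

    def read(self, replica, key):
        return replica.data.get(key)


def run(mode):
    cluster = TwoNodeCluster(mode)
    cluster.write(cluster.a, "x", 1)
    cluster.partitioned = True            # the network link between a and b breaks
    accepted = cluster.write(cluster.a, "x", 2)
    return accepted, cluster.read(cluster.a, "x"), cluster.read(cluster.b, "x")


cp_result = run("CP")   # (write accepted?, value on a, value on b)
ap_result = run("AP")
print(cp_result, ap_result)
```

In the CP run both replicas still agree but the write is rejected; in the AP run the write succeeds but a read from the other replica returns the old value.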
The classification below comes from the Visual Guide to NoSQL Systems. I personally find parts of it somewhat far-fetched; for example, Dynamo, which can be configured for different CAP trade-offs, and its derivative Cassandra are both classified as AP. On the whole, however, the classification is sound and very valuable at this stage. The data model of each database is given alongside its name.
▲ Figure 1. NoSQL product classification (reference 1)
Focus on consistency and availability (CA)
These databases have weak partition tolerance and rely mainly on replication to keep data safe. Common CA systems are:
1. Traditional relational databases, such as Postgres and MySQL (relational);
2. Vertica (column-oriented);
3. Aster Data (relational);
4. Greenplum (relational);
Focus on consistency and partition tolerance (CP)
These systems distribute data across multiple networked nodes and guarantee consistency, but their support for availability is weaker. For example, when the cluster runs into a problem, a node may refuse to serve requests because it cannot guarantee that its data is consistent. The main CP systems are:
1. BigTable (column-oriented);
2. Hypertable (column-oriented);
3. HBase (column-oriented);
4. MongoDB (Document);
5. Terrastore (Document);
6. Redis (Key-value);
7. Scalaris (Key-value);
8. MemcacheDB (Key-value);
9. Berkeley DB (Key-value);
Focus on availability and partition tolerance (AP)
These systems guarantee availability and partition tolerance mainly by settling for "eventual consistency". The AP systems are:
1. Dynamo (Key-value);
2. Voldemort (Key-value);
3. Tokyo Cabinet (Key-value);
4. KAI (Key-value);
5. Cassandra (column-oriented);
6. CouchDB (document-oriented);
7. SimpleDB (document-oriented);
8. Riak (document-oriented);
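To give a feel for the "eventual consistency" these AP systems rely on, the sketch below reconciles two replicas that accepted different writes while partitioned. It assumes a simple last-write-wins rule based on timestamps, which is only one possible conflict-resolution strategy and is not claimed to be what any system in the list uses; the function name and the sample data are hypothetical.

```python
def merge_last_write_wins(local, remote):
    """Return a copy of `local` updated with every entry of `remote` that is newer.

    Each replica maps key -> (timestamp, value). Last-write-wins is assumed
    here purely for simplicity.
    """
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two replicas accepted different writes while they could not reach each other.
replica_a = {"cart:42": (1, ["book"]), "name:42": (5, "Alice")}
replica_b = {"cart:42": (3, ["book", "pen"]), "name:42": (2, "alice")}

# After the partition heals, the replicas exchange state in both directions.
replica_a_after = merge_last_write_wins(replica_a, replica_b)
replica_b_after = merge_last_write_wins(replica_b, replica_a)

print(replica_a_after == replica_b_after)   # both replicas converge to the same state
```

Reads during the partition may return stale values, but once the replicas have exchanged state they agree again.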
The next installment of "The Secrets Behind Cloud Computing" will introduce a NoSQL database of my own design, called yuntable.
References
1. Visual Guide to NoSQL Systems
2. NoSQL Database
3. NoSQL Database Discussions, Part 1: Why Use a Non-relational Database?
Wu Zhuhua previously took part in the development of several cloud computing products at the IBM China Academy. He now focuses on the research and development of yuntable (http://code.google.com/p/yuntable/) and yunengine (http://yunengine.com/), and will soon publish the book "Analysis of Cloud Computing". Please look forward to it.