NoSQL Data Model and CAP Principle

Source: Internet
Author: User
Tags cassandra hypertable couchdb

I had always thought that NoSQL was easy to understand, and I believed I had studied it thoroughly. But while recently preparing the YunTable chart, I found that NoSQL is not only very deep, but that my own understanding of it was only superficial. I am, however, the kind of person who takes courage from recognizing one's own shortcomings, so after a period of study, starting with this sixth article in the series, I will discuss NoSQL. This article is mainly an overview of NoSQL databases.

First, let's talk about why NoSQL has sprung up when relational databases are already very popular.

Why NoSQL emerged

With the continuing development of the Internet, all kinds of applications keep emerging, and in this era of cloud computing they place more demands on the underlying technology, mainly in the following four areas:

1. Low-latency reads and writes: applications that respond quickly greatly improve user satisfaction;
2. Support for massive data volumes and traffic: large-scale applications such as search need to work with petabytes of data and cope with traffic on the scale of millions;
3. Large-scale cluster management: system administrators want distributed applications to be easier to deploy and manage;
4. Lower operating costs: IT managers want significant reductions in hardware, software, and labor costs.

Although relational databases hold an unshakable position in the industry's data storage, their inherent limitations make it hard for them to meet these requirements:

1. Difficult to scale: multi-table query mechanisms such as join make the database very hard to scale out;
2. Slow reads and writes: this mainly occurs once the data volume reaches a certain scale. Because the internal logic of a relational database is complex, it is prone to concurrency problems such as deadlocks, so read and write speed degrades severely;
3. High cost: enterprise database licenses are expensive, and the price keeps rising with the size of the system;
4. Limited capacity: existing relational solutions cannot support data storage at the scale of Google.

To address the requirements above, the industry has introduced a number of new types of databases. Because their designs differ greatly from those of traditional relational databases, they are collectively called "NoSQL" databases. Overall, their designs focus on highly concurrent reads and writes and on storing large volumes of data. Compared with relational databases, they "subtract" on the architecture and data model side and "add" on the scalability and concurrency side. Mainstream NoSQL databases today include BigTable, HBase, Cassandra, SimpleDB, CouchDB, MongoDB, and Redis. Next, let's look at the pros and cons of NoSQL databases.

Pros and cons

In terms of advantages, there are three main points:

1. Easy to scale: a typical example is Cassandra. Because its architecture resembles classic peer-to-peer (P2P), new nodes can easily be added to expand the cluster;
2. Fast reads and writes: the main example is Redis. Thanks to its simple logic and purely in-memory operation, its performance is excellent, and a single node can handle more than 100,000 reads and writes per second;
3. Low cost: this is common to most distributed databases. Because they are mostly open source software, there are no expensive license fees.

NoSQL databases also have a number of shortcomings. The most common are the following:

1. No SQL support: without support for an industry standard such as SQL, users incur learning and application migration costs;
2. Limited feature set: existing products offer limited functionality. Most NoSQL databases do not support transactions, and they do not provide additional features such as BI and reporting the way MS SQL Server and Oracle do;
3. Immature products: most products are still in their early stages and are nowhere near the decades of refinement behind relational databases.

The pros and cons above are the common ones. In practice, each product differs according to the data model it adopts and the CAP trade-off it makes. The following sections introduce these two most important concepts in NoSQL, the data model and CAP theory, and the end of the article classifies the mainstream NoSQL databases.

Data Model

Traditional databases mainly use the relational data model, which is characterized by support for join-style operations and ACID transactions. In the NoSQL world, there are three mainstream data models:

Column-oriented

The column-oriented model also mainly uses tables, but it does not support multi-table operations such as join. Its main feature is that data is stored by column rather than by row as in a traditional relational database: data belonging to the same column is kept, as far as possible, on the same disk page, instead of data belonging to the same row. The benefit is that for many data-warehouse-style applications, each query processes a lot of data but touches only a few columns, so a column database saves a great deal of I/O. Most column databases also support column families, a feature that groups multiple columns together. The benefit is that similar columns are stored together, which improves the storage and query efficiency of those columns. Overall, this data model is well suited to applications such as aggregation and data warehousing.
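The I/O argument above can be illustrated with a toy sketch in Python. The table, the column names, and the "cells read" cost model are assumptions made for illustration only; a real column store works at the level of disk pages and compression, not Python lists.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The sample data and the "cells read" cost model are illustrative assumptions.

rows = [
    {"user_id": 1, "country": "US", "age": 34, "spend": 120.0},
    {"user_id": 2, "country": "DE", "age": 27, "spend": 80.5},
    {"user_id": 3, "country": "US", "age": 45, "spend": 42.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def sum_spend_row_store(rows):
    """Row layout: every cell of every row is read to reach one column."""
    cells_read = sum(len(row) for row in rows)
    total = sum(row["spend"] for row in rows)
    return total, cells_read


def sum_spend_column_store(columns):
    """Column layout: only the 'spend' column is read."""
    spend = columns["spend"]
    return sum(spend), len(spend)


columns = to_columns(rows)
row_total, row_cells = sum_spend_row_store(rows)
col_total, col_cells = sum_spend_column_store(columns)
print("row layout:   ", row_total, "cells read:", row_cells)
print("column layout:", col_total, "cells read:", col_cells)
```

Both layouts return the same aggregate, but the column layout touches only the cells of the one column the query needs, which is the saving the paragraph describes.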

Key-value

Although the key-value model is much simpler than the traditional relational model, somewhat like a common hash table in which one key corresponds to one value, it offers very fast queries, large data capacity, and highly concurrent operation. It is well suited to querying and modifying data by primary key. It does not support complex operations, but this shortcoming can be compensated for in the application layer above it.
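A minimal in-memory sketch of this model follows. The class name `KeyValueStore`, its methods, and the helper `users_in_city` are hypothetical names chosen for illustration and do not correspond to the API of any particular product.

```python
class KeyValueStore:
    """Hash-table-backed store: one key maps to one opaque value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def keys(self):
        return list(self._data)


def users_in_city(store, city):
    """A 'complex' query built in the application layer by scanning all values."""
    return sorted(key for key in store.keys() if store.get(key)["city"] == city)


store = KeyValueStore()
store.put("user:1", {"name": "Ann", "city": "Paris"})
store.put("user:2", {"name": "Bob", "city": "Berlin"})
store.put("user:3", {"name": "Cy", "city": "Paris"})

print(store.get("user:1"))             # lookup by primary key
print(users_in_city(store, "Paris"))   # query implemented above the store
```

Lookups by key are a single hash-table access, while anything beyond that, such as filtering by a field inside the value, has to be written by the application, here as a full scan.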

Document

Structurally, the document model is very similar to key-value: one key still corresponds to one value. The value, however, is stored as a document, mainly in JSON or XML format, and therefore carries semantics. A document database can generally build secondary indexes on the value for the convenience of upper-layer applications, which an ordinary key-value database does not support.
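The difference from plain key-value can be sketched as follows: because the store understands the structure of the value, it can maintain a secondary index on a field. `DocumentStore` and its methods are hypothetical names for this illustration, and the exact-match index shown is only one simple way to build such an index.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Key -> JSON document, with exact-match secondary indexes on chosen fields."""

    def __init__(self, indexed_fields):
        self._docs = {}  # key -> JSON text
        self._indexes = {field: defaultdict(set) for field in indexed_fields}

    def get(self, key):
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)

    def delete(self, key):
        old = self.get(key)
        if old is None:
            return
        # Remove the key from every index entry that pointed at the old document.
        for field, index in self._indexes.items():
            if field in old:
                index[old[field]].discard(key)
        del self._docs[key]

    def put(self, key, document):
        self.delete(key)  # drop stale index entries before overwriting
        self._docs[key] = json.dumps(document)
        for field, index in self._indexes.items():
            if field in document:
                index[document[field]].add(key)

    def find(self, field, value):
        """Return the keys of documents whose indexed field equals value."""
        return sorted(self._indexes[field].get(value, set()))


docs = DocumentStore(indexed_fields=["city"])
docs.put("user:1", {"name": "Ann", "city": "Paris"})
docs.put("user:2", {"name": "Bob", "city": "Berlin"})
docs.put("user:1", {"name": "Ann", "city": "Berlin"})  # update moves the index entry
print(docs.find("city", "Berlin"))
```

Unlike the scan needed in a plain key-value store, `find` answers the query from the index without reading every document.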

CAP theory

The theory was proposed by Eric Brewer, a well-known American computer scientist and co-founder of the Internet company Inktomi, at the 2000 PODC (Symposium on Principles of Distributed Computing) conference. Seth Gilbert and Nancy Lynch later proved it correct. Although many people have raised objections to CAP theory over the following decade, it remains very useful in the NoSQL world. It states that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; it can satisfy at most two of them at the same time.

1. Consistency: any read always returns the result of the most recently completed write; that is, in a distributed environment, the data on all nodes is consistent;
2. Availability: every operation always returns within a bounded time; that is, the system is available at all times;
3. Partition tolerance: when a network partition occurs (for example, a network link breaks), the separated parts of the system can still function properly.

Since only two of consistency, availability, and partition tolerance can be chosen, most NoSQL systems choose according to their own design philosophy. Because many NoSQL databases are known for horizontal scaling, they tend to hold on to partition tolerance and give up either consistency or availability, and they do so mainly by cutting back relational and transaction-related features.
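The trade-off can be made concrete with a deliberately simplified two-replica simulation. The class names, the `mode` labels, and the `partitioned` flag are assumptions of this sketch; real systems use far more elaborate replication protocols.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeCluster:
    """Two replicas of one value; mode 'CP' or 'AP' decides behavior under partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": Replica("a"), "b": Replica("b")}
        self.partitioned = False  # True means the replicas cannot reach each other

    def write(self, node, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write, giving up availability.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # Availability first: accept locally and let the replicas diverge.
            self.replicas[node].value = value
            return
        # No partition: the write is applied to both replicas.
        for replica in self.replicas.values():
            replica.value = value

    def read(self, node):
        # Reads are always served locally in this sketch. In CP mode that is safe
        # because no write is accepted while the replicas are partitioned.
        return self.replicas[node].value


for mode in ("CP", "AP"):
    cluster = TwoNodeCluster(mode)
    cluster.write("a", "v1")
    cluster.partitioned = True
    try:
        cluster.write("a", "v2")
        outcome = "write accepted"
    except RuntimeError as error:
        outcome = f"write rejected ({error})"
    print(mode, outcome, "| a =", cluster.read("a"), "| b =", cluster.read("b"))
```

In CP mode both replicas still agree after the partition but the write fails; in AP mode the write succeeds but the two replicas return different values until they are reconciled.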

Specific Categories

The classification below comes from the article "Visual Guide to NoSQL Systems". Personally I find parts of it a little far-fetched; for example, Dynamo and its derivative Cassandra, which support multiple CAP configurations, are classified as AP. Overall, though, the classification is quite good and is a very useful reference at this stage. Each database is also listed with its data model.


▲ Figure 1. NoSQL Product Classification Chart (ref. 1)

NoSQL is just a concept, and NoSQL databases are divided into many categories based on the data storage model and features.

| Type | Representative products | Characteristics |
|---|---|---|
| Column storage | HBase, Cassandra, Hypertable | As the name implies, data is stored by column. Its biggest strengths are convenient storage of structured and semi-structured data, easy data compression, and a very large I/O advantage for queries against one or a few columns. |
| Document storage | MongoDB, CouchDB | Data is typically stored in a JSON-like format, and the stored content is document-based. This makes it possible to index certain fields and to implement some functions of a relational database. |
| Key-value storage | Tokyo Cabinet/Tyrant, Berkeley DB, MemcacheDB, Redis | A value can be looked up quickly by its key. In general, the store accepts the value as is, regardless of its format. (Redis includes additional features.) |
| Graph storage | Neo4j, FlockDB | The best storage for graph relationships. Handling them in a traditional relational database performs poorly and is inconvenient to design and use. |
| Object storage | db4o, Versant | The database is operated with syntax similar to an object-oriented language, and data is accessed as objects. |
| XML database | Berkeley DB XML, BaseX | Stores XML data efficiently and supports XML's own query syntax, such as XQuery and XPath. |

These NoSQL categories are not absolute; they are only a rough division by storage model. There is no hard boundary between them, and they overlap. For example, the table-type store of Tokyo Cabinet/Tyrant can be understood as document-type storage, and the Berkeley DB XML database is built on top of Berkeley DB.

Focus on consistency and availability (CA)

These databases pay less attention to partition tolerance and rely mainly on replication to keep data safe. Common CA systems are:

1. Traditional relational databases, such as Postgres and MySQL (relational);
2. Vertica (column-oriented);
3. Aster Data (relational);
4. Greenplum (relational);

Focus on consistency and partitioning tolerance (CP)

Systems of this kind distribute data across multiple network-partitioned nodes and guarantee that the data stays consistent, but their support for availability is weaker. For example, when the cluster runs into trouble, a node may refuse to serve requests because it cannot guarantee that its data is consistent. The main CP systems are:

1. BigTable (column-oriented);
2. Hypertable (column-oriented);
3. HBase (column-oriented);
4. MongoDB (Document);
5. Terrastore (Document);
6. Redis (Key-value);
7. Scalaris (Key-value);
8. MemcacheDB (Key-value);
9. Berkeley DB (Key-value);

Focus on availability and partition tolerance (AP)

Systems of this kind mainly rely on "eventual consistency" to guarantee availability and partition tolerance. AP systems include:

1. Dynamo (Key-value);
2. Voldemort (Key-value);
3. Tokyo Cabinet (Key-value);
4. KAI (Key-value);
5. Cassandra (column-oriented);
6. CouchDB (document-oriented);
7. SimpleDB (document-oriented);
8. Riak (document-oriented);
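Eventual consistency can be sketched as replicas that accept writes independently and later converge by exchanging their state. The sketch below assumes a last-write-wins policy with caller-supplied logical timestamps; this is only one possible reconciliation policy, not necessarily the one used by the systems listed above (some use other mechanisms, such as vector clocks). `LwwReplica` and `sync` are hypothetical names.

```python
class LwwReplica:
    """Replica that keeps, per key, the value with the highest timestamp seen."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(left, right):
    """Anti-entropy exchange: each side adopts the newer version of every key."""
    for key, (timestamp, value) in list(left.data.items()):
        right.write(key, value, timestamp)
    for key, (timestamp, value) in list(right.data.items()):
        left.write(key, value, timestamp)


r1, r2 = LwwReplica(), LwwReplica()

# During a partition, each replica accepts writes on its own.
r1.write("cart", ["book"], timestamp=1)
r2.write("cart", ["book", "pen"], timestamp=2)
print("before sync:", r1.read("cart"), r2.read("cart"))

# Once the partition heals, the replicas exchange state and converge.
sync(r1, r2)
print("after sync: ", r1.read("cart"), r2.read("cart"))
```

Both replicas stay available throughout, at the price of returning different answers until the exchange has run.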
