NoSQL: the natural choice when data is large

Last Update:2014-12-17 Source: Internet

Author: User

Tags cassandra


How SQL solves the problem of excessive data volume

Typically, we store our data in a SQL database. At the start of a project the user base is small, so data volume and concurrency pressure are small too. As the web application becomes popular, users, traffic, and data volume all soar, and the database usually goes through the following changes.

1. Primary/standby separation

When the number of users surges, the first problem to appear is usually insufficient read and write performance. To relieve it, we create several standby databases. All writes go to the primary, which then synchronizes them to the standbys, and all reads are served from the standbys.

This refactoring improves read and write performance.
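The routing rule described above can be sketched in a few lines. This is only an illustration, not a real driver: the class name `ReadWriteRouter`, the server names, and the "starts with SELECT" test are all assumptions made for the example.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across the standbys."""

    def __init__(self, primary, standbys):
        self.primary = primary
        # Round-robin iterator over the standby names.
        self._standbys = itertools.cycle(standbys)

    def route(self, sql):
        # Assumption: any statement starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._standbys) if is_read else self.primary


router = ReadWriteRouter("primary", ["standby-1", "standby-2"])
print(router.route("INSERT INTO blog VALUES (1, 'hello')"))  # primary
print(router.route("SELECT * FROM blog"))                    # standby-1
print(router.route("SELECT * FROM blog"))                    # standby-2
```

Reads are spread over the standbys in turn, while every write still lands on the single primary.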

2. Database splitting

As the number of users keeps growing, the data for all of an application's tables no longer fits on a single machine, and another refactoring is needed.

The solution is to pull out the tables with particularly large amounts of data and store each of them on its own machine. This alleviates the problem of excessive data volume to some extent.

3. Table splitting

When the amount of data increases further, even a machine that stores only one table cannot hold it.

The fix is to split the contents of one table into multiple tables and store them on separate machines.
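One common way to decide which machine a row goes to is to hash a partition column. The sketch below assumes the rows are partitioned by a user ID string; the machine names in `SHARDS` and the function `shard_for` are hypothetical.

```python
import zlib

SHARDS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical machine names


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Pick the machine holding this user's rows via a stable hash."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    return shards[zlib.crc32(user_id.encode("utf-8")) % len(shards)]


for user in ["ann", "bob", "cat"]:
    print(user, "->", shard_for(user))
```

Because the hash depends only on the user ID, every row of one user lands on the same machine.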

After these three steps, the problem of excessive data volume is basically solved.

But there is a catch, and it is an important one.

What we lose after splitting tables

First, look at some of the key features of a SQL database:

a. Transactions: A (atomicity), C (consistency), I (isolation), D (durability)

b. Multi-table joins: JOIN, or SELECT from multiple tables

c. Indexes: querying by a condition

d. Aggregation: COUNT(), SUM(), GROUP BY, HAVING

1. Primary/standby separation

At this stage, transaction isolation is lost.

If a transaction both reads and writes, the reads run on a standby while the writes run on the primary, and the SQL transaction manager cannot manage operations on different machines. When a value changes on the primary, the standby does not see it instantly, because synchronization takes time. The delay may be as short as a millisecond, but even that gap can cause the transaction to fail.
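The stale read described above can be reproduced with a toy model. The `Primary` and `Standby` classes below are invented for illustration and use plain dictionaries. Real replication is far more involved, but the ordering problem is the same.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # changes not yet shipped to the standby

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Standby:
    def __init__(self):
        self.data = {}

    def sync(self, primary):
        # Apply and clear the primary's pending changes.
        for key, value in primary.log:
            self.data[key] = value
        primary.log.clear()


primary, standby = Primary(), Standby()
primary.write("balance", 100)
standby.sync(primary)

# A "transaction" that writes on the primary, then reads on the standby.
primary.write("balance", 50)
stale = standby.data["balance"]  # still the old value: sync has not run yet
standby.sync(primary)
fresh = standby.data["balance"]  # the new value, after the sync
print(stale, fresh)  # 100 50
```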

2. Database splitting

At this stage, transactions are lost. As soon as a transaction involves tables on different machines, it can no longer take effect as a whole, because the SQL database itself cannot manage transactions across machines. Cross-machine (distributed) transaction schemes do exist, but they are generally so inefficient that they can almost be ignored.

Multi-table joins fail completely. When tables live on different machines, a join between them is impossible.
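To see what replaces the lost join, the sketch below uses two independent in-memory SQLite databases as stand-ins for two machines. The `users` and `posts` tables and their contents are made up for the example. Since no single SQL statement can see both tables, the application has to join the rows itself.

```python
import sqlite3

# Two independent in-memory databases stand in for two machines.
users_db = sqlite3.connect(":memory:")
posts_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

posts_db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
posts_db.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(10, 1, "first"), (11, 2, "second"), (12, 1, "third")],
)

# No single JOIN can see both tables, so the application joins by hand.
names = dict(users_db.execute("SELECT id, name FROM users"))
joined = [
    (names[user_id], title)
    for user_id, title in posts_db.execute(
        "SELECT user_id, title FROM posts ORDER BY id"
    )
]
print(joined)  # [('ann', 'first'), ('bob', 'second'), ('ann', 'third')]
```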

3. Table splitting

At this point, the key features of SQL are almost useless: transactions, multi-table joins, indexes, and aggregations are all virtually unusable.

Indexes stop working. Take a blog table as an example. It is usually split by user, so that all of one user's data sits on the same machine. If we then query by creation time, for instance to find the posts that all users published today, the index no longer helps. We have to run the query against every sub-table and then assemble the results ourselves, which native SQL cannot do for us.

When indexes are unavailable, aggregation is certainly unavailable too.
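The "query every sub-table and assemble the results" step might look like the sketch below. Three in-memory SQLite databases stand in for three machines. The `blog` table layout, the sample rows, and the helpers `posts_on` and `count_on` are assumptions made for the example.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory shard holding (user, day, title) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blog (user TEXT, day TEXT, title TEXT)")
    db.executemany("INSERT INTO blog VALUES (?, ?, ?)", rows)
    return db


# Rows are placed by user, so one day's posts are spread over every shard.
shards = [
    make_shard([("ann", "2014-12-17", "a1"), ("ann", "2014-12-16", "a2")]),
    make_shard([("bob", "2014-12-17", "b1")]),
    make_shard([("cat", "2014-12-15", "c1")]),
]


def posts_on(day):
    """Scatter the query to every shard, then gather the rows."""
    found = []
    for db in shards:
        found.extend(
            db.execute("SELECT user, title FROM blog WHERE day = ?", (day,))
        )
    return sorted(found)


def count_on(day):
    """COUNT(*) is computed per shard and summed by the caller."""
    return sum(
        db.execute("SELECT COUNT(*) FROM blog WHERE day = ?", (day,)).fetchone()[0]
        for db in shards
    )


print(posts_on("2014-12-17"))  # [('ann', 'a1'), ('bob', 'b1')]
print(count_on("2014-12-17"))  # 2
```

Every shard is contacted even though only two of them hold matching rows, and the final count is assembled outside the database.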

SQL or NoSQL

So SQL is almost dead once the data gets large. Almost none of its key features remain available. About all that is left is operating by primary key or by partition column (the column that decides which machine a row lives on, such as time or user ID), plus a little more. Operating along any other dimension is very difficult.

Now consider NoSQL databases such as HBase, Cassandra, and MongoDB: they too offer little more than primary-key operations.
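A toy key-value model shows the difference between the two kinds of access. The `rows` dictionary and the helpers `get` and `find_by` are hypothetical and do not represent the API of any of the databases named above.

```python
# Hypothetical rows keyed by primary key, as a key-value store would hold them.
rows = {
    "post:1": {"user": "ann", "day": "2014-12-17"},
    "post:2": {"user": "bob", "day": "2014-12-17"},
    "post:3": {"user": "ann", "day": "2014-12-16"},
}


def get(key):
    """Primary-key lookup: a single direct access."""
    return rows[key]


def find_by(field, value):
    """Any other dimension: every row has to be examined."""
    return sorted(k for k, row in rows.items() if row[field] == value)


print(get("post:2")["user"])         # bob
print(find_by("day", "2014-12-17"))  # ['post:1', 'post:2']
```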

So it is fair to say that once a relational database has been split into sub-databases and sub-tables, the system has effectively become NoSQL: most of the important relational features are unusable, and what remains is roughly what NoSQL already provides.

Splitting databases and tables does solve the problem of large data volumes, but NoSQL goes further and provides more exciting features.

1. Automatic expansion. For a relational database, expanding a set of sub-tables is very difficult. Each expansion must be planned ahead of time and carefully evaluated, and it requires data migration, which is a big challenge for a running system. NoSQL databases have this built in and automate it.

2. Synchronous replicas. HBase and Cassandra support synchronous replication: each table can be configured to replicate its values to other machines. When a machine goes down, operations are transferred directly to another machine, and the transfer is nearly instantaneous. MySQL and Oracle also support primary/standby switchover, but the switchover takes time. So NoSQL tolerates network and machine outages better.
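The data-migration cost mentioned in point 1 can be made concrete with a small calculation. The sketch assumes a manual scheme that places rows by `user_id % shard_count`. That placement rule and the helper names are assumptions for illustration, not how any particular system shards its data.

```python
def shard_of(user_id: int, shard_count: int) -> int:
    """Simple modulo placement, as a manual sharding scheme might use."""
    return user_id % shard_count


def rows_to_migrate(user_ids, old_count, new_count):
    """Return the IDs whose shard changes when the shard count changes."""
    return [
        uid
        for uid in user_ids
        if shard_of(uid, old_count) != shard_of(uid, new_count)
    ]


moved = rows_to_migrate(range(1000), old_count=4, new_count=5)
print(len(moved))  # 800 of 1000 users must move to a different machine
```

Under this placement rule, adding a single machine relocates most of the data, which is why each expansion has to be planned so carefully.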

Of course, although many companies are now turning to NoSQL, sharded relational databases are still very widely used. But with big data volumes, moving from SQL to NoSQL has become an unstoppable trend.


This article is an English version of an article originally published in Chinese on aliyun.com.
