The limitations of SQL: from relational databases to NoSQL

Source: Internet
Author: User

NoSQL systems generally advertise good performance as a feature. But why? Relational databases have been developed for many years and have been optimized in great depth, and NoSQL systems largely borrow relational database techniques. So what, in the end, constrains the performance of relational databases? Let us look at the question from the perspective of system design.

1. Index support. When relational databases were first created, nobody anticipated the scalability demands of today's Internet applications. The main design goal was to simplify the user's work, and the SQL language emerged to standardize database interfaces, which gave rise to database companies such as Oracle and drove the development of an upstream and downstream industry chain. Relational databases support indexes inside the single-machine storage engine; MySQL's InnoDB storage engine, for example, has to support indexing. The single-machine storage engine of a NoSQL system is much simpler: it only supports random reads and range queries by primary key. A NoSQL system provides indexing at the system level instead. Take a user table whose primary key is user_id and whose rows hold several attributes per user, including the user name, a photo ID (photo_id), and a photo URL. To index photo_id in a NoSQL system, you can maintain a separate distributed table whose primary key is a two-tuple, presumably the pair of photo_id and user_id. Because a relational database must support indexes at the level of the single-machine storage engine, the design of that engine becomes very complicated and the scalability of the system is greatly reduced.
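
The system-level index described in point 1 can be sketched roughly as follows. This is a minimal single-process illustration, not how any particular NoSQL system implements it: the class `SortedTable`, the functions `put_user` and `users_with_photo`, the key order (photo_id, user_id), and the sample data are all assumptions made for the example. Each table supports only what the text says a NoSQL storage engine supports, namely reads and range queries by primary key.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for a NoSQL table: sorted primary keys, point and range reads only."""

    def __init__(self):
        self._keys = []   # primary keys, kept in sorted order
        self._rows = {}   # primary key -> row value

    def put(self, key, value=None):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, first):
        """Range query: yield every tuple key whose first component equals `first`."""
        i = bisect_left(self._keys, (first,))
        while i < len(self._keys) and self._keys[i][0] == first:
            yield self._keys[i]
            i += 1


users = SortedTable()        # primary key: user_id
photo_index = SortedTable()  # primary key: (photo_id, user_id), assumed order


def put_user(user_id, user_name, photo_id, photo_url):
    # The index is maintained above the storage engine: two plain writes.
    # A real system must also keep the two writes consistent and remove
    # stale index entries on update; this sketch does neither.
    users.put(user_id, {"user_name": user_name,
                        "photo_id": photo_id,
                        "photo_url": photo_url})
    photo_index.put((photo_id, user_id))


def users_with_photo(photo_id):
    # An index lookup is just a primary-key range scan on the index table.
    return [user_id for _, user_id in photo_index.scan_prefix(photo_id)]


put_user(1, "alice", 100, "http://example.com/100.jpg")
put_user(2, "bob", 200, "http://example.com/200.jpg")
put_user(3, "carol", 100, "http://example.com/100.jpg")
print(users_with_photo(100))  # prints: [1, 3]
```

The storage engine itself never learns that `photo_index` is an index; it only sees another table keyed by a tuple, which is what keeps the engine simple.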

2. Transaction concurrency processing. Relational databases come with a whole body of theory on transaction concurrency: lock granularity (table level, page level, or row level), multi-version concurrency control (MVCC), transaction isolation levels, deadlock detection, rollback, and so on. Most Internet applications, however, are read-heavy and write-light, with a read-to-write ratio of, say, 10:1, and they rarely need complex transactions. They can therefore generally adopt a simpler copy-on-write technique: a single thread writes while multiple threads read, and each write is applied copy-on-write so that writing does not affect the read service. This assumption simplifies the design of a NoSQL system, removes the overhead of many operations, and improves performance.
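
A minimal sketch of the copy-on-write idea in point 2, under stated assumptions: the class name `CopyOnWriteStore` and its methods are hypothetical, the whole dictionary is copied on every write for brevity (a real engine would more likely copy only the affected part of a tree), and publishing the new version relies on a single reference assignment being atomic, which holds in CPython.

```python
import threading


class CopyOnWriteStore:
    """One writer at a time, any number of lock-free readers."""

    def __init__(self):
        self._snapshot = {}                   # never mutated once published
        self._write_lock = threading.Lock()   # serializes writers

    def get(self, key, default=None):
        # Readers take no lock; they read whichever complete version
        # was current at the moment they looked.
        return self._snapshot.get(key, default)

    def snapshot(self):
        # Hand out the current version; later writes will not change it.
        return self._snapshot

    def put(self, key, value):
        with self._write_lock:
            new_version = dict(self._snapshot)  # copy ...
            new_version[key] = value            # ... modify the copy ...
            self._snapshot = new_version        # ... publish with one assignment


store = CopyOnWriteStore()
store.put("a", 1)
old_view = store.snapshot()   # a reader grabs the current version
store.put("a", 2)             # the writer publishes a new version
print(old_view["a"], store.get("a"))  # prints: 1 2
```

The reader holding `old_view` is unaffected by the later write, which is the sense in which writing does not disturb the read service. No lock manager, deadlock detection, or rollback is involved.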

3. Dynamic or static data structures. The storage engine of a relational database is typically an on-disk B+ tree. To improve performance it may need an insert buffer to aggregate writes and a query cache to cache reads, and it often has to implement a cache management mechanism similar to the Linux page cache. Reads and writes in the database interfere with each other, and write performance suffers because data must occasionally be flushed to disk. In short, the data structure of a relational storage engine is a general-purpose, dynamically updated B+ tree. NoSQL systems work differently. In Bigtable's SSTable + memtable structure, for example, data is first written to the in-memory memtable, and only when it reaches a certain size or a certain amount of time has passed is it dumped to disk as an SSTable file, which is read-only. If the relational storage engine's data structure is a dynamic B+ tree, then an SSTable is a sorted array. Clearly, implementing a sorted array is much simpler and more efficient than implementing a dynamic B+ tree with its complex concurrency control mechanisms.
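
The memtable and SSTable arrangement in point 3 can be illustrated with a small in-memory sketch. It is not Bigtable's implementation: the class names `SSTable` and `TinyLSM` and the parameter `memtable_limit` are assumptions, the "files" are plain Python lists rather than disk files, the dump is triggered by size only (not by elapsed time), and deletes, compaction, and crash recovery are left out.

```python
from bisect import bisect_left


class SSTable:
    """Read-only sorted array of (key, value) pairs; stands in for an on-disk file."""

    def __init__(self, sorted_items):
        self._keys = [k for k, _ in sorted_items]
        self._values = [v for _, v in sorted_items]

    def get(self, key):
        # Binary search over the sorted keys; returns (found, value).
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return True, self._values[i]
        return False, None


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self._memtable = {}      # recent writes, held in memory
        self._sstables = []      # dumped runs, oldest first
        self._limit = memtable_limit

    def put(self, key, value):
        self._memtable[key] = value
        if len(self._memtable) >= self._limit:
            self.flush()

    def flush(self):
        # Dump the memtable as one sorted, immutable run and start a fresh one.
        if self._memtable:
            self._sstables.append(SSTable(sorted(self._memtable.items())))
            self._memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self._memtable:
            return self._memtable[key]
        for table in reversed(self._sstables):
            found, value = table.get(key)
            if found:
                return value
        return None


db = TinyLSM(memtable_limit=2)
db.put("b", 1)
db.put("a", 2)   # memtable is full: dumped as the sorted run [("a", 2), ("b", 1)]
db.put("a", 3)   # the newer value stays in the memtable
print(db.get("a"), db.get("b"), db.get("z"))  # prints: 3 1 None
```

Nothing ever updates an `SSTable` in place, so reading one needs no locking, and writing one is a single sequential pass over sorted data.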

4. Join operations. Relational databases have to support joins at the storage engine level, whereas NoSQL systems generally let the application decide how a join is implemented. For instance, suppose there are two tables, a user table and a commodity table, and each user may own several items. The user table stores the attributes that associate users with items; the commodity table's primary key is item_id, and its attributes include the commodity name, the commodity URL, and so on. Suppose the application needs to look up all of a user's items and display their details. The common practice is to find all item_id values for the given user in the user table, then query the commodity table for the details of each item_id. That is a database join, and it inevitably causes many random disk reads. Because the random reads caused by the join have poor locality, caching often helps only to a limited extent. In a NoSQL system we can often merge the user table and the commodity table into one wide table. The item details are then stored redundantly, but the query becomes efficient.
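
To make the contrast in point 4 concrete, here is a small sketch of the two layouts. The data, the wide-table key (user_id, item_id), and the function names are all invented for illustration; in-memory dictionaries and lists stand in for tables, so the sketch shows the access pattern rather than real disk behaviour.

```python
from bisect import bisect_left

# Normalized layout: a user table listing item ids, and a commodity table.
user_items = {"u1": ["i1", "i2"], "u2": ["i2"]}
items = {
    "i1": {"name": "kettle", "url": "http://example.com/i1"},
    "i2": {"name": "lamp", "url": "http://example.com/i2"},
}


def items_of_user_join(user_id):
    # One lookup for the item ids, then one lookup per item.
    # On disk, each per-item lookup is a potential random read.
    return [items[item_id] for item_id in user_items.get(user_id, [])]


def build_wide_table(user_items, items):
    # Wide layout: one row per (user_id, item_id), with the item details
    # copied into the row. Rows are kept sorted by key.
    keys = sorted((u, i) for u, ids in user_items.items() for i in ids)
    details = [dict(items[i]) for _, i in keys]
    return keys, details


def items_of_user_wide(wide, user_id):
    # A single range scan over adjacent rows returns everything.
    keys, details = wide
    i = bisect_left(keys, (user_id,))
    result = []
    while i < len(keys) and keys[i][0] == user_id:
        result.append(details[i])
        i += 1
    return result


wide = build_wide_table(user_items, items)
print(items_of_user_wide(wide, "u1") == items_of_user_join("u1"))  # prints: True
```

The price of the wide layout is visible in `build_wide_table`: the details of item `i2` are stored twice, so changing an item means updating every row that carries a copy.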

The performance bottleneck of a relational database is often not SQL statement parsing but the need to support the complete set of SQL features. Internet companies face applications with demanding performance and scalability requirements, and their DBAs and development engineers are highly skilled, so they can trade away some interface friendliness in exchange for performance. Some NoSQL design choices, such as using wide tables in place of join operations, are things that DBAs and development engineers at Internet companies were already doing; the NoSQL system merely enforces the constraint. In the long run, one could summarize a set of such constraints and define a subset of SQL that supports more than 90% of Internet applications without sacrificing scalability. I think NoSQL technology will be more mature once it reaches this step, which is what we ultimately want to achieve. When we design and use NoSQL systems, we can also adjust our thinking as follows:

1. Larger data volumes. Many MySQL users have found that once a table exceeds a certain number of records, for example 20 million, database performance starts to decline, and the exact threshold often has to be found through extensive testing. Most NoSQL systems are more scalable and can support larger amounts of data, so they can afford practices that trade space for time, such as using a wide table in place of a join.

2. Easier performance prediction. Because of complex concurrency control, the insert buffer, and page-cache-like read and write optimizations, the performance of a relational database is relatively hard to estimate, and it often takes experience or testing to learn what a system can deliver. A NoSQL system, with its simple storage engine and concurrency control mechanism, lets you estimate performance from the performance figures of the hardware, which makes performance prediction more practical.

(Responsible editor: Lu Guang)
