SQL-to-NoSQL thinking shifts

Source: Internet
Author: User
Tags: database, join

NoSQL systems generally advertise good performance as a selling point. But why? Relational databases have been developed for many years and their optimizations run deep, and NoSQL systems generally borrow heavily from relational database technology. So what actually constrains the performance of a relational database? Let us look at this question from the point of view of system design.

1, index support. When relational databases were first created, nobody anticipated the scalability that today's Internet applications would demand. The main design consideration was to simplify the user's work: the SQL language emerged to standardize the database interface, which in turn gave rise to database companies such as Oracle and to an industry chain upstream and downstream of them. Relational databases support indexes inside the single-machine storage engine; MySQL's InnoDB storage engine, for example, has to support indexing. The single-machine storage engine of a NoSQL system, by contrast, only needs to support random reads and range queries based on the primary key. NoSQL systems provide indexes at the system level instead. For example, suppose there is a user table whose primary key is user_id and in which each user has a number of attributes, including a user name, a photo ID (photo_id) and a photo URL. If photo_id needs to be indexed, a NoSQL system can maintain a separate distributed table whose primary key is the two-tuple <photo_id, user_id>. Because a relational database has to support indexes at the level of the single-machine storage engine, the design of that engine becomes complicated and the scalability of the system is greatly reduced.
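The index-as-a-table idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SortedTable` is a hypothetical in-memory stand-in for a distributed table that offers nothing beyond primary-key gets and range scans, `put_user` and `users_by_photo` are invented names, and photo_id is assumed to be an integer so that the scan range can end at photo_id + 1.

```python
import bisect


class SortedTable:
    """Toy stand-in for a distributed table: primary-key get and range scan only."""

    def __init__(self):
        self._keys = []   # primary keys, kept sorted
        self._rows = {}   # primary key -> row

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, end):
        """Yield (key, row) for every key with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


users = SortedTable()        # primary key: (user_id,)
photo_index = SortedTable()  # primary key: (photo_id, user_id)


def put_user(user_id, name, photo_id, photo_url):
    """Write the user row and, at the system level, the matching index row."""
    users.put((user_id,), {"name": name, "photo_id": photo_id, "photo_url": photo_url})
    # The index row carries no payload; the key itself is the information.
    # Changing a user's photo_id would also require deleting the stale index
    # row, which this sketch omits.
    photo_index.put((photo_id, user_id), {})


def users_by_photo(photo_id):
    """Find user_ids by photo_id with one range scan on the index table."""
    return [key[1] for key, _ in photo_index.scan((photo_id,), (photo_id + 1,))]
```

The storage layer never learns what an index is: a lookup by photo_id is just a range scan over the prefix photo_id of the second table's primary key.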

2, transaction concurrency processing. Relational databases have a whole body of theory for concurrent transaction processing: lock granularity (table level, page level or row level), multi-version concurrency control (MVCC), transaction isolation levels, deadlock detection, rollback, and so on. However, most Internet applications read far more than they write, with a read-to-write ratio of, say, 10:1, and they rarely need complex transactions. They can therefore generally use the simpler copy-on-write technique: a single thread writes, multiple threads read, and each write is performed copy-on-write so that it does not affect the read service. These assumptions simplify the design of a NoSQL system, reduce the overhead of many operations and improve performance.
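A minimal sketch of the single-writer, multi-reader copy-on-write pattern follows. The class name `CopyOnWriteStore` and its methods are hypothetical, and the sketch relies on the assumption that rebinding an attribute is atomic in CPython. It copies the whole dictionary on every write for brevity; a real system would presumably copy only the part of the structure that changes.

```python
import threading


class CopyOnWriteStore:
    """Readers never lock; writers are serialized and publish a new snapshot."""

    def __init__(self):
        self._snapshot = {}                   # never mutated after publication
        self._write_lock = threading.Lock()   # enforces one writer at a time

    def get(self, key, default=None):
        # A reader works on whichever snapshot was current when it started.
        return self._snapshot.get(key, default)

    def snapshot(self):
        """Return the current snapshot; callers must treat it as read-only."""
        return self._snapshot

    def put(self, key, value):
        with self._write_lock:
            new_snapshot = dict(self._snapshot)  # copy ...
            new_snapshot[key] = value            # ... modify the copy ...
            self._snapshot = new_snapshot        # ... publish by swapping the reference
```

A reader that obtained a snapshot before a write keeps seeing the old, consistent data, which is why the write does not disturb the read service.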

3, dynamic or static data structures. The storage engine of a relational database is always a disk-based B+ tree. To improve performance it may need an insert buffer to aggregate writes and a query cache to cache reads, and it often has to implement a cache management mechanism similar to the Linux page cache. Reads and writes in the database affect each other, and write performance is not high because data has to be flushed to disk from time to time. In short, the data structure of a relational storage engine is a general-purpose, dynamically updated B+ tree. A NoSQL system such as Bigtable instead uses a memtable plus SSTable data structure: data is first written to the in-memory memtable, and once the memtable reaches a certain size or a certain amount of time has passed, it is dumped to disk as an SSTable file, which is read-only. If the data structure of the relational storage engine is a dynamic B+ tree, then an SSTable is a sorted array. Obviously, implementing a sorted array is much simpler and more efficient than implementing a dynamic B+ tree together with its complex concurrency control mechanisms.
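The memtable and SSTable arrangement described above can be sketched as follows. This is a toy, not Bigtable's actual implementation: `SSTable` and `MemtableStore` are hypothetical names, the "files" stay in memory, the flush is triggered by entry count only, and deletes and compaction are left out.

```python
import bisect


class SSTable:
    """Read-only sorted array of keys with their values."""

    def __init__(self, sorted_items):
        self._keys = [key for key, _ in sorted_items]
        self._values = [value for _, value in sorted_items]

    def get(self, key):
        # Binary search over the sorted key array.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


class MemtableStore:
    """Writes go to an in-memory memtable that is dumped as an SSTable when full."""

    def __init__(self, memtable_limit=2):
        self._memtable = {}
        self._sstables = []          # oldest first, newest last
        self._limit = memtable_limit

    def put(self, key, value):
        self._memtable[key] = value
        if len(self._memtable) >= self._limit:
            self._flush()

    def _flush(self):
        # Sort once at dump time; the resulting table is never modified again.
        self._sstables.append(SSTable(sorted(self._memtable.items())))
        self._memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then SSTables from newest to oldest.
        if key in self._memtable:
            return self._memtable[key]
        for table in reversed(self._sstables):
            value = table.get(key)
            if value is not None:   # this sketch cannot store None as a value
                return value
        return None
```

Because an SSTable never changes after the dump, readers need no locking on it, and the only mutable structure is the small in-memory memtable.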

4, join operations. Relational databases need to support joins at the storage engine level, whereas NoSQL systems generally leave it to the application to decide how a join is implemented. For example, take two tables, a user table and a commodity table, where each user may have a number of items. The primary key of the user table is <user_id, item_id>, and the attributes of the association between a user and an item are stored in the user table. The primary key of the commodity table is item_id, and its attributes include the product name, the product URL, and so on. Suppose the application needs to query all the products of a user and display the details of each one. The common practice is to find all item_id values of the specified user in the user table and then look up the details of each item_id in the commodity table, that is, to perform a database join, which inevitably causes a lot of random disk reads. And because the random reads caused by a join have poor locality, caching often helps only to a limited extent. In a NoSQL system we can often merge the user table and the commodity table into one wide table; the product details are then stored redundantly, but the query is efficient.
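The difference between the two approaches can be illustrated with plain dictionaries. The sample rows, the `quantity` attribute and the function names are all hypothetical; only the table layout follows the example in the text.

```python
# Hypothetical sample data laid out as in the example above.
user_table = {             # primary key: (user_id, item_id)
    (1, 100): {"quantity": 2},
    (1, 101): {"quantity": 1},
    (2, 100): {"quantity": 5},
}
commodity_table = {        # primary key: item_id
    100: {"name": "kettle", "url": "https://example.com/items/100"},
    101: {"name": "teapot", "url": "https://example.com/items/101"},
}


def items_of_user_join(user_id):
    """Application-side join: one commodity-table lookup per item of the user."""
    rows, lookups = [], 0
    for (uid, item_id), attrs in sorted(user_table.items()):
        if uid != user_id:
            continue
        detail = commodity_table[item_id]   # a random read in a real system
        lookups += 1
        rows.append({"item_id": item_id, **attrs, **detail})
    return rows, lookups


# Wide table: commodity details are copied into every (user_id, item_id) row.
wide_table = {
    key: {**attrs, **commodity_table[key[1]]}
    for key, attrs in user_table.items()
}


def items_of_user_wide(user_id):
    """One scan over the user's key range; no second table is touched."""
    return [
        {"item_id": item_id, **row}
        for (uid, item_id), row in sorted(wide_table.items())
        if uid == user_id
    ]
```

Both functions return the same rows, but the join version pays one extra lookup per item. The price of the wide table is redundancy: changing a product's name means rewriting every row that embeds it.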

The performance bottleneck of a relational database is often not SQL statement parsing but the need to support the complete SQL feature set. Internet companies' applications require high performance and scalability, and their DBAs and development engineers are highly skilled, so some interface friendliness can be sacrificed in exchange for better performance. Some NoSQL design choices, such as replacing joins with wide tables, were already being practiced by DBAs and development engineers at Internet companies; the NoSQL system merely enforces the constraint. In the long run, it may be possible to summarize a set of constraints and define a subset of SQL such that supporting only this subset is enough for, say, more than 90% of Internet applications without sacrificing scalability. I think NoSQL technology will be more mature once it reaches this point, and this is what we ultimately want to achieve. When designing and using a NoSQL system, we can also adjust our thinking appropriately, as follows:

1, a larger amount of data. Many MySQL users find that database performance begins to decline once a table exceeds a certain number of records, such as 20 million; the exact threshold often has to be determined through extensive testing. Most NoSQL systems, however, are more scalable and can support larger amounts of data, so you can adopt techniques that trade space for time, such as implementing joins through wide tables.

2, performance estimation is much easier. Because of complex concurrency control, insert buffers and read-write optimizations such as the page cache, the performance of a relational database is relatively hard to estimate; one often has to rely on experience or testing to determine it. In a NoSQL system, the storage engine implementation, the concurrency control mechanism and so on are relatively simple, so the performance of the system can be roughly estimated at design time from hardware performance figures, which makes performance estimation much more practical.
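As a rough illustration of estimating performance from hardware figures, the sketch below does the kind of back-of-envelope arithmetic this implies. The formulas and every parameter (number of disks, IOPS per disk, seeks per read, cache hit rate, sequential write bandwidth, record size) are assumptions chosen for the example, not figures from any particular system.

```python
def estimate_read_qps(disks, iops_per_disk, seeks_per_read, cache_hit_rate):
    """Reads per second one machine can serve if only cache misses hit the disk."""
    miss_rate = 1.0 - cache_hit_rate
    if miss_rate <= 0:
        return float("inf")   # everything is served from memory
    return disks * iops_per_disk / (seeks_per_read * miss_rate)


def estimate_write_rps(seq_write_bytes_per_sec, record_bytes):
    """Records per second when writes are sequential dumps of sorted data."""
    return seq_write_bytes_per_sec / record_bytes


# Example with made-up hardware figures: 10 disks at 100 random reads per
# second each, one seek per read, half of all reads served from cache.
read_qps = estimate_read_qps(disks=10, iops_per_disk=100,
                             seeks_per_read=1, cache_hit_rate=0.5)
write_rps = estimate_write_rps(seq_write_bytes_per_sec=100_000_000,
                               record_bytes=1000)
```

With a read-only sorted file and a single writer there are few hidden interactions, so an estimate of this kind is more likely to land near the measured value than it would be for an engine with insert buffers and lock contention.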
