Database Research in the Cloud Computing Era

Source: Internet
Author: User
Keywords: cloud computing, cost, tradition, expansion

Introduction

With the advent of the cloud computing era, Internet applications of every kind are emerging, and they place new requirements on data models, distributed architecture, data storage, and other database technologies. Although the traditional relational database still holds a dominant position in data storage, its inherent limitations leave it unable to meet cloud-era demands for data scalability, read/write speed, capacity, and construction and operating cost. The new demands on database technology show up mainly in the following areas.

Mass data processing: large applications such as search engines and carrier-grade business analysis systems must handle PB-level data while coping with traffic in the millions.

Large-scale cluster management: distributed applications should be simpler to deploy, operate, and manage.

Low-latency reads and writes: fast response times greatly improve user satisfaction.

Construction and operating costs: cloud computing applications fundamentally require a significant reduction in hardware, software, and personnel costs.

Analysis of the disadvantages of relational databases

With the development of Web 2.0, traditional relational databases have exposed many hard-to-overcome problems when serving very large, highly concurrent SNS-type websites, mainly in the following respects.

(1) Slow reads and writes under high concurrency

This problem appears mainly once the data volume reaches a certain scale. Because the internal logic of a relational database system is complex, it is prone to deadlocks and other concurrency problems, and read/write speed drops sharply. For example, a Web 2.0 site generates dynamic pages in real time from each user's personalized information, so it can rarely fall back on serving pre-generated static pages. The concurrent load on the database is therefore very high, often reaching tens of thousands of read and write requests per second. A relational database can barely keep up with tens of thousands of SQL queries per second, and hard disk I/O often cannot sustain tens of thousands of SQL write requests per second.

(2) Limited support capacity

SNS sites such as Facebook and Twitter generate a huge volume of user activity every day, on the order of hundreds of millions of records. For a relational database, running SQL queries against a table that holds hundreds of millions of records is extremely inefficient, even intolerably so.

(3) Poor scalability

In a web-based architecture, the database is the hardest tier to scale horizontally. When the number of users and requests of an application grows, a traditional relational database, unlike the web servers, cannot extend its performance and load capacity simply by adding more hardware and service nodes. For the many websites that must provide uninterrupted service, upgrading and expanding the database system is very painful and often requires downtime for maintenance and data migration. There is therefore a pressing need for databases that can be extended by continuously adding server nodes.
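The data migration cost mentioned above can be illustrated with a small sketch. It assumes a hypothetical application-level scheme in which rows are spread over database servers by `hash(key) % N`; the article does not say how any particular system shards its data, and `shard_for` and `fraction_moved` are names invented for this illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Place a key on a shard with hash-mod-N (hypothetical sharding scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical workload: 10,000 user rows keyed by "user:<id>".
keys = [f"user:{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_shards=4, new_shards=5)
print(f"{moved:.0%} of rows change shard when growing from 4 to 5 servers")
```

Under this assumed scheme, a key stays in place only when its hash gives the same remainder for both shard counts, so most rows have to be copied to a different server when one node is added. That is one reason such an expansion tends to involve a maintenance window.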

(4) High cost of construction and operation

Enterprise-class databases are expensive, and the cost rises as the system grows in size. Such high construction and operating costs cannot meet the needs of cloud computing applications.

Relational databases thus face bottlenecks that are difficult to overcome. At the same time, many of their main features are often unnecessary in cloud computing applications, for example transactional consistency, real-time writes and reads, and complex SQL queries, especially multi-table join queries. Therefore, the traditional relational database alone can no longer handle the range of applications in the cloud computing era.

NoSQL Database Data Model

As relational databases become less and less able to serve cloud computing scenarios, non-relational databases have emerged to address the problem. Because their design differs greatly from that of traditional relational databases, they are known as "NoSQL (Not Only SQL)" databases. Compared with relational databases, they focus on highly concurrent reads and writes and on storing massive data, simplify the architecture and data model, and improve scalability and concurrency. Current mainstream NoSQL databases include BigTable, HBase, Cassandra, SimpleDB, CouchDB, MongoDB, and Redis. The data models commonly used by NoSQL databases fall into the following three kinds.

(1) Column-oriented

A column-oriented database still uses the table as its model, but it does not support multi-table operations such as joins. Its main feature is that data is stored around columns rather than by row, as in a traditional relational database. That is, data belonging to the same column is stored as far as possible on the same disk page, instead of keeping the data of one row together. The advantage is that in many data-warehouse-like applications each query processes a lot of data but touches only a few columns, so a column database saves a great deal of I/O. Most column databases also support column families, which group several columns together so that similar columns are stored side by side, improving their storage and query efficiency. Overall, this data model is better suited to applications such as aggregation and data warehousing.
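The I/O saving described for column-oriented storage can be sketched in a few lines. This is a toy in-memory illustration, not how any particular column database lays out pages on disk; the order data and the helper names (`to_columns`, `total_amount_row_store`, `total_amount_column_store`) are made up for the example.

```python
# Row-oriented layout: all fields of one record are kept together.
rows = [
    {"order_id": 1, "customer": "alice", "region": "EU", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "region": "US", "amount": 12.5},
    {"order_id": 3, "customer": "carol", "region": "EU", "amount": 7.5},
]


def to_columns(records):
    """Pivot row records into one list per column (column-oriented layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_row_store(records):
    """Sum 'amount' from rows; every field of every record is read."""
    fields_read = sum(len(record) for record in records)
    return sum(record["amount"] for record in records), fields_read


def total_amount_column_store(cols):
    """Sum 'amount' from columns; only the 'amount' column is read."""
    values = cols["amount"]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_reads = total_amount_row_store(rows)
col_total, col_reads = total_amount_column_store(columns)
print(row_total, row_reads)  # 50.0 12
print(col_total, col_reads)  # 50.0 3
```

Both layouts return the same total, but the aggregation over the column layout touches 3 values instead of 12, which is the kind of saving the paragraph attributes to queries that involve only a few columns.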

(2) Key-value

Although the key-value model is simple compared with the traditional relational model, much like a common hash table in which one key corresponds to one value, it offers very fast queries, large storage capacity, and highly concurrent operation. It is well suited to querying and modifying data by primary key. Complex operations are not supported, but this shortcoming can be made up for by development in the layer above.

(3) Document

Structurally, the document model is very similar to key-value: one key corresponds to one value. The value, however, is stored mainly as a JSON or XML document and therefore carries semantics. A document database can generally create secondary indexes on the value for the convenience of upper-level applications, which ordinary key-value databases do not support.
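As a rough sketch of the key-value model, the class below wraps a Python dictionary and exposes only key-based operations, and a separate function shows how a more complex query might be built in the upper layer. `KeyValueStore` and `find_users_by_city` are hypothetical names for this illustration and do not correspond to the API of Redis or any other product.

```python
class KeyValueStore:
    """Toy in-memory key-value store: all access goes through the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def keys(self):
        return list(self._data)


def find_users_by_city(store, city):
    """Upper-layer 'query': the store cannot filter by value,
    so the application scans the keys and filters the values itself."""
    return sorted(
        key
        for key in store.keys()
        if key.startswith("user:") and store.get(key)["city"] == city
    )


store = KeyValueStore()
store.put("user:1", {"name": "alice", "city": "Berlin"})
store.put("user:2", {"name": "bob", "city": "Paris"})
store.put("user:3", {"name": "carol", "city": "Berlin"})

print(store.get("user:2"))                  # lookup by primary key
print(find_users_by_city(store, "Berlin"))  # filtering done by the application
```

The lookup by key is a single dictionary access, while the filter by city has to scan every key. This illustrates the trade-off the paragraph describes: primary-key access is fast, and anything more complex is left to the application.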
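The difference from plain key-value storage can be sketched as follows: the value is a JSON document, and the store maintains a secondary index on chosen fields of that document. This is a minimal in-memory sketch under assumed names (`DocumentStore`, `indexed_fields`, `find`); it is not the API of MongoDB, CouchDB, or any other document database.

```python
import json
from collections import defaultdict


class DocumentStore:
    """Toy document store: key -> JSON document, plus secondary indexes."""

    def __init__(self, indexed_fields):
        self._docs = {}  # key -> JSON text
        # One index per field: field value -> set of keys holding that value.
        # Indexed values must be hashable (strings, numbers, ...).
        self._indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, key, document):
        self.delete(key)  # drop stale index entries if the key already exists
        self._docs[key] = json.dumps(document)
        for field, index in self._indexes.items():
            if field in document:
                index[document[field]].add(key)

    def get(self, key):
        raw = self._docs.get(key)
        return json.loads(raw) if raw is not None else None

    def delete(self, key):
        old = self.get(key)
        if old is None:
            return
        for field, index in self._indexes.items():
            if field in old:
                index[old[field]].discard(key)
        del self._docs[key]

    def find(self, field, value):
        """Look up keys through the secondary index (field must be indexed)."""
        return sorted(self._indexes[field].get(value, set()))


posts = DocumentStore(indexed_fields=["author"])
posts.put("post:1", {"author": "alice", "title": "Hello"})
posts.put("post:2", {"author": "bob", "title": "Hi"})
posts.put("post:3", {"author": "alice", "title": "Again"})

print(posts.find("author", "alice"))  # ['post:1', 'post:3']
```

Because the store understands the structure of the value, it can answer a query on `author` from the index without scanning every document, which a store that treats values as opaque cannot do.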

Comparison of common NoSQL databases and analysis of their advantages and disadvantages

(1) Comparison of the main NoSQL databases

This paper compares BigTable, Cassandra, Redis, and MongoDB in terms of design philosophy, data model, and distribution; see Table 1.

(2) Analysis of the advantages of NoSQL databases

NoSQL databases have the following main advantages:

Simple scaling. The typical example is Cassandra: because its architecture resembles classic peer-to-peer, the cluster can be expanded simply by adding new nodes;

Fast reads and writes. The typical example is Redis: because its logic is simple and it operates purely in memory, its performance is excellent, and a single node can handle more than 100,000 read and write operations per second;

Low cost. Most NoSQL databases are open source software, so there are no expensive licensing fees.
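The first advantage, expanding a cluster by simply adding nodes, is commonly built on consistent hashing, which Cassandra uses to partition data. The ring below is a simplified sketch of that idea and not Cassandra's implementation; the `HashRing` class, the node names, and the choice of 100 virtual nodes per server are assumptions made for the illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._tokens = []  # sorted token positions on the ring
        self._owners = []  # owners[i] is the node that owns tokens[i]
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node is placed at several positions on the ring.
        for i in range(self._vnodes):
            token = _hash(f"{node}#{i}")
            idx = bisect.bisect(self._tokens, token)
            self._tokens.insert(idx, token)
            self._owners.insert(idx, node)

    def node_for(self, key):
        # A key belongs to the first token clockwise from its hash.
        idx = bisect.bisect(self._tokens, _hash(key)) % len(self._tokens)
        return self._owners[idx]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-e")  # expand the cluster by one node
after = {k: ring.node_for(k) for k in keys}

moved = sum(before[k] != after[k] for k in keys) / len(keys)
print(f"{moved:.0%} of keys moved, all of them to the new node")
```

In this sketch, only the keys that fall into the new node's ranges change owner, and they all move to the new node; the remaining keys stay where they were. This is what lets a peer-to-peer style cluster grow without reshuffling most of its data.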

(3) Analysis of the disadvantages of NoSQL databases

Although NoSQL databases have many notable advantages, they still have many shortcomings, mainly the following:

No SQL support. Users face application migration costs, and applications cannot build on the mature strengths of SQL databases;

Limited features. Existing NoSQL databases provide quite limited functionality, and most do not support transactions or other additional features;

Immature products. Most NoSQL database products are still at an early stage, in contrast to relational databases, which are already very mature.

Conclusion

Cloud computing workloads fall mainly into two scenarios. The first requires low latency and highly concurrent reads and writes over a large amount of data that does not exceed the terabyte level; most web applications now using an RDBMS fall into this category, which resembles traditional OLTP (online transaction processing). The second is the storage and processing of massive data, for example at the PB level; examples include traditional data warehouses and Google's storage of massive numbers of web pages and images, which resembles traditional OLAP (online analytical processing). At present, the industry has no single NoSQL database that suits every cloud computing scenario. Given the complexity of the requirements of PaaS platforms, the ability to customize the back-end database will be a future trend, so lightweight, highly scalable, and highly reliable architecture designs will be welcome.

(Responsible editor: Lu Guang)
