The NoSQL Movement: Choice in Database Architecture

Editor's note: Mike Loukides is vice president of content strategy at O'Reilly Media. He is deeply interested in programming languages and UNIX system administration, and his books include System Performance Tuning and UNIX Power Tools. In this article, he offers his insights into NoSQL and reflects on many aspects of modern database architecture.

In a conversation last year, Justin Sheehy, CTO of Basho, described NoSQL as a movement rather than a technology. I agreed immediately, because I had never been comfortable with earlier discussions of NoSQL.

So why is NoSQL a movement rather than a technology? Justin's argument is straightforward: NoSQL is a movement because it is about choice in database architecture. Focusing on any single technology obscures the essence of the NoSQL movement.

Since the 1980s, relational databases such as SQL Server, Oracle, and DB2 have been the dominant back ends for business systems. These relational database products are excellent and share many similarities.

Looking back over the past 15 years of software development, we have built many excellent large-scale database applications, including web applications. But much has changed in the database field since the relational database was born:

• Data growth. Storage capacity and CPU speed have grown rapidly enough for databases to keep up with the proliferation of data, but sheer data volume is still a thorny problem: for any important database, distribution is essential.
• Sub-second query response. In the 1980s, database queries often ran as batch jobs and were slow. The web has since evolved from static files to sites backed by complex databases, and most queries now need sub-second response times.
• 24x7 availability. Setting up redundant servers for static HTML files is easy; replicating a complex database is another matter.
• Capturing high-speed data streams. Many applications ingest data much faster than they serve queries; in logging or distributed-sensor applications, for example, writes are more frequent than reads. ETL (extract, transform, load) remains indispensable, but capturing high-speed data streams is becoming increasingly important.
• Unstructured data. Unstructured data has existed for a long time and is nothing new in the data world, but we increasingly do not want to force a structure onto our data.
• Sacrificing ACID. ACID (atomicity, consistency, isolation, durability) is important, but the challenges of modern applications have made us realize that we must give up some of it to achieve other properties, such as low latency and availability.

These constantly changing demands force us to consider new database solutions:

• Distribution. A large database is only one reason to distribute; another is that modern applications, especially web applications, must respond instantly to many simultaneous online users. Every extra second of response time drives away a large number of users.
• Real-time computation. If you are building online applications to support business analytics, your users will inevitably expect the analytics to be real-time. This is more than a convenience: being able to run hundreds of queries a day completely changes the way we work.
• Scalability. Scalability is a big problem if you are building customer-facing business analytics applications. Vertical scaling is already near its limit, since physical constraints cap Intel processor clock speeds at around 3.5 GHz, so horizontal scaling (building multi-node distributed systems) is the only way forward.
• High availability. A single point of failure anywhere in the application architecture can have disastrous consequences, so the database system must support high availability. A highly available system is, by nature, a distributed system.
• Sharding. Once a database is distributed, the next problem is sharding the data. With relational databases, data is either sharded manually across multiple hosts or partitioned based on attributes of the data itself. MongoDB makes sharding data easy, and HBase, Riak, and Cassandra are distributed databases.
• Schemaless data. NoSQL databases are often called schemaless because they do not impose the rigid schema of a relational database. In fact, NoSQL databases are not completely schema-free. In a document database such as CouchDB or MongoDB, a document is a set of key-value pairs. Riak can also be viewed as a document database, but it is more flexible about document types. Cassandra and HBase are called column-oriented databases. For most application development, NoSQL databases require less up-front planning, offer more flexibility, and are better suited to agile development.
• ACID and CAP. ACID properties are pervasive, but when we think about database architecture, we find that properties such as consistency and isolation are difficult to implement in a distributed system. ACID properties are important, but freedom of choice requires trade-offs. The CAP theorem states that a distributed computing system can guarantee at most two of three properties: consistency, availability, and partition tolerance.
• Scripting languages. Every relational database has an SQL variant (for example, T-SQL or PL/SQL) that serves as its data scripting language. Scripting languages are available in the non-relational world as well. CouchDB and Riak support JavaScript, and so does MongoDB. Several scripting-language projects that grew out of the Hadoop project, including Pig and Hive, work with HBase. The Redis project is experimenting with integrating the Lua scripting language.
• RESTful interfaces. Only CouchDB and Riak provide RESTful interfaces.
• Graphs. Neo4j is a database designed specifically for storing graphs. Graphs are very flexible data structures that can model any other type of database.
• SQL. We have been talking about the NoSQL movement, but we cannot ignore SQL, the familiar query language. Some people are working on bringing SQL to Hadoop, and perhaps we will adopt hybrid database architectures in the future.
• Scientific data. SciDB is a database project for large-scale scientific applications, and its storage model is based on multidimensional arrays. SciDB storage can easily scale to hundreds of petabytes while collecting dozens of terabytes of data every night.
• Hybrid architectures. The NoSQL movement is closely tied to the choice of database architecture. Perhaps the final choice will be a hybrid architecture rather than a single database technology. Only a hybrid architecture can absorb and adapt to technological change, and a hybrid architecture is the best way to integrate social features into a traditional e-commerce site.
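The sharding item mentions partitioning a dataset based on attributes of the data itself. As a minimal sketch, assuming a fixed list of hypothetical node names and a simple hash-modulo scheme (not necessarily how MongoDB, HBase, Riak, or Cassandra place data), routing each record to a shard by its key might look like this:

```python
import hashlib

# Hypothetical shard names; a real cluster would discover its nodes dynamically.
SHARDS = ["node-a", "node-b", "node-c"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Pick the shard that owns a key using hash-modulo partitioning."""
    # Use a stable hash (MD5 of the UTF-8 key) so every client routes the same
    # key to the same shard; Python's built-in hash() is salted per process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def partition(records: dict[str, dict], shards: list[str] = SHARDS) -> dict[str, dict[str, dict]]:
    """Group records by the shard that owns each record's key."""
    placement: dict[str, dict[str, dict]] = {name: {} for name in shards}
    for key, value in records.items():
        placement[shard_for(key, shards)][key] = value
    return placement


if __name__ == "__main__":
    users = {f"user:{i}": {"id": i} for i in range(10)}
    for node, owned in partition(users).items():
        print(node, sorted(owned))
```

Hash-modulo placement is easy to reason about, but adding or removing a node remaps most keys; consistent hashing is a common alternative that limits how much data moves when the cluster changes.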
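To make the schemaless point concrete: in a document store, each document is a set of key-value pairs, and documents in the same collection need not share the same fields. The sketch below models a collection as a plain Python list of dicts; the field names and the `find` helper are hypothetical and are not the query API of CouchDB, MongoDB, or any other product.

```python
from typing import Any

# A hypothetical "collection": the documents do not share a fixed schema.
orders: list[dict[str, Any]] = [
    {"_id": 1, "customer": "alice", "total": 30.0},
    {"_id": 2, "customer": "bob", "total": 12.5, "coupon": "SPRING"},  # extra field
    {"_id": 3, "customer": "alice", "items": ["book", "pen"]},  # no "total" field
]


def find(collection: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Return documents whose fields equal every given criterion.

    A document that lacks a field never matches a criterion on that field.
    """
    return [
        doc
        for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    print([doc["_id"] for doc in find(orders, customer="alice")])  # [1, 3]
    print([doc["_id"] for doc in find(orders, coupon="SPRING")])   # [2]
```

Adding a new field to future documents requires no migration step, which is part of why such stores suit agile development; the trade-off is that the application, not the database, must cope with missing or inconsistent fields.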
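The CAP trade-off in the ACID and CAP item can be illustrated with a toy model: two replicas of a single value and a network partition between them. During the partition, a replica that favors consistency ("CP") refuses a write it cannot replicate, while one that favors availability ("AP") accepts the write and lets the replicas diverge. This is a deliberately simplified sketch with hypothetical class and mode names, not a model of how any specific database behaves.

```python
from dataclasses import dataclass, field
from typing import Optional


class Unavailable(Exception):
    """Raised when a CP replica refuses a write it cannot replicate."""


@dataclass
class Replica:
    name: str
    value: Optional[str] = None
    # repr/compare disabled because replicas reference each other.
    peers: list["Replica"] = field(default_factory=list, repr=False, compare=False)
    partitioned: bool = False  # True when this replica cannot reach its peers

    def write(self, value: str, mode: str) -> None:
        """Write locally and replicate; behavior during a partition depends on mode."""
        if mode not in ("CP", "AP"):
            raise ValueError(f"unknown mode: {mode}")
        if self.partitioned:
            if mode == "CP":
                # Favor consistency: reject rather than let the replicas disagree.
                raise Unavailable(f"{self.name}: cannot reach peers")
            # Favor availability: accept the write locally only.
            self.value = value
            return
        # No partition: apply the write everywhere.
        self.value = value
        for peer in self.peers:
            peer.value = value


def make_pair() -> tuple[Replica, Replica]:
    """Create two replicas that replicate to each other."""
    a, b = Replica("a"), Replica("b")
    a.peers.append(b)
    b.peers.append(a)
    return a, b


if __name__ == "__main__":
    a, b = make_pair()
    a.write("v1", mode="CP")              # no partition: both replicas hold v1
    a.partitioned = b.partitioned = True
    try:
        a.write("v2", mode="CP")          # CP: rejected, replicas stay consistent
    except Unavailable as err:
        print("CP rejected:", err)        # CP rejected: a: cannot reach peers
    a.write("v2", mode="AP")              # AP: accepted, replicas now diverge
    print(a.value, b.value)               # v2 v1
```

Real systems offer many points between these two extremes (quorum reads and writes, for example); the toy only shows why, during a partition, a system must give up either consistency or availability for the affected operations.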
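As a rough illustration of why graphs are so flexible, and of the kind of social feature an e-commerce site might want, the sketch below stores entities as nodes with arbitrary properties and relationships as labeled, directed edges, then answers "what did people within two friendship hops buy?" by breadth-first traversal. The data and function names are hypothetical; this is plain Python, not Neo4j's data model or query language.

```python
from collections import deque

# Nodes carry arbitrary properties, much like rows or documents.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "dave": {"kind": "person"},
    "book42": {"kind": "product", "title": "Graph Basics"},
}

# Labeled, directed edges (source, label, target); a symmetric
# friendship would need an edge in each direction.
edges = [
    ("alice", "FRIEND", "bob"),
    ("bob", "FRIEND", "carol"),
    ("carol", "FRIEND", "dave"),
    ("bob", "BOUGHT", "book42"),
]


def neighbors(node: str, label: str) -> list[str]:
    """Targets of outgoing edges from node that carry the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


def within_hops(start: str, label: str, max_hops: int) -> set[str]:
    """Breadth-first search: nodes reachable from start in 1..max_hops labeled edges."""
    seen = {start}
    found: set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbors(node, label):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, depth + 1))
    return found


def products_bought_nearby(person: str, max_hops: int = 2) -> list[str]:
    """Titles of products bought by people within max_hops FRIEND edges."""
    titles = set()
    for friend in within_hops(person, "FRIEND", max_hops):
        for item in neighbors(friend, "BOUGHT"):
            titles.add(nodes[item]["title"])
    return sorted(titles)


if __name__ == "__main__":
    print(sorted(within_hops("alice", "FRIEND", 2)))  # ['bob', 'carol']
    print(products_bought_nearby("alice"))            # ['Graph Basics']
```

In a relational schema the same question would typically need repeated self-joins on a friendship table; in a graph it is a bounded traversal.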

In Closing

The NoSQL movement has led us to think about what database architecture we actually want. Perhaps we will eventually realize that there is no universal answer. (Compiled by Zhang Zhiping)

(Responsible editor: The good of the Legacy)
