"Big Talk Store II" study notes (Chapter 15), NoSQL

Source: Internet
Author: User
Tags: memcached, mysql

The data centers of internet operators (NSPs) are where data is most concentrated. Because of the massive volume of data stored and accessed, traditional storage architectures can no longer meet their requirements.

For example, hundreds of thousands of random IOPS and 10 GB/s of throughput generally require high-end storage, which is not cheap. High-end storage also scales poorly, and expanding it is costly.

As their business grows, internet operators have gradually turned to distributed systems to build their underlying file systems and databases, for example Google's GFS and Bigtable.

Let's look at how the architecture gradually changed in response to high concurrency and high traffic, and how this finally led to NoSQL.

The evolution of traditional database architecture

Using caching technology

As traffic grows, a website built on MySQL runs into performance problems at the database.

The most obvious way to deal with MySQL performance problems is to add a cache. A file cache alone does not improve performance much, because a file cache cannot be shared among multiple web servers.

So we can move the cache out into a dedicated cache server that uses the server's large memory for caching. The cache servers can themselves be deployed in a distributed fashion, which provides both sharing and performance.

The most typical example is the memcached cache server, which can be scaled out across multiple servers using a consistent hashing algorithm.
For details, see the separate article on memcached clusters.
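To make the consistent hashing idea concrete, here is a minimal sketch of a hash ring with virtual nodes. It is not how any particular memcached client implements it; the class name `ConsistentHashRing`, the server names, and the choice of 100 virtual nodes per server are all assumptions for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """A hash ring with virtual nodes; server names are placeholders."""

    def __init__(self, servers, vnodes=100):
        self._ring = []                      # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def get_server(self, key):
        """Return the first server clockwise from the key's position."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
ring3 = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
ring4 = ConsistentHashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
moved = sum(ring3.get_server(k) != ring4.get_server(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a server")
```

The point of the ring is that adding a server only takes keys away from existing servers and hands them to the new one; keys never shuffle between the old servers, so most cached entries stay valid.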

MySQL master-slave read-write separation

A cache server is only a stopgap: as traffic keeps rising, the database still becomes a bottleneck. Internet applications are mostly read-heavy, and since the read pressure is so high, we can use a cluster to share the load.

However, because a database must guarantee atomicity, consistency, and so on, we cannot simply put several servers side by side and let them all be accessed concurrently.

Instead, we can separate reads from writes. The database is split into two roles, master and slave: the master mainly handles writes and the slaves mainly handle reads. There can be more than one slave, with reads load-balanced across them.

This improves read and write performance and makes the read side scalable. For details, see the related articles "Database (VII)" and "From read/write separation to CQRS".
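The routing decision behind read/write separation can be sketched in a few lines. This is a simplified illustration, not a real MySQL proxy: the `ReadWriteRouter` class and the host names are hypothetical, and classifying statements by a leading `SELECT` is a deliberate simplification.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)   # simple round-robin

    def route(self, sql):
        # Treat SELECT as a read; everything else goes to the master.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


router = ReadWriteRouter("master:3306", ["slave1:3306", "slave2:3306"])
print(router.route("SELECT * FROM users"))          # slave1:3306
print(router.route("SELECT * FROM orders"))         # slave2:3306
print(router.route("UPDATE users SET name = 'x'"))  # master:3306
```

Because replication to the slaves is not instantaneous, a read issued right after a write may return stale data from a slave. A fuller router would likely need a way to send such reads to the master.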

Splitting tables and databases (sharding)

The memcached cache and MySQL master-slave replication relieve read pressure, but what about write pressure?

MySQL originally used the MyISAM engine, which uses table-level locks and suffers from serious lock contention under high concurrency. For this reason the InnoDB engine, which supports row-level locks, is used instead.

We can also split tables to relieve write pressure. There are two ways of doing this:

    • Horizontal splitting: a table is split by rows into multiple sub-tables. Each node stores one sub-table, and two additional copies are kept. Each sub-table therefore contains all the columns of the original table.

    • Vertical splitting: a table is split by columns, so the original columns are divided among several tables. This scatters the data and can give very high random query performance.

Horizontal splitting needs no human involvement; the table can be split blindly.
Vertical splitting amounts to splitting along business lines, and so requires some degree of human intervention.
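The difference between the two kinds of splitting can be illustrated with a small sketch. Everything here is an assumption made for the example: the shard count, the use of a CRC32 hash of the key to pick a sub-table, and the helper names `shard_for` and `split_vertically`.

```python
import zlib

NUM_SHARDS = 4  # assumed number of sub-tables


def shard_for(user_id):
    """Horizontal split: choose a sub-table from a stable hash of the key."""
    return zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS


def split_vertically(row, hot_columns):
    """Vertical split: divide one row's columns into two narrower rows."""
    hot = {k: v for k, v in row.items() if k in hot_columns or k == "id"}
    cold = {k: v for k, v in row.items() if k not in hot_columns or k == "id"}
    return hot, cold


row = {"id": 42, "name": "alice", "email": "a@example.com", "bio": "long text..."}
shards = {n: [] for n in range(NUM_SHARDS)}
shards[shard_for(row["id"])].append(row)       # every shard row keeps all columns

hot, cold = split_vertically(row, hot_columns={"name", "email"})
print(hot)   # {'id': 42, 'name': 'alice', 'email': 'a@example.com'}
print(cold)  # {'id': 42, 'bio': 'long text...'}
```

The horizontal split needs only a key and a hash, which is why it can be done blindly. The vertical split needs someone to decide which columns belong together (here, the hypothetical `hot_columns`), which is the human, business-level decision mentioned above.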

Problems with MySQL

The sections above describe the efforts made to help MySQL cope with high concurrency. As the underlying architecture becomes more and more complex, the upper-level applications become more complex as well. Although some companies have developed a middleware layer to shield developers from this complexity, the complexity of the architecture itself cannot be avoided.

The main disadvantages of the approaches above are:

    • Inflexibility: the table structure of a relational database is fixed at the very beginning and is difficult to change later.

    • Scalability is still unresolved: MySQL's cluster solution only guarantees reliability and cannot guarantee performance.

    • MySQL often has to hold large text fields, which makes database recovery very slow.

So although relational databases are very powerful, that does not mean they suit every application environment. If we want good scalability and flexibility, we may have to trade away some of their original functionality.

Below we introduce CAP theory, which tells us from a theoretical point of view that you cannot have your cake and eat it too.

CAP theory

CAP (Consistency, Availability, Partition tolerance) theory says that a distributed system cannot fully satisfy all three properties at the same time. Loosely speaking, consistency, performance, and scalability cannot all be had at once: if we want good scalability, we cannot guarantee strong consistency the way a traditional relational database does.

Whether to guarantee strong consistency depends mainly on whether the customer's business actually needs it.

Take DNS resolution: after a host is replaced, the change cannot be propagated across the whole network immediately, and this short-term inconsistency is acceptable.

This does not mean giving up consistency entirely. There is a model called the NWR model, which balances consistency and performance.

    • N is the number of copies kept of a block of data.

    • W is the number of copies that must be written successfully before the write returns success.

    • R is the number of nodes, out of the N nodes holding a copy, that the client must read from to ensure a consistent read.

For example, with N=3 and W=1, 3 copies are kept and a write returns success once 1 copy has been written.

    • If R=1, a read that starts right after the write succeeds reads only one copy. It may hit either of the two copies that have not been updated yet, or the newly updated copy, so it may return a stale version.

    • If R=2, there is still no guarantee of reading the latest version, because both reads may hit the two stale copies.

    • If R=3, all 3 copies are read, so the latest version is guaranteed to be among them; comparing timestamps shows which one is newest.

If W=2, 2 copies must be written successfully before success is returned, and a read then needs only 2 copies to be sure of seeing the latest version.

In summary, as long as $W+R>N$, reads are guaranteed to be consistent.
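The rule can be checked by brute force: a read is guaranteed to see the latest write exactly when every possible set of R replicas shares at least one replica with every possible set of W replicas. The following sketch enumerates all such sets; the function name `read_always_sees_latest` is hypothetical.

```python
from itertools import combinations


def read_always_sees_latest(n, w, r):
    """True if every set of r replicas overlaps every set of w replicas."""
    replicas = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


for w, r in [(1, 1), (1, 2), (1, 3), (2, 2)]:
    print(f"N=3 W={w} R={r}: W+R>N is {w + r > 3}, "
          f"consistent read: {read_always_sees_latest(3, w, r)}")
```

For N=3 this reproduces the cases above: W=1 with R=1 or R=2 can miss the updated copy, while W=1 with R=3 and W=2 with R=2 cannot.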

NoSQL

Weakly consistent database clusters generally do not support transactions, and they do not support join queries either. Why? Because the data is sharded, a join needs data held by other nodes. A single query would have to mobilize those other nodes, so concurrency performance would be very poor.

This kind of lightweight distributed database system, which guarantees only weak consistency and drops relational features, is a NoSQL system.

The most widely used kind of NoSQL is key-value storage. There are also other kinds, such as document stores, column stores, graph databases, and XML databases.

NoSQL classification

Here is a simple classification of existing NoSQL systems.

| Type | Representatives | Features |
|---|---|---|
| Column storage | HBase | Easy to compress data; queries on one or a few columns have a very large I/O advantage. |
| Document storage | MongoDB | JSON documents. Because the stored content is a document, some fields can be indexed. |
| Key-value storage | Redis / memcached | A value can be looked up quickly by its key. |
| Graph storage | Neo4j | The best fit for storing graph relationships. |
| Object storage | db4o / Versant | Data is accessed as objects. |
| XML database | Berkeley DB XML | Supports XML-native query syntax such as XPath. |

Advantages
    • More flexible.

      Relational databases require the data fields to be defined in advance, whereas NoSQL stores data as key-value pairs, columns, and so on. This is more flexible and especially suitable for unstructured and semi-structured data.

    • Easy to scale: with the relational features removed, data items are not related to each other, which makes scaling out easier.

    • High performance: because NoSQL drops the relational features, the database structure is simple and performance is naturally higher.

    • Highly available: most NoSQL databases support automatic replication, which makes it easy to build a highly available architecture without compromising performance.

    • Automatic sharding:

      Because relational databases are structured, they are generally scaled vertically. This means a single server has to hold the entire database, so scalability does not come naturally.

      NoSQL generally supports automatic sharding, which means data is automatically distributed across multiple servers, and all of this is transparent to the application.

Disadvantages

Knowing the pros, we also need to understand the drawbacks of NoSQL.

Compared with relational databases, NoSQL gives up many features, such as transactions and consistency. It is also not yet mature enough, and many NoSQL systems do not provide SQL support.

So the best approach is to understand which scenarios suit NoSQL, and to use it in combination with a general-purpose relational database.

Key-value style data, such as user password storage and session storage, is a good fit for NoSQL.
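Session storage shows why a key-value interface is enough for this kind of data: a session is looked up only by its ID and should expire on its own. The sketch below is an in-memory stand-in, not the API of Redis or memcached; the `SessionStore` class, its method names, and the injectable clock are assumptions for illustration.

```python
import time


class SessionStore:
    """In-memory key-value store with per-key expiry (a stand-in for a K-V server)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}          # session_id -> (expires_at, value)
        self._clock = clock

    def set(self, session_id, value, ttl_seconds):
        self._data[session_id] = (self._clock() + ttl_seconds, value)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:      # expired: drop it lazily
            del self._data[session_id]
            return None
        return value


now = [0.0]                                  # fake clock so the example is deterministic
store = SessionStore(clock=lambda: now[0])
store.set("sess-1", {"user_id": 42}, ttl_seconds=30)
print(store.get("sess-1"))   # {'user_id': 42}
now[0] = 31.0
print(store.get("sess-1"))   # None
```

No joins, transactions, or schema are involved, so the features NoSQL gives up are not missed here.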
