MySQL as a new NoSQL solution: Easily deal with billions of levels of data

Source: Internet
Author: User
Tags value store

MySQL is now a better nosql solution. We say this because MySQL has the advantage of performance, ease of use, and stability when storing data such as key/value (Key/value). The MySQL engine is robust and well-supported by the community and the official, with a very rich online profile covering a variety of operations, troubleshooting, replication, and various usage patterns. For this reason, MySQL has a big advantage over the emerging NoSQL engine.

In recent years, NoSQL engines have become mainstream. Many developers see the NoSQL engine (including: MongoDB, Cassandra, Redis, and Hadoop) as the best solution, while not endorsing the use of the old SQL engine.

Choosing a NoSQL database is often a matter of speculation, or the fact that relational databases can't solve nosql-related things. General engineers often ignore operational costs, stability, and maturity when choosing a database. For more information on the differences between the various nosql (and SQL) engines, as well as the limitations, please refer to the Jepsen column on Aphyr.

This article explains why using MySQL to store key/value data is more appropriate than most dedicated NoSQL engines, and provides the appropriate guidance steps.

The Wix Web site solution

When a user accesses a site page on WiX, the browser sends an HTTP request to the WiX Web server. Whether it is a custom domain name (for example,  cncounter.com ) is also a free WiX two-level domain name (e.g.  user.wix.com/site ). The server needs to resolve the URL request to the appropriate site by querying the key/value pairs. In the following discussion we treat the URL as a route (route).

routesTables are used to parse URLs into site objects. Because the site may have multiple routes, it is a many-to-one relationship (many to a, N:1). When the site is found, the program loads it. The site object mechanism is complex, including two sub-object lists-different services used by the site. The following is a sample object model, assuming that you use a standard SQL database and a normalized table structure:

When updating the site, if you are using a traditional paradigm model to update multiple tables, you need to use transactions (transaction) to ensure consistency (data consistency). (Note that transactions use a database-level lock (Db-level lock), which prevents concurrent writes and sometimes also affects concurrent read operations on related tables.) Using this model, you might need to set a sequence key (serial key) in each table, use a foreign key, and create an index for the URL field of the routes table.

However, there are many problems with the use of paradigm design:

    • Locks restrict access to tables, so performance is severely impacted in high-throughput scenarios.
    • Reading an object requires using multiple SQL statements (here are 4), or using join (which is affected by latency).
    • The generation of the series Keys uses locks and limits write throughput.

In MySQL or other SQL engines, concurrency and throughput problems occur in a large number of cases. Because of these drawbacks, many developers tend to use NoSQL solutions when storing data in the form of key/value pairs, sacrificing some stability (stability), consistency (consistency), and availability (in order to provide better throughput and concurrency performance) Availability).

    • throughput at 200,000 RPM order of magnitude.
    • routes table is recorded at 100 million orders of magnitude and occupies a space of up to ten GB.
    • sites table records at 100 million orders of magnitude, storage space more than GB.
    • average read latency between 1.0-1.5 milliseconds (in fact, the same data center is 0.2-0.3 milliseconds).

Note that in most key/value engines, latency of around 1.0 milliseconds is excellent, including open source and cloud-based! And we are using MySQL (MySQL is considered the standard SQL engine).

The table structure is as follows:

create TABLE ' routes ' (' route ' varchar (255) NOT NULL, ' site_id ' varchar (a) NOT null, ' last_update_date ' bigint not N ULL, PRIMARY key (' key '), key (' site_id ')) CREATE TABLE ' sites ' (' site_id ' varchar () not NULL, ' owner_id ' varchar (50 ) not NULL, ' schema_version ' varchar (TEN) NOT null DEFAULT ' 1.0 ', ' site_data ' text not null, ' Last_update_date ' bigint N OT NULL, PRIMARY KEY (' site_id '))/*engine=innodb DEFAULT Charset=utf8 row_format=compressed key_block_size=16*/;

All fields that are not required as query criteria have been merged into a single BLOB field (Site_data text field). It contains the Sub-obj record and other domains of the site object itself. Note also that the self-increment sequence key is not used here; instead, we use it varchar(50) to store the GUID value generated by the client.

Here are the query statements we use, with high throughput and low latency features:

select * from sites where site_id = (select site_id from routes where route =?)  

The routes table is first queried through a unique index, which returns only one result. The primary key is then searched for site, and there is only one record. Nested query syntax ensures that 2 SQL queries require only a single database interaction.

The results mentioned earlier, in the case of high traffic and high update rates, maintain the performance on average to about milliseconds. Although transactions are not used, update is a semi-transactional nature (semi-transactional). This is because a site is entered first, but no related records are queried until the route record is entered. So if we first enter site, and then enter the route, we can still ensure the consistency of the data, even if there are many orphan data in the sites table.

MySQL as a NoSQL engine usage guide

Using the example above (and other similar scenarios from WiX), we have crafted a tutorial on MySQL as a nosql engine.

Avoid database lock (DB locks) and complex query .

    • Transactions are not used because they result in locks (locks). Instead, you should use the application-tier transaction (applicative transactions).
    • The sequence key (serial key) is not used. The sequence key causes the lock and causes a complex active-active configuration.
    • Use the unique keys generated by the client (client-generated unique keys). We are using a GUID.

When designing a database, there are also the following points when querying for optimization:

    • don't use paradigms (do not normalize).
    • a single row of data must be allowed to be read.

When querying the data:

    • Use the primary key (primary key) and index as criteria to query.
    • Do not use join.
    • Do not use aggregate functions (aggregation).
    • Perform heavy queries (housekeeping queries) only in replicas (replica), such as Business Intelligence (BI), data exploration, and so on, as far as possible, not on the main library (master database).

We're going to dive into another blog about live migration (live migrations) and Application layer transactions (applicative transactions).

Summarize

The main purpose of this article is to let you know the new features of MySQL. It's great to have MySQL as a nosql engine, although MySQL isn't designed for NoSQL. This article shows how to use MySQL instead of a dedicated NoSQL engine to store key/value (Key/value) information.

In Wix, saving key/value information (in other cases) will select the MySQL engine because it is easy to use and operate, and MySQL is a powerful ecosystem. In addition, MySQL is not inferior on metrics (metrics) such as latency (latency), throughput (throughput), and concurrency (concurrency), compared to most NoSQL engines.

Original link (need to turn over the wall): http://engineering.wix.com/2015/12/10/scaling-to-100m-mysql-is-a-better-nosql/

Original date: December 10, 2015

Translation Date: December 27, 2015

Author: Anchor Http://blog.csdn.net/renfufei



MySQL as a new NoSQL solution: Easily deal with billions of levels of data

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.