[Case] How to build a heterogeneous replica of a billion-scale database

The data source is MySQL and the target store is Elasticsearch (ES).

1. Resources we can use

1.1 Source data model

The source databases hold another team's (inventory) data, divided into three inventory models: A, B, and C. The three models need to be integrated into one common stock model to serve our business.
This is a typical way for Internet companies to collaborate: data replicas decouple one business from another.
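As a minimal sketch of what integrating three models into one common model can look like, the snippet below maps rows from A, B, and C onto a single record type. The source does not give the real schemas, so every field name here (`sku`, `item_id`, `goods_no`, and so on) is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonStock:
    """Common stock model shared by all three source types."""
    source_type: str
    sku_id: str
    warehouse_id: str
    quantity: int


# Hypothetical column names per source model; the real A/B/C schemas
# are not described in the article.
FIELD_MAPS = {
    "A": {"sku_id": "sku", "warehouse_id": "wh_id", "quantity": "qty"},
    "B": {"sku_id": "item_id", "warehouse_id": "store_id", "quantity": "stock_num"},
    "C": {"sku_id": "goods_no", "warehouse_id": "depot", "quantity": "available"},
}


def to_common_stock(source_type, row):
    """Convert one source row (a dict) into the common stock model."""
    mapping = FIELD_MAPS[source_type]
    return CommonStock(
        source_type=source_type,
        sku_id=str(row[mapping["sku_id"]]),
        warehouse_id=str(row[mapping["warehouse_id"]]),
        quantity=int(row[mapping["quantity"]]),
    )
```

Keeping the per-source differences in one mapping table means a schema change on the source side touches only that table, not the code that consumes the common model.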

1.2 Special table (not the focus)

D holds the order details for inventory and also needs a heterogeneous copy.

1.3 Database and table sharding

A, B, C, and D are all sharded across databases and tables: A (16 databases, 4096 tables), B (1 database, 512 tables), C (1 database, 256 tables), D (8 databases, 1024 tables).
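A full export has to visit every physical table, so it helps to enumerate the shards up front. The sketch below does that under two assumptions that the source does not confirm: the table counts are totals spread evenly over the databases, and databases and tables follow a hypothetical `a_db_0` / `a_stock_0000` naming scheme.

```python
# (databases, tables) per source, as listed in the article.
SHARD_LAYOUT = {"A": (16, 4096), "B": (1, 512), "C": (1, 256), "D": (8, 1024)}


def iter_shards(source):
    """Yield (database_name, table_name) for every shard of one source.

    Assumes tables are distributed evenly across databases and uses
    hypothetical names; adapt both to the real sharding rule.
    """
    databases, tables = SHARD_LAYOUT[source]
    tables_per_db = tables // databases
    prefix = source.lower()
    for table_no in range(tables):
        db_no = table_no // tables_per_db
        yield (f"{prefix}_db_{db_no}", f"{prefix}_stock_{table_no:04d}")
```

An export job can then be split into one task per `(database, table)` pair, which makes progress tracking and retries straightforward.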

1.4 Data Volume

The total data volume is at the billion level.

1.5 Impact on production

The replication must not affect the other team's business, so data is extracted only from the replica (slave) databases of the other team's MySQL groups.
MySQL Group explanation

1.6 Performance requirements

The replica must support complex conditional queries in the future, and query performance requirements are high, so the target store is ES.

2. Difficulties

2.1 Data export

The trading inventory has complex sharding rules and a large data volume, so exporting the data is a major undertaking.

2.2 Frequent updates

Write operations are frequent, so the ES indexing TPS must be able to meet the requirement.

2.3 How to guarantee consistency

Eventual consistency (in the BASE sense) is achieved through MQ.
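The source does not describe how the MQ consumers reach eventual consistency. One common approach, shown here purely as an assumption, is an idempotent consumer that carries a version number on each message and ignores duplicates or stale updates. The message fields `id`, `version`, and `doc` are hypothetical.

```python
def apply_message(store, message):
    """Apply one change message to the target store (a dict here).

    Returns True if the store changed, False if the message was a
    duplicate or older than what is already stored.
    """
    key = message["id"]
    version = message["version"]
    current = store.get(key)
    if current is not None and current["version"] >= version:
        return False  # redelivered or out-of-order message: skip it
    store[key] = {"version": version, "doc": message["doc"]}
    return True
```

With this check a message can be redelivered any number of times, or arrive after a newer one, without moving the target backwards.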

3. Final solution

3.1 Overall system architecture

First, incremental data is collected by canal (Binlake, a clustered canal deployment). Data from the A, B, and C databases is stored in ES, and the D inventory data is stored in MySQL.
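The routing rule described above can be sketched as a small dispatch function. The event shape (a dict with a `source` key) is a hypothetical stand-in and is not canal's actual message format.

```python
ES_SOURCES = {"A", "B", "C"}


def route_event(event):
    """Return the target store for one change event.

    `event` is a hypothetical dict such as {"source": "A", ...}.
    """
    source = event["source"]
    if source in ES_SOURCES:
        return "elasticsearch"
    if source == "D":
        return "mysql"
    raise ValueError(f"unknown source: {source!r}")
```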

3.2 How to do the full data export

SOP corresponds to type A. It uses multiple topics to spread the load on the message middleware and to work around the middleware's connection limit on a single topic.
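One simple way to spread messages over several topics is to hash a stable key, for example the source table name, and take it modulo the topic count. This is a sketch only: the topic prefix, the topic count, and the use of CRC32 are assumptions, since the source does not say how topics are chosen.

```python
import zlib


def pick_topic(shard_key, topic_prefix="stock_export_a", topic_count=4):
    """Map a shard key to one of `topic_count` topics.

    CRC32 is used because it is stable across processes, unlike
    Python's built-in hash() for strings.
    """
    index = zlib.crc32(shard_key.encode("utf-8")) % topic_count
    return f"{topic_prefix}_{index}"
```

Because the same key always maps to the same topic, messages for one table stay on one topic, which is useful if the consumer relies on per-table ordering.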

3.3 How to improve ES write performance (bulk)

ES documents are indexed asynchronously via JMQ, and bulk submission is implemented through Redis queues in a way that is transparent to the application.
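The batching idea can be sketched as follows. An in-memory `deque` stands in for the Redis queue, and the index name and document fields are hypothetical. The body produced by `build_bulk_body` follows the newline-delimited JSON format of the Elasticsearch bulk API: one action line, then one source line per document, with a trailing newline.

```python
import json
from collections import deque


def drain_batch(queue, max_items=500):
    """Pop up to `max_items` pending documents from the queue."""
    batch = []
    while queue and len(batch) < max_items:
        batch.append(queue.popleft())
    return batch


def build_bulk_body(index, docs):
    """Build an NDJSON body for a bulk request that indexes `docs`.

    Each document must carry an "id" field (an assumption of this sketch).
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline
```

The application only pushes single documents onto the queue; a background worker drains the queue and sends one bulk request per batch, so batching stays invisible to the caller.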

How to improve Elasticsearch performance

1. Solr queries are fast, but index updates (inserts and deletes) are slow; it suits query-heavy applications such as e-commerce.

2. ES builds indexes fast (at some cost in query speed), so newly written data becomes searchable quickly; it suits real-time search such as Facebook and Sina.

How to improve ES performance:

1. Submit ES updates in batches wherever possible to reduce the number of writes (bulk);

2. Split into multiple clusters by business dimension, then shard within each cluster by merchant dimension;

3. Hotspot concentration: the ES team can customize the routing rules;

4. Provide a reindex capability for rebuilding data; this requires _source to be enabled in the mapping (which makes rebuilding data easy);

5. The ES team requires the business side to provide at least 10 physical machines for deployment;

6. For deep paging, keep the number of results within 10,000, page with the scroll API, and prohibit jumping directly to distant pages.
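The deep-paging rule in item 6 can be enforced at the application layer with a small guard. The sketch below picks between ordinary from/size paging and scrolling based on the 10,000-result limit, which is also the default value of the Elasticsearch `index.max_result_window` setting. The function name and the returned dict shape are hypothetical.

```python
MAX_RESULT_WINDOW = 10_000  # limit from the article; also the ES default


def plan_page(page, size):
    """Decide how to fetch a 1-based `page` of `size` results."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        # Too deep for from/size: the caller should scroll instead
        # of jumping straight to this page.
        return {"mode": "scroll", "size": size}
    return {"mode": "from_size", "from": offset, "size": size}
```

For example, with 20 results per page, page 500 still fits inside the window, while page 501 is redirected to scrolling.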

Elasticsearch vs. Solr

Both are simple to install;
Solr uses ZooKeeper for distributed management, while Elasticsearch has distributed coordination built in;
Solr supports more data formats, while Elasticsearch only supports JSON;
Solr officially provides more features, while Elasticsearch focuses on core functions and leaves advanced features to third-party plug-ins;
Solr performs better than Elasticsearch in traditional search applications, but its efficiency in real-time search applications is significantly lower than that of Elasticsearch.

Solr is a powerful solution for traditional search applications, but Elasticsearch is better suited to emerging real-time search applications.

Concurrency, throughput, and response time

These terms describe website or server performance.
Concurrency: the number of requests the system processes simultaneously (divided into query requests and transaction requests).
Throughput: the number of requests the system processes per unit of time. It is a very broad term; commonly used units include TPS/QPS, pages per second, visitors per day, and transactions per hour.
A few related concepts:

TPS, QPS, RPS

TPS: Transactions Per Second, the number of transactions the server processes per second. It is generally used to evaluate the benchmark performance of databases and trading systems.

QPS: Queries Per Second, the number of queries the server can process per second, for example for a domain name server or for MySQL.

RPS: Requests Per Second. RPS and QPS can be treated as the same thing.

RT: Response Time, the time from when the client sends a request until the client has received the complete response from the server.
The response time consists of three parts: request send time, network transfer time, and server processing time. It is not the same as think time, which is the pause a user takes between requests.

The relationship between concurrency and TPS/QPS: QPS (TPS) = concurrency / average response time. The result is TPS if the concurrent requests are transaction requests, or QPS if they are query requests.
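As a worked example of the formula above, the helper below computes throughput from concurrency and average response time, and the inverse. The function names and sample numbers are illustrative only.

```python
def throughput_per_second(concurrency, avg_response_time_s):
    """QPS (or TPS) = concurrency / average response time in seconds."""
    if avg_response_time_s <= 0:
        raise ValueError("average response time must be positive")
    return concurrency / avg_response_time_s


def required_concurrency(target_qps, avg_response_time_s):
    """Rearranged form: concurrency = QPS * average response time."""
    return target_qps * avg_response_time_s
```

With 200 concurrent requests and an average response time of 0.25 s, the system handles 200 / 0.25 = 800 requests per second. Conversely, sustaining 800 QPS at 0.25 s per request needs 200 requests in flight.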

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
