[Case] How to build a heterogeneous copy of a database with billions of rows

Source: Internet
Author: User
Tags create index mysql query solr solr query domain name server website performance

The data source is MySQL and the target store is Elasticsearch (ES).

1. Resources we can use

1.1 Source data model

The source databases hold another team's (inventory) data, split into three inventory models: A, B, and C. These three models need to be merged into one common inventory model that suits our business.
This is a typical collaboration pattern in Internet companies: data replicas decouple one business from another.

1.2 Special table (not the focus)

D is the inventory for order details. It also gets a heterogeneous copy.

1.3 Database and table sharding

A, B, C, and D are all sharded across databases and tables: A (16 databases, 4096 tables), B (1 database, 512 tables), C (1 database, 256 tables), D (8 databases, 1024 tables).
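To make the layout concrete, the sketch below maps a shard key to a database and a table. It assumes the table counts are totals (for example, A's 4096 tables spread evenly over 16 databases) and a plain hash-modulo rule. The real sharding rules are not given in the source and are described later as complex, so `locate`, the CRC32 hash, and the key format are illustrative assumptions only.

```python
import zlib
from typing import Tuple

# (databases, tables) per source, taken from the text; read here as totals.
LAYOUT = {"A": (16, 4096), "B": (1, 512), "C": (1, 256), "D": (8, 1024)}


def locate(source: str, shard_key: str) -> Tuple[int, int]:
    """Return (database index, table index) for a key.

    Hypothetical rule: the table index is hash % total tables, and tables
    are assigned to databases in contiguous, equally sized blocks.
    """
    databases, tables = LAYOUT[source]
    tables_per_db = tables // databases
    table_index = zlib.crc32(shard_key.encode("utf-8")) % tables
    return table_index // tables_per_db, table_index


def total_tables() -> int:
    """Number of physical tables a full export has to scan."""
    return sum(tables for _, tables in LAYOUT.values())
```

Under these assumptions a full export has to walk `total_tables()` physical tables, which is part of why the export is treated as a project in its own right.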

1.4 Data Volume

The total data volume is on the order of billions of rows.

1.5 Impact on production

The copy must not affect the other team's business, so data is extracted only from the replica (slave) database of the other team's corresponding MySQL group.
[Figure: explanation of a MySQL group]

1.6 Performance requirements

Complex conditional queries must be supported in the future and query performance requirements are high, so the target store is ES.

2. Difficulties

2.1 Data export

The trade inventory has complex sharding rules and a large data volume, so exporting the data is a major undertaking.

2.2 Frequent updates

Writes are frequent, so the ES indexing TPS has to keep up with the write load.

2.3 How to guarantee consistency

BASE-style eventual consistency is achieved through MQ.
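One common way to reach eventual consistency over a message queue that may redeliver or reorder messages is to make the consumer idempotent, for example by keeping a version per document and ignoring stale updates. The sketch below illustrates that idea only. The message fields (`id`, `version`, `doc`) and the in-memory store are assumptions, not details from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Replica:
    """In-memory stand-in for the target store (ES in this article)."""

    docs: Dict[str, Tuple[int, dict]] = field(default_factory=dict)

    def apply(self, message: dict) -> bool:
        """Apply a change message; return False if it is stale or a duplicate."""
        doc_id, version = message["id"], message["version"]
        current = self.docs.get(doc_id)
        if current is not None and current[0] >= version:
            return False  # this version or a newer one is already stored
        self.docs[doc_id] = (version, message["doc"])
        return True
```

Because a redelivered or late message is simply ignored, the queue can retry freely and the replica still converges to the latest version of each document.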

3. The final solution

3.1 Overall system architecture

First, incremental data is collected by canal (Binlake, a clustered canal). The A, B, and C inventories are stored in ES, and the D inventory is stored in MySQL.

3.2 How to do the full data export

SOP corresponds to type A. It uses multiple topics to spread the load on the message middleware and to work around the middleware's connection limit on a single topic.
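Spreading messages over several topics can be sketched as a stable hash from a message key to a topic name. The topic naming scheme, the CRC32 hash, and the function names below are assumptions for illustration; the source does not say how the topics are chosen.

```python
import zlib
from collections import Counter
from typing import Iterable


def pick_topic(base: str, key: str, topic_count: int) -> str:
    """Map a message key to one of `topic_count` topics with a stable hash."""
    if topic_count < 1:
        raise ValueError("topic_count must be at least 1")
    index = zlib.crc32(key.encode("utf-8")) % topic_count
    return f"{base}_{index}"


def spread(base: str, keys: Iterable[str], topic_count: int) -> Counter:
    """Count how many keys land on each topic."""
    return Counter(pick_topic(base, key, topic_count) for key in keys)
```

Using the same key always yields the same topic, so messages for one record stay on one topic while the overall load and the connections are divided among all of them.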

3.3 How to improve ES write performance (bulk)

ES indexing is done asynchronously via JMQ, and bulk submission is implemented through Redis queues in a way that is transparent to the application.
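The batching idea can be sketched as a buffer that collects single writes and turns them into newline-delimited bodies in the format the Elasticsearch `_bulk` endpoint accepts. In this sketch an in-memory deque stands in for the Redis queue, and `BulkBuffer`, `to_bulk_body`, and the batch size are hypothetical names and values, not the article's implementation.

```python
import json
from collections import deque
from typing import Deque, List, Tuple


def to_bulk_body(index: str, docs: List[Tuple[str, dict]]) -> str:
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


class BulkBuffer:
    """Collects single writes and releases them in batches."""

    def __init__(self, index: str, batch_size: int) -> None:
        self.index = index
        self.batch_size = batch_size
        # Stand-in for a Redis list shared by producers and the flusher.
        self.queue: Deque[Tuple[str, dict]] = deque()

    def add(self, doc_id: str, source: dict) -> None:
        self.queue.append((doc_id, source))

    def drain(self) -> List[str]:
        """Return one bulk body per full or partial batch, emptying the queue."""
        bodies = []
        while self.queue:
            count = min(self.batch_size, len(self.queue))
            batch = [self.queue.popleft() for _ in range(count)]
            bodies.append(to_bulk_body(self.index, batch))
        return bodies
```

The caller keeps calling `add` for each document as if it were writing one record at a time; a separate flusher calls `drain` and sends each body in a single request, which is what keeps the batching transparent to the application.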

How to improve Elasticsearch Performance

1. Solr queries are fast, but index updates (inserts and deletes) are slow; it suits query-heavy applications such as e-commerce.

2. ES builds indexes quickly, so newly written data becomes searchable in near real time, although queries can be slower; it suits real-time search such as at Facebook and Sina.

How to improve ES performance:
  • 1. Batch ES update operations wherever possible to reduce the number of writes (bulk).
  • 2. Split the data into multiple clusters by business dimension, then shard within each cluster by merchant dimension.
  • 3. Solution to hotspot concentration: routing rules can be customized by the ES team.
  • 4. Provide a reindex function for rebuilding data; this requires `_source` to be enabled in the mapping so that data is easy to rebuild.
  • 5. ES team requirement: the business side must provide at least 10 physical machines for deployment.
  • 6. Keep deep paging within 10,000 records; page with scroll and do not allow jumping to a large page number.
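Point 6 can be enforced at the API boundary. The sketch below rejects page requests whose offset plus size exceeds 10,000 results, the limit recommended in the list above (it also matches the default value of Elasticsearch's `index.max_result_window` setting). The function name and error messages are illustrative assumptions.

```python
MAX_RESULT_WINDOW = 10_000  # limit recommended in the list above


def page_bounds(page: int, size: int) -> dict:
    """Translate a 1-based page number into `from`/`size` search parameters.

    Raises ValueError for requests that would page deeper than the limit;
    callers should fall back to scroll-style sequential paging instead.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("deep paging is not allowed; use scroll instead")
    return {"from": offset, "size": size}
```

With 20 results per page, page 500 is the last one this check accepts; anything beyond that has to be read sequentially with scroll.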

Elasticsearch vs. Solr

Both are simple to install.
Solr relies on ZooKeeper for distributed management, while Elasticsearch has distributed coordination built in.
Solr supports more data formats, while Elasticsearch supports only JSON.
Solr officially provides more features, while Elasticsearch focuses on core functionality and leaves advanced features to third-party plugins.
Solr outperforms Elasticsearch in traditional search applications, but it is significantly less efficient in real-time search applications.

In short, Solr is a strong solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search applications.

Concurrency, throughput, and response time

These terms describe website or server performance.
Concurrency: the number of requests the system processes at the same time (divided into query requests and transaction requests).
Throughput: the number of requests the system processes per unit of time. It is a very broad term.
Commonly used units of throughput include TPS/QPS, pages per second, users per day, and transactions per hour.
A few related concepts:

TPS, QPS, RPS

TPS: Transactions Per Second,
the number of transactions the server processes per second. It is generally used to evaluate the benchmark performance of databases and transaction systems.

QPS: Queries Per Second,
the number of queries the server can process per second, for example for a domain name server or for MySQL.

RPS: Requests Per Second.
RPS and QPS can be treated as the same thing.

RT: Response Time, the time from when the client sends a request until it has received the complete response from the server.
The response time consists of three parts: request send time, network transfer time, and server processing time. It is sometimes loosely called think time, although think time normally means the pause between a user's consecutive requests.

The relationship between concurrency and TPS/QPS:

QPS (TPS) = concurrency / average response time

The result is TPS if the concurrent requests are transaction requests, or QPS if they are query requests.
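As a worked example of this relationship, the sketch below computes throughput from concurrency and average response time, plus the rearranged form for capacity planning. The function names and the sample numbers are illustrative, not taken from the source.

```python
def throughput(concurrency: float, avg_response_time_s: float) -> float:
    """QPS (or TPS) = concurrency / average response time in seconds."""
    if avg_response_time_s <= 0:
        raise ValueError("average response time must be positive")
    return concurrency / avg_response_time_s


def required_concurrency(target_qps: float, avg_response_time_s: float) -> float:
    """Rearranged form: concurrency needed to sustain a target throughput."""
    return target_qps * avg_response_time_s
```

For instance, 200 concurrent requests with an average response time of 0.25 seconds give `throughput(200, 0.25)`, that is 800 requests per second; sustaining 800 QPS at that response time conversely requires a concurrency of 200.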
