The data source is MySQL and the target store is Elasticsearch (ES).
1. Resources we can use
1.1 Source data model
The source database holds another team's inventory data, divided into three inventory models (A, B, and C). These three models need to be integrated into one common stock model to serve our own business.
Typical Internet companies collaborate this way: data replicas decouple one business from another.
1.2 Special table (not the focus)
D holds the inventory of order details and is also replicated as a heterogeneous copy.
1.3 Database and table sharding
A, B, C, and D are all sharded across databases and tables: A (16 databases, 4096 tables), B (1, 512), C (1, 256), D (8, 1024).
1.4 Data volume
The total data volume is in the billions.
1.5 Impact on production
The other party's business must not be affected, so data is extracted only from the replica (slave) in the other party's corresponding MySQL group.
MySQL Group explanation
1.6 Performance requirements
Complex conditional queries must be supported in the future and the query performance requirements are high, so the target store is ES.
2. Difficulties
2.1 Data migration
The trading inventory has complex sharding rules and a large data volume, so migrating the data is a big project.
2.2 Frequent updates
Write operations are frequent, so the TPS of ES index creation has to meet the requirement.
2.3 How to guarantee consistency
BASE-style eventual consistency is achieved through MQ.
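Eventual consistency over a message queue usually relies on the consumer tolerating duplicate and out-of-order deliveries. The following is a minimal sketch of that idea, not the implementation used here: the `ChangeEvent` shape, its `version` field, and the in-memory `target` dict are all hypothetical stand-ins for the MQ message and the target store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical MQ message describing one change to a stock record."""
    key: str       # primary key of the stock record
    version: int   # assumed to increase monotonically per key
    stock: int     # new stock value


def apply_event(target: dict, event: ChangeEvent) -> bool:
    """Apply an event only if it is newer than what the target holds.

    Returns True if the target changed. Duplicate and stale messages are
    ignored, so redelivery by the MQ does not corrupt the copy.
    """
    current = target.get(event.key)
    if current is not None and current["version"] >= event.version:
        return False
    target[event.key] = {"version": event.version, "stock": event.stock}
    return True


target = {}
events = [
    ChangeEvent("sku-1", 1, 10),
    ChangeEvent("sku-1", 3, 7),
    ChangeEvent("sku-1", 2, 9),   # arrives late, ignored
    ChangeEvent("sku-1", 3, 7),   # duplicate delivery, ignored
]
results = [apply_event(target, e) for e in events]
```

Because every event is either applied or safely skipped, replaying the same messages leaves the target in the same final state.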
3. The final solution
3.1 Overall system architecture
First, incremental data is collected by canal (Binlake, a clustered canal). Data from the A, B, and C databases is stored in ES, and the D inventory goes into MySQL.
3.2 How to do the full data migration
The SOP corresponds to type A. It uses multiple topics to spread the pressure on the message middleware and to work around the middleware's connection limit on a single topic.
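Spreading messages over several topics needs a stable rule for choosing the topic. The sketch below shows one possible rule under stated assumptions: the topic naming scheme (`base-index`), the topic count, and the use of a CRC32 hash are illustrative choices, not details taken from the system described here.

```python
import zlib


def topic_for(shard_key: str, base: str, topic_count: int) -> str:
    """Pick one of `topic_count` topics using a stable hash of the key.

    zlib.crc32 is used instead of the built-in hash() because string
    hashes from hash() vary between Python processes.
    """
    if topic_count <= 0:
        raise ValueError("topic_count must be positive")
    index = zlib.crc32(shard_key.encode("utf-8")) % topic_count
    return f"{base}-{index}"


# Hypothetical usage: spread 1000 keys over 8 topics.
topics = {topic_for(f"order-{i}", "stock-a", 8) for i in range(1000)}
```

The same key always maps to the same topic, so changes to one record stay in order on a single topic while the overall load is divided among all of them.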
3.3 How to improve ES write performance (bulk)
ES indexes are created asynchronously via JMQ, and bulk submission is implemented through Redis queues in a way that is transparent to the application.
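To make the queue-then-bulk idea concrete, here is a minimal sketch that drains a queue in batches and builds the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint accepts. It is an illustration under assumptions: an in-memory `deque` stands in for the Redis queue, and the index name `stock`, the `id` field, and the batch size are hypothetical.

```python
import json
from collections import deque


def drain_batch(queue: deque, max_docs: int) -> list:
    """Pop up to `max_docs` documents from the left of the queue."""
    batch = []
    while queue and len(batch) < max_docs:
        batch.append(queue.popleft())
    return batch


def to_bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body: one action line, then one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # A bulk body must end with a newline; an empty batch yields no body.
    return ("\n".join(lines) + "\n") if lines else ""


# In-memory stand-in for the Redis queue (assumption).
queue = deque({"id": str(i), "stock": i * 10} for i in range(5))
first = to_bulk_body("stock", drain_batch(queue, 2))
```

The producer only ever pushes single documents onto the queue; batching happens in the consumer, which is what keeps bulk mode invisible to the application.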
How to improve Elasticsearch Performance
- Solr queries are fast, but index updates (inserts and deletes) are slow; it suits query-heavy applications such as e-commerce.
- ES builds indexes quickly (at the cost of somewhat slower queries), so newly written data becomes searchable quickly; it suits searches such as those at Facebook and Sina.
- How to improve ES performance:
- 1. Batch ES update operations wherever possible to reduce the number of writes (bulk).
- 2. Split into multiple clusters by business dimension, then shard within each cluster by merchant.
- 3. Hotspot concentration: routing rules can be customized by the ES team.
- 4. Data can be rebuilt with the Reindex function, which requires _source to be enabled in the mapping (this makes rebuilding easy).
- 5. The ES team requires the business side to provide at least 10 physical machines for deployment.
- 6. Keep deep paging within 10,000 records, page with scroll, and forbid jumping to large page numbers.
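The deep-paging rule in item 6 can be enforced in the application before a request ever reaches ES. The sketch below is one way to do that, assuming ordinary from/size paging; the function name `page_params` and the error message are hypothetical. The 10,000 limit matches the recommendation above and the default value of Elasticsearch's `index.max_result_window` setting.

```python
MAX_RESULT_WINDOW = 10_000  # mirrors the 10,000-record limit in item 6


def page_params(page: int, size: int) -> dict:
    """Return from/size parameters, rejecting pages beyond the window."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        # Deep results should be walked sequentially with scroll instead.
        raise ValueError("page too deep; iterate with scroll instead")
    return {"from": offset, "size": size}


shallow = page_params(1, 20)
last_allowed = page_params(500, 20)
```

Callers that really need everything past the limit are pushed toward a sequential scroll rather than a jump to an arbitrary large page.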
Elasticsearch vs. Solr
Both are simple to install.
Solr uses ZooKeeper for distributed management, while Elasticsearch has distributed coordination built in.
Solr supports more data formats, while Elasticsearch only supports JSON.
Solr officially provides more features, while Elasticsearch focuses on core functions and leaves advanced features to third-party plug-ins.
Solr performs better than Elasticsearch in traditional search applications, but is significantly less efficient in real-time search applications.
In short, Solr is a powerful solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search applications.
Concurrency, throughput, and response time
These terms describe website or server performance.
Concurrency: the number of requests the system processes simultaneously (divided into query requests and transaction requests).
Throughput: the number of requests the system processes per unit of time. It is a broad term; commonly used units include TPS/QPS, pages per second, visitors per day, and transactions per hour.
A few related concepts: TPS, QPS, RPS
TPS: Transactions Per Second, the number of transactions the server processes per second. It is generally used to evaluate the baseline performance of databases and trading systems.
QPS: Queries Per Second, the number of queries the server can process per second, for example the query performance of a domain name server or of MySQL.
RPS: Requests Per Second. RPS and QPS can be treated as the same thing.
RT: Response Time, the time from when the client sends a request until it receives the response returned by the server.
The response time consists of three parts: request send time, network transfer time, and server processing time. Think time, the pause between a user's consecutive requests, is a separate quantity and is not part of the response time.
The relationship between concurrency and TPS/QPS: QPS (TPS) = concurrency / average response time. The result is TPS when the concurrent requests are transactions, and QPS when they are queries.
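A short worked example of the formula above; the function name and the figures (100 concurrent requests, 50 ms average response time) are made up for illustration.

```python
def throughput_per_second(concurrency: int, avg_response_ms: float) -> float:
    """QPS (or TPS) = concurrency / average response time in seconds."""
    if avg_response_ms <= 0:
        raise ValueError("average response time must be positive")
    # Multiply by 1000 to convert the millisecond response time to seconds.
    return concurrency * 1000 / avg_response_ms


# 100 requests in flight, each taking 50 ms on average.
qps = throughput_per_second(100, 50)
```

With these numbers the system sustains 2000 requests per second; halving the response time at the same concurrency doubles the throughput.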
[Case] How to build a heterogeneous copy of a billion-scale database