I recently developed a project. The client submits 100 rows of data to the server every 10 seconds; the server checks the data for duplicates and writes it to the database.
There are roughly tens of thousands of clients. Submissions are concentrated in time, and the data does not need to be read back right away.
The current design is:
The database is partitioned by client, so the data volume in each table stays low.
After the server receives the data, it first pushes it into a Redis queue; a scheduled task then inserts it into the database.
The questions are:
1. Can the interface the server exposes to clients handle thousands of clients posting data at the same time (each client submits once every 10 seconds)?
2. The data is saved to the Redis queue first. If tens of millions of records accumulate, will Redis stay stable?
The basic goal is to keep the server serving requests normally.
---------------------- Supplementary content -------------------------------
The project mainly collects user data; the client runs automatically on startup.
Each submission carries 100 records, one submission every 10 seconds. A client usually makes fewer than 10 submissions a day, i.e., about 1,000 records per client per day.
Each record holds five or six key-value pairs and is under 100 characters in total.
Daily data must be complete. Multiple clients may collect the same user's data, so duplicates have to be filtered out.
What we are considering now:
Data tables are partitioned by user.
Data a user submits is first saved into that user's Redis queue, i.e., each user gets one queue per day; once the data has been written to the database, the queue is deleted.
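A minimal sketch of that per-user daily queue, assuming redis-py; the `queue:{user_id}:{date}` key scheme and the `insert_batch` writer are placeholders, not part of the original design:

```python
import json
from datetime import date

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def queue_key(user_id: int) -> str:
    # One queue per user per day, e.g. "queue:42:2024-06-01" (assumed scheme).
    return f"queue:{user_id}:{date.today().isoformat()}"

def enqueue(user_id: int, records: list[dict]) -> None:
    # Request path: only push to Redis, never touch the database.
    if records:
        r.rpush(queue_key(user_id), *(json.dumps(rec) for rec in records))

def flush_to_db(user_id: int, insert_batch) -> None:
    # Scheduled task: drain the whole queue, persist it, delete the key.
    # (In production you would RENAME the key first so records pushed
    # during the flush are not lost by the DELETE.)
    key = queue_key(user_id)
    raw = r.lrange(key, 0, -1)
    if raw:
        insert_batch([json.loads(item) for item in raw])  # hypothetical DB writer
        r.delete(key)
```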
Replies:
Batch the inserts: do not insert rows one at a time; for example, merge 1,000 rows into a single INSERT to reduce round trips.
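For instance, with mysql-connector-python, `executemany()` on a plain `INSERT` is rewritten into one multi-row statement; the `user_data` table and its columns below are assumptions:

```python
import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="collect")
cur = conn.cursor()

rows = [(42, "2024-06-01", "steps", "8000"),
        (42, "2024-06-01", "calories", "2100")]  # ... up to ~1000 rows per batch

# One round trip for the whole batch instead of len(rows) single inserts.
cur.executemany(
    "INSERT INTO user_data (user_id, day, k, v) VALUES (%s, %s, %s, %s)",
    rows,
)
conn.commit()
```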
If the table only handles simple inserts and queries and does not need transaction support, consider the MyISAM engine; compared with InnoDB it gives higher insert performance.
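A sketch of what that could look like; the schema is an assumption, and the unique key plus `INSERT IGNORE` also handles the duplicate filtering the question asks about:

```python
import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="collect")
cur = conn.cursor()

# MyISAM: no transactions, but cheaper appends than InnoDB for this
# insert-only workload. The unique key doubles as the duplicate filter.
cur.execute("""
    CREATE TABLE IF NOT EXISTS user_data (
        user_id INT UNSIGNED NOT NULL,
        day     DATE         NOT NULL,
        k       VARCHAR(32)  NOT NULL,
        v       VARCHAR(100) NOT NULL,
        UNIQUE KEY uniq_record (user_id, day, k, v)
    ) ENGINE=MyISAM
""")

# INSERT IGNORE silently drops rows that collide with the unique key,
# covering the "multiple clients collect the same user data" case.
cur.execute(
    "INSERT IGNORE INTO user_data (user_id, day, k, v) VALUES (%s, %s, %s, %s)",
    (42, "2024-06-01", "steps", "8000"),
)
conn.commit()
```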
First, several things to consider on the interface side:
Is the bandwidth sufficient?
CPU count: for example, with 4 cores and 4 php-fpm processes, at 50 ms of handling time per request you can estimate roughly how many requests get processed per second (see the worked numbers after this list).
Memory: each process occupies roughly 10 to 25 MB.
Consider load balancing and DNS round robin, and pay attention to the high availability of the cluster.
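A back-of-the-envelope check of those numbers (the worker count and latency are the reply's examples; the 30,000-client figure is an assumption based on the question's "tens of thousands"):

```python
workers = 4          # php-fpm processes, one per core (reply's example)
latency = 0.05       # 50 ms handling time per request (reply's example)
clients = 30_000     # "tens of thousands" of clients -- assumed 30k here

capacity = workers / latency   # 4 / 0.05 = 80 requests/second per box
offered = clients / 10         # one POST per client every 10 s = 3000 req/s

print(f"capacity {capacity:.0f} req/s, offered {offered:.0f} req/s, "
      f"boxes needed ~{offered / capacity:.0f}")   # ~38 machines
```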
Second, several things to consider on the Redis side:
How long is a data row? Redis performance may degrade with values of 1 KB or more.
Processing speed: how much data accumulates in the queue, and how much memory it occupies.
Redis architecture: how to ensure data is not lost, and how to achieve high availability.
Whether current resources allow this scheme, and whether there are alternatives.
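A quick probe for both questions with redis-py; the `queue:*` key pattern follows the earlier sketch and is an assumption:

```python
import redis

r = redis.Redis()

# Total backlog across the per-user queues, plus Redis's own memory usage.
backlog = sum(r.llen(key) for key in r.scan_iter("queue:*"))
used_mb = r.info("memory")["used_memory"] / 1024 / 1024
print(f"queued records: {backlog}, redis memory: {used_mb:.1f} MB")
```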
Writes cannot keep up? With a master-master setup, both nodes accept writes, which halves the write pressure on each.
Use MyCat
You can shard the database with consistent hashing or a simple id-range hash; that should be enough. Before bothering with read/write splitting, look at the actual load first.
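Either routing rule fits in a few lines; a sketch, with the shard count assumed:

```python
SHARDS = 16  # assumed number of physical tables/databases

def shard_by_mod(user_id: int) -> int:
    # Simple id hash: spreads users evenly when ids are dense.
    return user_id % SHARDS

def shard_by_range(user_id: int, width: int = 100_000) -> int:
    # Id-range sharding: users 0-99999 on shard 0, the next 100k on shard 1, ...
    return (user_id // width) % SHARDS
```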
Try a queue?
The OP said data generation is relatively concentrated... you can use queued tasks to stretch that concentrated burst over a slightly longer window, smoothing the writes as much as possible. You need to find a reasonable balance between write latency and the smoothing window. If there is really no room to concede, look at the heavier approaches mentioned above. Also, if you don't want to wrestle with the database, you can try writing a dump file first and importing it with a separate job... I'm not sure whether that counts as an unorthodox path.
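A sketch of that dump-file idea, assuming a tab-separated spool file and MySQL's `LOAD DATA LOCAL INFILE` (the path and table are placeholders; the server must have `local_infile` enabled):

```python
import csv

import mysql.connector

DUMP = "/var/spool/collect/batch.tsv"  # assumed spool location

def dump(records: list[tuple]) -> None:
    # Request path: append to a local file instead of hitting the database.
    with open(DUMP, "a", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(records)

def import_dump() -> None:
    # Separate job: one bulk import is far faster than row-by-row INSERTs.
    conn = mysql.connector.connect(user="app", password="...", database="collect",
                                   allow_local_infile=True)
    cur = conn.cursor()
    cur.execute(
        "LOAD DATA LOCAL INFILE '/var/spool/collect/batch.tsv' "
        "INTO TABLE user_data FIELDS TERMINATED BY '\\t'"
    )
    conn.commit()
```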
-1. Submitting 100 records at a time and processing them within 10 seconds is clearly tight. Assuming partial data loss is acceptable, you can consider caching data on the client (which is a gamble), e.g., submit 200 records every 20 seconds instead.
-2. The server can use task queues to reduce blocking and improve concurrency. (With a submission every 10 seconds, high concurrency is very likely.)
-3. Check whether the data is read and written frequently; if so, consider a cache such as Ehcache, bearing in mind that running it as a cluster adds extra cost.
-4. Do not share servers with other services for this kind of specialized workload.
-5. How to split the tables later depends on your business.