I have a PHP job that is scheduled to run once a day. The logic is as follows:
1. Connect to the database and read the relevant rows with a SELECT into an array.
2. Loop over the rows obtained; each iteration performs three MySQL operations (a SELECT, an INSERT, and an UPDATE).
3. Close the database connection after the loop completes.
The number of loop iterations is determined by mysql_num_rows() from the SELECT in step 1, which is in the tens of thousands.
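For reference, a minimal sketch of the pattern described above; connection details, table and column names are hypothetical, not taken from the question:

```php
<?php
// Sketch of the current pattern: all rows are loaded into an array first, then
// every iteration issues three more queries -- roughly 3 x N statements per run.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$rows = [];
$result = $db->query("SELECT id, amount FROM source_table");
while ($row = $result->fetch_assoc()) {
    $rows[] = $row;                         // tens of thousands of rows in memory
}

foreach ($rows as $row) {
    $id = (int) $row['id'];
    $db->query("SELECT status FROM detail WHERE src_id = $id");
    $db->query("INSERT INTO log (src_id) VALUES ($id)");
    $db->query("UPDATE source_table SET processed = 1 WHERE id = $id");
}

$db->close();
```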
As a result, tens of thousands of iterations, each with three database operations, run back to back in a short period of time, which is very inefficient. In addition, because the loop and its many operations take a long time to complete, nginx may return a 504 error.
Frequent database operations and a long-held connection consume too many resources and make the whole environment inefficient.
How can I optimize it?
Thank you!
Reply content:
In this case, do not try to solve it inside a web request. Use crontab to schedule the PHP script to run in the background. Query the database in batches, for example 100,000 rows in total processed 1,000 at a time. If you must process rows one by one and speed is not critical, fetch each row with fetch_row as you go, so you do not need to load everything into an array first. Also remember to adjust set_time_limit and the database connection timeout according to how long the job actually runs.
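A minimal sketch of the batching idea, meant to be run from crontab in CLI mode. The table and column names and the batch size are assumptions, not part of the original question:

```php
<?php
set_time_limit(0);                          // no execution-time limit for this run

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$batchSize = 1000;
$lastId    = 0;

do {
    // Keyset pagination: fetch the next 1,000 rows instead of 100,000 at once.
    $stmt = $db->prepare(
        "SELECT id, amount FROM source_table WHERE id > ? ORDER BY id LIMIT ?"
    );
    $stmt->bind_param('ii', $lastId, $batchSize);
    $stmt->execute();
    $result = $stmt->get_result();          // requires mysqlnd

    $count = 0;
    while ($row = $result->fetch_assoc()) {
        $lastId = (int) $row['id'];
        $count++;
        // ... the per-row select / insert / update goes here ...
    }
    $stmt->close();
} while ($count === $batchSize);

$db->close();
```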
My views are as follows:
1. The web environment is not suitable for long-running tasks: under nginx + php-fpm the various timeouts along the chain will torture you to death. Apache + PHP at least gives you better control over the timeout; there you can simply call set_time_limit(0).
2. Dispatch tasks through the web: most PHP frameworks do not support the command line well, or command-line use was not considered when they were written, so a web-based task dispatch mechanism is easier to implement and far less invasive to the existing framework. A single, unified entry point also helps keep the project stable. If a task runs on the command line, you have to consider many issues, the most prominent being file permissions: web projects usually run as a user such as apache, so generated files are owned by apache, and the apache user is normally not allowed to log in. You can run commands as the apache user, but it is complicated.
3. Divide and conquer: one way to handle long-running tasks is to split a large task into small ones, turning one long-running job into many short ones. This shortens how long resources are occupied and avoids the problems caused by long execution, such as database connection timeouts and PHP memory leaks. A sketch follows after this list.
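A sketch of the divide-and-conquer idea: each invocation handles one small chunk and records where it stopped, so no single run is long and cron can call the script every few minutes. The progress file and table names are hypothetical:

```php
<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$progressFile = '/tmp/daily_job.offset';
$lastId = is_file($progressFile) ? (int) file_get_contents($progressFile) : 0;
$chunk  = 2000;

$result = $db->query(
    "SELECT id FROM source_table WHERE id > $lastId ORDER BY id LIMIT $chunk"
);
while ($row = $result->fetch_assoc()) {
    $lastId = (int) $row['id'];
    // ... process this row: the run ends after at most $chunk rows ...
}

file_put_contents($progressFile, $lastId);   // next run resumes from here
$db->close();
```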
I have put together an example here:
https://github.com/zkc226/cur...
When there is a large amount of data, hand it over to a task system for execution. A request first goes to a message producer, which forwards it to consumers for processing and returns immediately, so nothing waits long enough to time out. The consumers then process the work in parallel across multiple processes. Gearman is recommended because it is convenient and has a PHP interface; the same thing can also be built with Workerman or Swoole.
Doing all the work at a single point in time on a single server is what makes it slow and resource-hungry.
Either process in batches, as @Huang Hong suggested, or add servers and distribute the tasks across them. Distributed processing makes the task more complex, because data consistency has to be guaranteed.
1. Export the data to a file and loop over the file instead (for example with mysqldump).
2. Check whether statements can be combined and executed in batches, instead of issuing one per iteration (see the sketch below).
3. Check whether stored procedures can be used.
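A sketch of point 2: collect values and flush them as one multi-row INSERT every 500 rows instead of one INSERT per iteration. Table and column names are hypothetical:

```php
<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$values = [];
$result = $db->query("SELECT id FROM source_table WHERE processed = 0");
while ($row = $result->fetch_assoc()) {
    $values[] = '(' . (int) $row['id'] . ', NOW())';
    if (count($values) >= 500) {
        // One statement carries 500 rows instead of 500 separate statements.
        $db->query('INSERT INTO log (src_id, created_at) VALUES ' . implode(',', $values));
        $values = [];
    }
}
if ($values) {
    $db->query('INSERT INTO log (src_id, created_at) VALUES ' . implode(',', $values));
}
$db->close();
```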
"In addition, because the loop and its many operations take a long time to complete, nginx may return a 504 error."
Does this really need to be computed in real time? For a task with a heavy computation load, consider having a background job do the calculation and write the result to a cache, so that real-time requests only read from that cache (a sketch follows).
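A sketch of that cache idea, assuming the phpredis extension is available; the key name, TTL, and computeDailyReport() are hypothetical placeholders:

```php
<?php
function computeDailyReport(): array
{
    // placeholder for the expensive calculation
    return ['total' => 0];
}

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// --- background job, run by cron: compute once and cache for a day ---
$redis->setex('daily_report', 86400, json_encode(computeDailyReport()));

// --- web request: read-only, returns instantly ---
$cached = $redis->get('daily_report');
$report = $cached !== false ? json_decode($cached, true) : null;
```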
This question is similar to one I answered before about using parallel execution to improve efficiency.
The essence is to split the large data set by taking the ID modulo some number and process the shards in parallel. For example, suppose your server and database can withstand 20 concurrent processes.
The simplest way to run in parallel is to start 20 script processes:
0.php -> select * from test where id % 20 = 0;
1.php -> select * from test where id % 20 = 1;
2.php -> select * from test where id % 20 = 2;
....
This is the pull method; a minimal sketch of one shard script follows.
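A sketch of one shard script for the pull method. The same file would be started 20 times, e.g. `php shard.php 0` through `php shard.php 19`; the shard count and the `test` table come from the example above, while the connection details and file name are assumptions:

```php
<?php
$shards = 20;
$shard  = isset($argv[1]) ? (int) $argv[1] : 0;   // which shard this process owns

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt = $db->prepare("SELECT id FROM test WHERE id % ? = ?");
$stmt->bind_param('ii', $shards, $shard);
$stmt->execute();
$result = $stmt->get_result();

while ($row = $result->fetch_assoc()) {
    // ... process one row; 20 such processes run in parallel ...
}
$db->close();
```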
The other way is push: push the jobs into a queue, and the queue dispatches them to worker processes, which is more standardized and easier to manage. For example, with the Gearman mentioned above; I used it for daily scheduled tasks when I was working on an SMS platform.
The logic: a scheduled script queries all the data and sends it to the Gearman job server through the Gearman client; you then start 20 workers (on the same server or on different servers in the LAN), and the job server distributes the jobs to the 20 Gearman worker scripts. Each worker runs the same script code and handles one row per job (see the sketch below).
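A sketch of this push approach using the pecl/gearman extension. The dispatcher part runs once a day from cron; the worker part is started about 20 times on one or more servers. The function name 'process_row', the `test` table, and the connection details are assumptions:

```php
<?php
// --- dispatcher: queue one background job per row ---
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$result = $db->query("SELECT id FROM test");
while ($row = $result->fetch_assoc()) {
    $client->doBackground('process_row', (string) $row['id']);
}
$db->close();

// --- worker script (run ~20 copies): handles one row per job ---
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('process_row', function (GearmanJob $job) {
    $id = (int) $job->workload();
    // ... the per-row select / insert / update for this single id ...
});
while ($worker->work());
```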
Run the PHP script in CLI mode; do not use web mode, which times out easily.