I have a PHP script that runs as a scheduled task once a day, with the following logic:
1. Connect to the database and SELECT the relevant data into an array.
2. Loop over the rows returned, performing 3 MySQL operations (a SELECT, an INSERT, and an UPDATE) inside each iteration.
3. Close the database connection after the loop finishes.
The number of iterations is the mysql_num_rows of step 1, which is typically thousands to tens of thousands.
So within a short span the loop fires off thousands to tens of thousands × 3 database operations in a row, which is very inefficient. And because the loop count is so large, the task takes a long time to run, which causes Nginx to return 504 errors.
The frequent database operations and long-held connections also consume too many resources and drag down the whole environment.
How can I optimize this?
Many thanks in advance.
Replies:
For the situation you describe, it's recommended not to run this through a web request. Run the PHP script in the background via a crontab task, and process the database query in batches: if there are 100,000 rows in total, handle 1,000 at a time. If the rows really must be handled one by one and it's slow, fetch and process rows one at a time (fetch_row) instead of loading the whole result set into an array before looping. Also remember to set set_time_limit and the database connection timeout appropriately for the workload. A minimal sketch of the batching idea follows.
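A minimal sketch of that batching approach, assuming a hypothetical `tasks` table with an auto-increment `id` and PDO (table, columns, and credentials are illustrative):

```php
<?php
// batch_job.php  (run from crontab, not through Nginx)
set_time_limit(0); // no execution-time limit for the background job

$pdo = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$batchSize = 1000;
$lastId    = 0;

while (true) {
    // Keyset pagination: fetch the next 1000 rows after the last processed id.
    $stmt = $pdo->prepare("SELECT id, payload FROM tasks WHERE id > ? ORDER BY id LIMIT $batchSize");
    $stmt->execute([$lastId]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    if (!$rows) {
        break; // nothing left to process
    }

    foreach ($rows as $row) {
        // ... the per-row SELECT / INSERT / UPDATE work goes here ...
        $lastId = $row['id'];
    }
}
```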
A few thoughts on long-running, data-heavy tasks like this:
1. A web environment is not suited to long-running tasks: the Nginx + PHP-FPM architecture is a poor fit for jobs like this, since the various timeouts along the way will torture you to death. Apache + PHP is at least easier to control with respect to timeouts: a simple set_time_limit(0) does the job.
2. Dispatch tasks through the web: most PHP frameworks have poor command-line support, or weren't designed with the CLI in mind, so a web-based task dispatch mechanism is easier to implement and far less intrusive to an existing framework; for a stable project, keeping a single unified entry point is extremely important. Running tasks from the command line raises a lot of issues, the most prominent being file permissions: a web project usually runs as the apache user, so files it creates are owned by apache, and the apache account generally isn't allowed to log in. You can work around this by running the commands as the apache user, but it's more complicated.
3. Divide and conquer: one way to handle long-running tasks is to split the big task into small ones, turning one long-running task into many short-running ones. This reduces how long resources are held and avoids the problems caused by long execution, such as database connection timeouts and PHP memory leaks.
Here's an example I wrote; comments and suggestions welcome.
https://github.com/zkc226/cur ...
Hand the large volume of data over to a task system for execution. A request first goes to a message producer, which hands the work off to consumers and returns immediately, so nothing waits long enough to time out. The consumers then execute the work with multiple workers. Gearman is recommended: it's convenient to use and has a PHP interface. Similar tools such as Workerman or Swoole can do the same.
Concentrating all the operations on the same server at the same point in time is bound to be slow and resource-intensive.
Or, as @Huanghong said, process in batches.
Or add servers and distribute the tasks across them, i.e. distributed processing, though that also adds complexity to the task and you have to ensure data consistency.
1. Export the data to a file and loop over the file instead (e.g. with mysqldump).
2. Consider whether the statements can be assembled first and executed in batches, rather than executing one on every iteration (see the sketch below).
3. Consider whether a stored procedure could do the work.
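A sketch of point 2, collecting values in the loop and flushing one multi-row INSERT per chunk instead of issuing an INSERT on every iteration. The `results` table, its columns, and the `$rows` array loaded earlier are illustrative:

```php
<?php
$pdo = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8mb4', 'user', 'pass');

$chunk = [];
foreach ($rows as $row) {                 // $rows loaded earlier
    $chunk[] = [$row['id'], $row['score']];

    if (count($chunk) === 500) {          // flush every 500 rows
        flushChunk($pdo, $chunk);
        $chunk = [];
    }
}
if ($chunk) {
    flushChunk($pdo, $chunk);             // flush the remainder
}

// Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE for the whole chunk.
function flushChunk(PDO $pdo, array $chunk): void
{
    $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?)'));
    $stmt = $pdo->prepare(
        "INSERT INTO results (task_id, score) VALUES $placeholders
         ON DUPLICATE KEY UPDATE score = VALUES(score)"
    );
    $stmt->execute(array_merge(...$chunk));
}
```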
Regarding "the loop takes a long time to execute and causes Nginx to return 504 errors": does this really need to be computed in real time? For tasks with heavy computation, consider running them in the background, writing the results to a cache, and having the real-time requests only read from that cache. A sketch of this idea follows.
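A minimal sketch of that pattern, assuming the phpredis extension; `computeDailyReport()` and the cache key are hypothetical placeholders:

```php
<?php
// cron side: runs once a day in the background and does the heavy work.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$report = computeDailyReport();                      // hypothetical expensive loop
$redis->set('report:daily', json_encode($report));

// web side: the real-time request only reads the precomputed value.
$cached = $redis->get('report:daily');
$report = $cached !== false ? json_decode($cached, true) : [];
```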
This question is a bit like one I answered earlier: parallel execution improves efficiency.
The essence is to split this large data set, for example by taking the ID modulo some number and executing the shards in parallel. Say your server and database can handle 20 concurrent jobs:
The simplest way is to start 20 script processes:
0.php: SELECT * FROM test WHERE id % 20 = 0;
1.php: SELECT * FROM test WHERE id % 20 = 1;
2.php: SELECT * FROM test WHERE id % 20 = 2;
....
That's the pull-based approach (a sketch of one shard worker follows).
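A sketch of one such shard worker; the table name `test` and shard count 20 come from the example above, while the connection details are illustrative:

```php
<?php
// worker.php — one shard of the modulo split.
// Launch 20 of these, e.g.:  php worker.php 0 &  ...  php worker.php 19 &
$shard  = (int)($argv[1] ?? 0);
$shards = 20;

$pdo = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8mb4', 'user', 'pass');

$stmt = $pdo->prepare('SELECT * FROM test WHERE id % ? = ?');
$stmt->execute([$shards, $shard]);

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // per-row SELECT / INSERT / UPDATE work for this shard only
}
```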
There is also a push approach: push the work into a queue, and have worker processes consume it from the queue. That gives you more standardized management. Gearman, mentioned above, is an example: when I built an SMS platform that had a daily scheduled task, this is what I used.
The logic is: you set up a timed task whose script queries all the data and sends it to the Gearman job server via the Gearman client; you then start, say, 20 workers (on the same server or on different servers in the LAN). The job server distributes the jobs to these 20 Gearman worker scripts; each worker runs the same code and processes one piece of data per job. A minimal client/worker sketch follows.
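Roughly what that looks like with PHP's pecl gearman extension; the server address, the task name `process_row`, and the `tasks` table are illustrative:

```php
<?php
// dispatch.php — the cron-triggered producer.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$pdo  = new PDO('mysql:host=127.0.0.1;dbname=app;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->query('SELECT id FROM tasks');

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Fire-and-forget: return immediately, the workers do the real work.
    $client->doBackground('process_row', json_encode($row));
}
```

```php
<?php
// worker.php — start ~20 of these on one or more machines.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);

$worker->addFunction('process_row', function (GearmanJob $job) {
    $row = json_decode($job->workload(), true);
    // per-row SELECT / INSERT / UPDATE goes here
});

while ($worker->work());
```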
Run the PHP script in CLI mode; don't run it through the web, where it is easy to hit timeouts.