PHP batch processing implementation

What if a feature in your web application takes more than a second or two to complete? You need some kind of offline processing solution. Learn several ways to run long-running jobs offline in PHP applications.
Large chain stores have a big problem. Each store processes thousands of transactions every day, and company executives want to mine that data. Which products are selling well? Which aren't? Where do organic products sell best? How are ice cream sales doing?

To capture this data, the organization must load all of the transactional data into a data model that is better suited to generating the reports the company needs. This takes time, and as the chain grows, processing one day's data can take more than a day. That is the big problem.

Your web application probably doesn't need to process that much data, but any site can have operations that take longer than customers are willing to wait. Generally speaking, customers will wait about 200 milliseconds; beyond that, a process starts to feel "slow". That figure comes from desktop applications, and the Web has made us a bit more patient, but customers still shouldn't have to wait more than a few seconds. So here are some strategies for handling batch jobs in PHP.

Decentralized approach with cron

On UNIX® machines, the core program for batch processing is the cron daemon. The daemon reads a configuration file that tells it which command lines to run and how often, and then runs them on that schedule. It can even email a command's output, including any errors, to an address you specify, which helps you debug problems.

I know some engineers who strongly advocate threads: "Threads! Threads are the real way to do background processing. The cron daemon is so outdated."

I do not think so.

I have used both approaches, and I think cron has the advantage of following the "Keep It Simple, Stupid" (KISS) principle: it keeps background processing simple. Instead of writing a multithreaded job-processing application that runs all the time (and can therefore leak memory), you have cron start a simple batch script. The script checks whether there is a job to process, runs it, and exits. There are no memory leaks to worry about, and no threads that stall or fall into an infinite loop.
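The run-once pattern described above can be sketched as follows. This is a minimal illustration, not code from the article: the function name run_pending_jobs() and the choice to pass the job source and the handler in as callables are assumptions made so the skeleton is self-contained. Each invocation handles whatever work is waiting and then returns.

```php
<?php
// Hypothetical skeleton of a cron-launched batch script: look for work,
// do it, and return. Nothing stays resident between runs.
function run_pending_jobs(callable $fetchJobs, callable $handleJob): int
{
    $jobs = $fetchJobs();              // e.g. rows read from a queue table
    if (count($jobs) === 0) {
        return 0;                      // nothing to do this time
    }
    $done = 0;
    foreach ($jobs as $job) {
        $handleJob($job);
        $done++;
    }
    return $done;                      // number of jobs handled in this run
}

// An in-memory list stands in for a real job queue here.
$queue = ['nightly-report', 'cleanup'];
$processed = run_pending_jobs(
    function () use ($queue) { return $queue; },
    function ($job) { echo "Processing $job\n"; }
);
echo "Processed $processed job(s)\n";   // prints "Processed 2 job(s)"
```

In a real script started by cron, the file would simply end after this call. Cron starts a fresh PHP process on the next scheduled run, so no state, and no leaked memory, carries over from one run to the next.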

So how do you use cron? That depends on your system environment. I discuss only the old-fashioned, simple UNIX command-line version of cron; ask your system administrator how to set it up for your own web application.

Here is a simple cron configuration that runs a PHP script at 11pm every day:

0 23 * * * jack /usr/bin/php /users/home/jack/myscript.php


The first five fields define when the script runs. The next field is the user the script runs as; this user field appears in the system-wide crontab (such as /etc/crontab), while personal crontabs edited with crontab -e leave it out. The rest of the line is the command to execute. The time fields are, in order, minute, hour, day of the month, month, and day of the week. Here are a few examples.

Command:

15 * * * * jack /usr/bin/php /users/home/jack/myscript.php


Run the script at the 15th minute of each hour.

Command:

15,45 * * * * jack /usr/bin/php /users/home/jack/myscript.php


Run the script on the 15th and 45th minutes of each hour.

Command:

*/1 3-23 * * * jack /usr/bin/php /users/home/jack/myscript.php


Run the script every minute between 3am and 11pm.

Command:

30 23 * * 6 jack /usr/bin/php /users/home/jack/myscript.php


Run the script at 11:30 p.m. every Saturday (in the day-of-week field, 0 is Sunday and 6 is Saturday).
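To make the five time fields concrete, here is a small matcher that decides whether a schedule fires at a given moment. It is a simplified, hypothetical sketch and not a full cron implementation. It handles only `*`, single numbers, comma lists, ranges such as `3-23`, and steps such as `*/1`, and it counts steps on `*` from 0, which agrees with cron only for the minute and hour fields. It also ignores day and month names, as well as real cron's rule that a schedule restricting both day of month and day of week fires when either one matches. The function names field_matches() and cron_matches() are made up for illustration.

```php
<?php
// Simplified, hypothetical matcher for one cron time field.
// Supports "*", "N", "A-B", "*/S", "A-B/S", and comma-separated lists of these.
function field_matches(string $field, int $value): bool
{
    foreach (explode(',', $field) as $part) {
        $step = 1;
        if (strpos($part, '/') !== false) {
            [$part, $stepText] = explode('/', $part, 2);
            $step = max(1, (int) $stepText);
        }
        if ($part === '*') {
            $min = 0;                  // steps on "*" count from 0 here
            $max = PHP_INT_MAX;
        } elseif (strpos($part, '-') !== false) {
            [$low, $high] = explode('-', $part, 2);
            $min = (int) $low;
            $max = (int) $high;
        } else {
            $min = $max = (int) $part;
        }
        if ($value >= $min && $value <= $max && ($value - $min) % $step === 0) {
            return true;
        }
    }
    return false;
}

// $schedule holds the five time fields: minute, hour, day of month,
// month, and day of week (0 = Sunday, 6 = Saturday).
function cron_matches(string $schedule, int $timestamp): bool
{
    $fields = preg_split('/\s+/', trim($schedule));
    $now = [
        (int) date('i', $timestamp),   // minute
        (int) date('G', $timestamp),   // hour (0-23)
        (int) date('j', $timestamp),   // day of month
        (int) date('n', $timestamp),   // month
        (int) date('w', $timestamp),   // day of week
    ];
    for ($i = 0; $i < 5; $i++) {
        if (!field_matches($fields[$i], $now[$i])) {
            return false;
        }
    }
    return true;
}

// The Saturday-night example, checked at 23:30 on Saturday, 6 January 2024.
var_dump(cron_matches('30 23 * * 6', mktime(23, 30, 0, 1, 6, 2024)));   // prints bool(true)
```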

As you can see, the combinations are nearly endless, so you can run each script exactly when you need to. You can also list several scripts, so that some run every minute while others (such as backup scripts) run only once a day.

To specify the email address that cron sends error output to, set the MAILTO variable in the crontab:

MAILTO=jherr@pobox.com


Note: For Microsoft® Windows® users, there is an equivalent Scheduled Tasks system that can be used to periodically start command-line processes (such as PHP scripts).

Batch architecture basics

Batch processing is fairly simple. Most of the time, it follows one of two workflows. The first is for reporting: a script runs once a day, generates a report, and sends it to a group of users. The second is a batch job created in response to a request. For example, I log in to the web application and ask it to send a message to every user registered in the system, announcing a new feature. This operation has to be batched because the system has 10,000 users. PHP takes a while to complete a task like that, so it must run as a job outside the browser.

In the second workflow, the web application just needs to put the job information somewhere the batch application can read it. This information specifies the nature of the job (for example, "Send this e-mail to everyone on the system"). The batch program runs the job and then deletes it, or alternatively marks it as completed. Either way, the job must be recorded as completed so that it isn't run again.
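The two ways of retiring a job can be sketched with an in-memory queue. This is an illustration only: the array layout, the 'status' field, and the function name process_jobs() are assumptions, and a real system would keep the same information in a database table or in files. Because finished jobs are either removed or flagged 'done', a second run finds nothing left to do.

```php
<?php
// Hypothetical job records; a real queue would live in a database table or files.
$jobs = [
    1 => ['task' => 'Send this e-mail to everyone on the system', 'status' => 'pending'],
    2 => ['task' => 'Build the daily sales report', 'status' => 'pending'],
];

// Run every pending job, then either delete it or mark it completed.
function process_jobs(array &$jobs, bool $deleteWhenDone, callable $run): int
{
    $ran = 0;
    foreach ($jobs as $id => $job) {
        if ($job['status'] !== 'pending') {
            continue;                      // already completed: never run it again
        }
        $run($job['task']);
        $ran++;
        if ($deleteWhenDone) {
            unset($jobs[$id]);             // option 1: delete the job
        } else {
            $jobs[$id]['status'] = 'done'; // option 2: keep it, marked as completed
        }
    }
    return $ran;
}

$show = function (string $task) { echo "Running: $task\n"; };
$first = process_jobs($jobs, false, $show);
$second = process_jobs($jobs, false, $show);   // nothing left to run
echo "$first ran, then $second\n";             // prints "2 ran, then 0"
```

Marking jobs as done keeps a history you can audit. Deleting them keeps the queue small. Either way, the job runs at most once.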

The rest of this article demonstrates several ways to share data between a web application's front end and its batch back end.

Mail queue

The first approach uses a dedicated mail queue system. In this model, a database table holds the email messages to be sent to individual users. The web interface uses the Mailouts class to add messages to the queue. The mail handler uses the same class to retrieve the unprocessed messages, and then uses it again to delete each one from the queue once it has been sent.

This model starts with the MySQL schema.

Listing 1. mailout.sql
    DROP TABLE IF EXISTS mailouts;
    CREATE TABLE mailouts (
        id MEDIUMINT NOT NULL AUTO_INCREMENT,
        from_address TEXT NOT NULL,
        to_address TEXT NOT NULL,
        subject TEXT NOT NULL,
        content TEXT NOT NULL,
        PRIMARY KEY (id)
    );


The schema is very simple: each row holds a from address and a to address, plus the subject and content of the email.

The PHP Mailouts class wraps the mailouts table in the database.

Listing 2. mailouts.php
    <?php
    require_once('DB.php');

    class Mailouts
    {
        // Open a connection to the mailout database.
        public static function get_db()
        {
            $dsn = 'mysql://root:@localhost/mailout';
            $db = DB::connect($dsn, array());
            if (PEAR::isError($db)) {
                die($db->getMessage());
            }
            return $db;
        }

        // Remove one message from the queue.
        public static function delete($id)
        {
            $db = Mailouts::get_db();
            $sth = $db->prepare('DELETE FROM mailouts WHERE id = ?');
            $db->execute($sth, $id);
            return true;
        }

        // Queue a message (called by the web front end).
        public static function add($from, $to, $subject, $content)
        {
            $db = Mailouts::get_db();
            $sth = $db->prepare('INSERT INTO mailouts VALUES (null, ?, ?, ?, ?)');
            $db->execute($sth, array($from, $to, $subject, $content));
            return true;
        }

        // Return every queued message as a numerically indexed row.
        public static function get_all()
        {
            $db = Mailouts::get_db();
            $res = $db->query('SELECT * FROM mailouts');
            $rows = array();
            while ($res->fetchInto($row)) {
                $rows[] = $row;
            }
            return $rows;
        }
    }
    ?>


This script first includes the PEAR::DB database access class. It then defines the Mailouts class, which has three main static methods: add(), delete(), and get_all(). The add() method adds an email to the queue; the front end uses it. The get_all() method returns all the rows in the table, and the delete() method removes a single email.

You might ask why I don't just call a delete_all() method at the end of the script. There are two reasons. First, if you delete each message right after it is sent, no message can be sent twice, even if the script has to be rerun after a problem. Second, new messages may be added to the queue while the job is running, and a delete_all() at the end would remove them before they were ever sent.
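The first reason can be illustrated with a small simulation. Here an in-memory array stands in for the mailouts table, and a hypothetical $send callable fails once partway through, much as a script might die in the middle of a run. The function name send_queue() and the addresses are made up for illustration. Because each message is deleted right after it is sent, rerunning the job picks up only what is left.

```php
<?php
// Simulated queue of id => recipient, standing in for the mailouts table.
function send_queue(array &$queue, callable $send): void
{
    foreach ($queue as $id => $to) {
        $send($to);          // may throw, like a script dying mid-run
        unset($queue[$id]);  // delete immediately after a successful send
    }
}

$queue = [1 => 'ann@example.com', 2 => 'bob@example.com', 3 => 'cy@example.com'];
$sent = [];
$failOnce = true;
$send = function (string $to) use (&$sent, &$failOnce) {
    if ($to === 'bob@example.com' && $failOnce) {
        $failOnce = false;
        throw new RuntimeException('mail server unavailable');
    }
    $sent[] = $to;
};

try {
    send_queue($queue, $send);        // first run stops at bob
} catch (RuntimeException $e) {
    echo 'First run failed: ' . $e->getMessage() . "\n";
}
send_queue($queue, $send);            // the rerun sends only what is left
echo implode(', ', $sent) . "\n";     // each address appears exactly once
```

If the script had instead called a single delete_all() at the end, the failed first run would have left ann@example.com in the queue, and the rerun would have mailed her a second time.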

The next step is to write a simple test script that adds an entry to the queue.

Listing 3. mailout_test_add.php
    <?php
    require 'mailouts.php';

    // Queue a single test message.
    Mailouts::add('donotreply@mydomain.com', 'molly@nocompany.com.org', 'Test Subject', 'This is a test of the batch mail sendout');
    ?>


In this example, I add one mailout: a message to Molly at a company, with the subject "Test Subject" and a short test body. You can run this script from the command line: php mailout_test_add.php.

In order to send e-mail, another script is needed, which acts as a job handler.

Listing 4. mailout_send.php
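    <?php
    require_once 'mailouts.php';

    // Send one message; mail() returns true if it was accepted for delivery.
    function process($from, $to, $subject, $email)
    {
        return mail($to, $subject, $email, "From: $from");
    }

    $messages = Mailouts::get_all();
    foreach ($messages as $msg) {
        // Row columns: id, from_address, to_address, subject, content.
        if (process($msg[1], $msg[2], $msg[3], $msg[4])) {
            // Delete each message as soon as it is sent, so a rerun
            // never sends it twice.
            Mailouts::delete($msg[0]);
        }
    }
    ?>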


This script uses the get_all() method to retrieve all the queued messages and then sends them one by one with PHP's mail() function. After each message is sent successfully, it calls the delete() method to remove the corresponding record from the queue.

Use the cron daemon to run this script periodically. The frequency of running this script depends on the needs of your application.

Note: The PHP Extension and Application Repository (PEAR) contains an excellent mail queue implementation, the Mail_Queue package, which you can download for free.
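The Mailouts class in Listing 2 relies on PEAR::DB, which has since been superseded. If you build this queue today, PHP's bundled PDO extension can do the same job. The sketch below is an adaptation under stated assumptions, not the article's code. The class name PdoMailouts is hypothetical. The connection is passed into the constructor, so the same class can point at MySQL in production or at an in-memory SQLite database in a test. The methods keep the original add(), get_all(), and delete() contract.

```php
<?php
// Hypothetical PDO-based variant of the Mailouts class.
class PdoMailouts
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
        // Report SQL errors as exceptions instead of silent false returns.
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    // Queue one message (used by the web front end).
    public function add(string $from, string $to, string $subject, string $content): bool
    {
        $sth = $this->db->prepare(
            'INSERT INTO mailouts (from_address, to_address, subject, content) VALUES (?, ?, ?, ?)'
        );
        return $sth->execute([$from, $to, $subject, $content]);
    }

    // Return all queued messages as numerically indexed rows, like the PEAR::DB version.
    public function get_all(): array
    {
        return $this->db
            ->query('SELECT id, from_address, to_address, subject, content FROM mailouts ORDER BY id')
            ->fetchAll(PDO::FETCH_NUM);
    }

    // Remove one message by id.
    public function delete(int $id): bool
    {
        $sth = $this->db->prepare('DELETE FROM mailouts WHERE id = ?');
        return $sth->execute([$id]);
    }
}

// Against MySQL, construction might look like:
// $queue = new PdoMailouts(new PDO('mysql:host=localhost;dbname=mailout', 'root', ''));
```

Injecting the connection also removes the per-call reconnect that get_db() performs in the original class.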

A more general approach

The mail-specific solution works well, but is there a more general approach? We want to be able to send email, generate reports, or run any other time-consuming process without making the browser wait for it to finish.

For this, you can take advantage of the fact that PHP is an interpreted language: you can store the name of a PHP function, along with its arguments, in a queue in the database and call it later. This requires two tables, shown in Listing 5.

Listing 5. generic.sql
    DROP TABLE IF EXISTS processing_items;
    CREATE TABLE processing_items (
        id MEDIUMINT NOT NULL AUTO_INCREMENT,
        function TEXT NOT NULL,
        PRIMARY KEY (id)
    );
    DROP TABLE IF EXISTS processing_args;
    CREATE TABLE processing_args (
        id MEDIUMINT NOT NULL AUTO_INCREMENT,
        item_id MEDIUMINT NOT NULL,
        key_name TEXT,
        value TEXT NOT NULL,
        PRIMARY KEY (id)
    );


The first table, processing_items, holds the name of the function that the job handler should call. The second table, processing_args, holds the arguments for that function as key/value pairs, which are reassembled into a hash table (an associative array) when the job runs.
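Because of the key/value layout, each job's arguments come back from the database as several rows that must be folded into one associative array. The wrapper class does this with a separate query per item. The sketch below shows the same regrouping done in plain PHP over rows from a single join. The row shape (item id, function, key_name, value), the function name group_items(), and the job names send_report and cleanup are assumptions for illustration.

```php
<?php
// Fold (item_id, function, key_name, value) rows into items with an args hash.
// A LEFT JOIN yields a null key_name and value for an item with no arguments.
function group_items(array $rows): array
{
    $items = [];
    foreach ($rows as [$itemId, $function, $key, $value]) {
        if (!isset($items[$itemId])) {
            $items[$itemId] = ['id' => $itemId, 'function' => $function, 'args' => []];
        }
        if ($key !== null) {
            $items[$itemId]['args'][$key] = $value;
        }
    }
    return array_values($items);
}

// Rows as they might come back from:
//   SELECT i.id, i.function, a.key_name, a.value
//     FROM processing_items i LEFT JOIN processing_args a ON a.item_id = i.id
//    ORDER BY i.id
$rows = [
    [1, 'printvalue', 'value', 'foo'],
    [2, 'send_report', 'day', '2024-01-05'],
    [2, 'send_report', 'format', 'csv'],
    [3, 'cleanup', null, null],
];
print_r(group_items($rows));
```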

As with the mailouts table, these two tables are wrapped in a PHP class, called ProcessingItems.

Listing 6. generic.php
    <?php
    require_once('DB.php');

    class ProcessingItems
    {
        // Connection code elided in this listing.
        public static function get_db() { … }

        // Delete an item together with its arguments.
        public static function delete($id)
        {
            $db = ProcessingItems::get_db();
            $sth = $db->prepare('DELETE FROM processing_args WHERE item_id = ?');
            $db->execute($sth, $id);
            $sth = $db->prepare('DELETE FROM processing_items WHERE id = ?');
            $db->execute($sth, $id);
            return true;
        }

        // Queue a function name plus its key/value arguments.
        public static function add($function, $args)
        {
            $db = ProcessingItems::get_db();
            $sth = $db->prepare('INSERT INTO processing_items VALUES (null, ?)');
            $db->execute($sth, array($function));

            // Fetch the id MySQL assigned to the new item.
            $res = $db->query('SELECT last_insert_id()');
            $id = null;
            while ($res->fetchInto($row)) {
                $id = $row[0];
            }

            $sth = $db->prepare('INSERT INTO processing_args VALUES (null, ?, ?, ?)');
            foreach ($args as $key => $value) {
                $db->execute($sth, array($id, $key, $value));
            }
            return true;
        }

        // Return every item with its arguments reassembled into a hash.
        public static function get_all()
        {
            $db = ProcessingItems::get_db();
            $res = $db->query('SELECT * FROM processing_items');
            $rows = array();
            while ($res->fetchInto($row)) {
                $item = array();
                $item['id'] = $row[0];
                $item['function'] = $row[1];
                $item['args'] = array();
                $ares = $db->query('SELECT key_name, value FROM processing_args WHERE item_id = ?', $item['id']);
                while ($ares->fetchInto($arow)) {
                    $item['args'][$arow[0]] = $arow[1];
                }
                $rows[] = $item;
            }
            return $rows;
        }
    }
    ?>


This class has three important methods: add(), get_all(), and delete(). As with the mailouts system, the front end uses add(), and the processing engine uses get_all() and delete().

The test script shown in Listing 7 adds an entry to the processing queue.

Listing 7. generic_test_add.php
    <?php
    require_once 'generic.php';

    // Queue a call to printvalue() with value = foo.
    ProcessingItems::add('printvalue', array('value' => 'foo'));
    ?>


This example queues a call to the printvalue function with the value parameter set to foo. I run this script with the PHP command-line interpreter to put the call in the queue, and then run the call with the following processing script.

Listing 8. generic_process.php
    <?php
    require_once 'generic.php';

    // The handler named by the queued item.
    function printvalue($args)
    {
        echo 'Printing: ' . $args['value'] . "\n";
    }

    foreach (ProcessingItems::get_all() as $item) {
        // Call the named function, passing the args hash as its single argument.
        call_user_func_array($item['function'], array($item['args']));
        ProcessingItems::delete($item['id']);
    }
    ?>


This script is very simple. It takes the items returned by get_all() and uses call_user_func_array(), a built-in PHP function, to call each named function dynamically with the given arguments, deleting each item after it runs. In this example, the function called is the local printvalue function.
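The extra array(...) around $item['args'] in that call matters. call_user_func_array() spreads its second argument across the called function's parameters, so the args hash has to be wrapped in an outer array to arrive intact as printvalue()'s single $args parameter. If you pass the hash unwrapped, its own entries become the arguments instead. Since PHP 8.0, string keys such as 'value' are treated as named arguments, so the call fails against a function whose parameter is called $args. The demonstration below uses a hypothetical describe() handler and a hypothetical run_item() helper. The helper adds an is_callable() check, which a processor could use to skip an unknown function name instead of failing.

```php
<?php
// Hypothetical handler shaped like printvalue(): one hash parameter.
function describe(array $args): string
{
    return 'value=' . $args['value'];
}

$item = ['function' => 'describe', 'args' => ['value' => 'foo']];

// Wrapped: the whole hash becomes the first (and only) argument.
$result = call_user_func_array($item['function'], [$item['args']]);
echo $result . "\n";   // prints "value=foo"

// Hypothetical guard: only call names that resolve to something callable.
function run_item(array $item): ?string
{
    if (!is_callable($item['function'])) {
        return null;   // unknown handler: skip (or log) it instead of failing
    }
    return call_user_func_array($item['function'], [$item['args']]);
}
```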

To demonstrate this feature, let's see what happens on the command line:

    % php generic_test_add.php
    % php generic_process.php
    Printing: foo
    %


There isn't much output, but it shows the main point: with this mechanism, you can defer the processing of any PHP function.

If you don't like storing PHP function names and arguments in the database, another approach is to map a "processing job type" name stored in the database to the actual PHP handler function in your code. That way, if you later change the PHP back end, the system keeps working as long as the job type strings still match.
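Here is a minimal sketch of that mapping. It assumes the processing_items.function column holds a job-type string such as 'print-value' instead of a PHP function name. The JobRegistry class and the type names are hypothetical. Only types registered in code can run, so the database can no longer name an arbitrary PHP function. The handler behind a type can also be renamed or rewritten without touching the queued rows.

```php
<?php
// Hypothetical registry mapping stored job-type strings to PHP handlers.
class JobRegistry
{
    private $handlers = [];

    public function register(string $type, callable $handler): void
    {
        $this->handlers[$type] = $handler;
    }

    // Run a queued item; returns false if its type has no registered handler.
    public function dispatch(string $type, array $args): bool
    {
        if (!isset($this->handlers[$type])) {
            return false;
        }
        ($this->handlers[$type])($args);
        return true;
    }
}

$registry = new JobRegistry();
// The stored string stays 'print-value' even if the code behind it changes.
$registry->register('print-value', function (array $args) {
    echo 'Printing: ' . $args['value'] . "\n";
});

$registry->dispatch('print-value', ['value' => 'foo']);   // prints "Printing: foo"
```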

Abandon the database

Finally, I demonstrate a slightly different solution that stores batch jobs as files in a directory rather than in a database. I'm not suggesting that you use this instead of a database; it's simply an alternative, and whether you adopt it is up to you.

Because this solution doesn't use a database, there is no schema. Instead, start with a class that has add(), get_all(), and delete() methods like those in the previous example.

Listing 9. batch_by_file.php
    <?php
    define('BATCH_DIRECTORY', 'batch_items/');

    class BatchFiles
    {
        // A job's id is the path of its file.
        public static function delete($id)
        {
            unlink($id);
            return true;
        }

        // Write a job file: the function name on the first line,
        // then one "key:value" line per argument.
        public static function add($function, $args)
        {
            // Name the file after the current time; if that name is taken,
            // wait briefly and try again.
            while (true) {
                $path = BATCH_DIRECTORY . time();
                if (file_exists($path) == false) {
                    break;
                }
                usleep(100000);
            }
            $fh = fopen($path, 'w');
            // fwrite(), not fprintf(): the data must not be used as a format string.
            fwrite($fh, $function . "\n");
            foreach ($args as $k => $v) {
                fwrite($fh, $k . ':' . $v . "\n");
            }
            fclose($fh);
            return true;
        }

        // Read every job file in the batch directory.
        public static function get_all()
        {
            $rows = array();
            if (is_dir(BATCH_DIRECTORY)) {
                if ($dh = opendir(BATCH_DIRECTORY)) {
                    while (($file = readdir($dh)) !== false) {
                        $path = BATCH_DIRECTORY . $file;
                        if (is_dir($path) == false) {
                            $item = array();
                            $item['id'] = $path;
                            $fh = fopen($path, 'r');
                            if ($fh) {
                                $item['function'] = trim(fgets($fh));
                                $item['args'] = array();
                                while (($line = fgets($fh)) !== false) {
                                    // Split on the first colon only (split() was removed in PHP 7).
                                    $args = explode(':', trim($line), 2);
                                    if (count($args) == 2) {
                                        $item['args'][$args[0]] = $args[1];
                                    }
                                }
                                $rows[] = $item;
                                fclose($fh);
                            }
                        }
                    }
                    closedir($dh);
                }
            }
            return $rows;
        }
    }
    ?>


The BatchFiles class has three main methods: add(), get_all(), and delete(). Instead of accessing a database, it reads and writes files in the batch_items directory.
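The job-file format is simple enough to show on its own: the function name on the first line, then one key:value line per argument. The pair of hypothetical helpers below, encode_job() and decode_job(), encode and decode that format without touching the disk. Splitting each line on the first colon only (explode() with a limit of 2) keeps values such as URLs or times intact. The format still cannot carry a newline inside a value, and a key that contains a colon would be split in the wrong place.

```php
<?php
// Hypothetical helpers mirroring the job-file layout:
// line 1 = function name, then one "key:value" line per argument.
function encode_job(string $function, array $args): string
{
    $text = $function . "\n";
    foreach ($args as $key => $value) {
        $text .= $key . ':' . $value . "\n";
    }
    return $text;
}

function decode_job(string $text): array
{
    $lines = explode("\n", rtrim($text, "\n"));
    $job = ['function' => trim(array_shift($lines)), 'args' => []];
    foreach ($lines as $line) {
        $parts = explode(':', trim($line), 2);   // split on the first colon only
        if (count($parts) === 2) {
            $job['args'][$parts[0]] = $parts[1];
        }
    }
    return $job;
}

$text = encode_job('printvalue', ['value' => 'foo', 'url' => 'http://example.com:8080/']);
print_r(decode_job($text));
```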

Use the following test code to add a new batch entry.

Listing 10. batch_by_file_test_add.php
    <?php
    require_once 'batch_by_file.php';

    // Queue a call to printvalue() with value = foo.
    BatchFiles::add('printvalue', array('value' => 'foo'));
    ?>


Note that, apart from the class name (BatchFiles), nothing in this code indicates how the job is stored. That makes it easy to switch to database-style storage later without changing the interface.
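One way to make that independence explicit is a small interface that every storage class implements, so the processor depends only on add(), get_all(), and delete(). The sketch below is an assumption and not part of the original code. The JobQueue interface, the MemoryQueue class (an in-memory stand-in that is handy for tests), and the process_queue() function are all hypothetical. BatchFiles and ProcessingItems use static methods, so they would need to become instance classes to implement such an interface.

```php
<?php
// Hypothetical contract shared by any job store (database, files, memory).
interface JobQueue
{
    public function add(string $function, array $args): bool;
    public function get_all(): array;   // each item: ['id' => ..., 'function' => ..., 'args' => [...]]
    public function delete($id): bool;
}

// In-memory store; a file or database store would implement the same methods.
class MemoryQueue implements JobQueue
{
    private $items = [];
    private $nextId = 1;

    public function add(string $function, array $args): bool
    {
        $id = $this->nextId++;
        $this->items[$id] = ['id' => $id, 'function' => $function, 'args' => $args];
        return true;
    }

    public function get_all(): array
    {
        return array_values($this->items);
    }

    public function delete($id): bool
    {
        unset($this->items[$id]);
        return true;
    }
}

// The processor sees only the interface, so the storage can change freely.
function process_queue(JobQueue $queue): int
{
    $count = 0;
    foreach ($queue->get_all() as $item) {
        call_user_func_array($item['function'], [$item['args']]);
        $queue->delete($item['id']);
        $count++;
    }
    return $count;
}

function printvalue(array $args): void
{
    echo 'Printing: ' . $args['value'] . "\n";
}

$queue = new MemoryQueue();
$queue->add('printvalue', ['value' => 'foo']);
echo process_queue($queue) . " job(s) processed\n";   // prints "1 job(s) processed"
```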

Finally, here is the handler code.

Listing 11. batch_by_file_processor.php
    <?php
    require_once 'batch_by_file.php';

    // The handler named by the queued item.
    function printvalue($args)
    {
        echo 'Printing: ' . $args['value'] . "\n";
    }

    foreach (BatchFiles::get_all() as $item) {
        // Call the named function, passing the args hash as its single argument.
        call_user_func_array($item['function'], array($item['args']));
        BatchFiles::delete($item['id']);
    }
    ?>


This code is almost identical to the database version; only the file name and class name differ.

Conclusion

As noted earlier, servers offer a lot of support for using threads to do background batch processing, and in some cases it is certainly easier to hand small jobs to a helper thread. However, you can also build batch jobs into PHP applications with traditional tools (cron, MySQL, standard object-oriented PHP, and PEAR::DB) that are easy to implement, deploy, and maintain.

References

Learn

You can refer to the original English text of this article on the developerWorks global site.

Learn more about PHP by reading IBM developerWorks' PHP project resource center.

PHP.net is an excellent resource for PHP developers.

The PEAR Mail_Queue package is a robust mail queue implementation that includes the database backend.

The crontab manual provides details of cron configuration, but it is not easy to understand.

The section on Using PHP from the command line in the PHP manual can help you understand how to run scripts from cron.

Get products and technologies

Check out PEAR (the PHP Extension and Application Repository), which includes PEAR::DB.

About the author


  Jack D. Herrington is a senior software engineer with more than 20 years of work experience. He has authored three books: Code Generation in Action, Podcasting Hacks and PHP Hacks, and has written more than 30 articles.
