Implementation of batch processing in PHP

Source: IBM developerWorks
Author: Jack D. Herrington

What should you do when a feature in your Web application takes more than a second or two to complete? You need some kind of offline processing solution. This article shows several ways to run long-running jobs for PHP applications offline.
Large chain stores have a big problem. Thousands of transactions occur in each store every day, and the company's executives want to mine that data. Which products sell well? Which sell poorly? Where do organic products sell well? How are ice cream sales doing?

To answer these questions, the organization must load all of the transactional data into a data model that is better suited to generating the reports the company needs. That takes a long time, and as the chain grows, processing a single day's data can take more than a day. That is the problem.

Your Web application may not need to process that much data, but any site can have operations that take longer than customers are willing to wait. Generally, customers will wait about 200 milliseconds; beyond that, the process feels "slow." That figure comes from desktop applications, and the Web has made us somewhat more patient. Even so, customers should never have to wait more than a few seconds. So you need a strategy for handling batch jobs in PHP.

Distributed mode and cron

On UNIX machines, the heart of batch processing is the cron daemon. The daemon reads a configuration file that tells it which command lines to run and how often, and then runs them on that schedule. It can even e-mail any error output to a specified address to help you debug problems.

I know some engineers who are big advocates of threads: "Threads! Threads are the real way to do background processing. The cron daemon is so outdated."

I don't think so.

I have used both approaches, and I think cron has the advantage of "Keep It Simple, Stupid" (KISS). It keeps background processing simple. Instead of writing a long-running, multithreaded job-processing application that must never leak memory, you let cron start a simple batch script. The script checks whether there is a job to process, runs it, and exits. There's no need to worry about memory leaks, and no need to worry about a thread stalling or getting stuck in an infinite loop.
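
The run-once pattern described above can be sketched in a few lines of PHP. This is an illustration under assumptions, not code from the article: fetch_pending_jobs() and its hard-coded job list are hypothetical stand-ins for a real queue such as a database table or a directory of files.

```php
<?php
// Sketch of a cron-friendly batch script: look for work, do it, exit.
// fetch_pending_jobs() is hypothetical; a real script would read a
// database table or a directory instead of returning a fixed array.
function fetch_pending_jobs(): array
{
    return [
        ['id' => 1, 'task' => 'nightly-report'],
        ['id' => 2, 'task' => 'cleanup'],
    ];
}

// Handles every pending job once and returns how many there were.
function run_once(callable $fetch, callable $handle): int
{
    $jobs = $fetch();
    foreach ($jobs as $job) {
        $handle($job);
    }
    return count($jobs);
}

$processed = run_once('fetch_pending_jobs', function (array $job): void {
    echo "Processing job {$job['id']}: {$job['task']}\n";
});
echo "Handled $processed job(s)\n";
// The script ends here. cron starts a fresh process on the next run,
// so memory is freed and there is no long-lived loop to get stuck in.
```

Because every run starts in a clean process, a small leak in one run cannot accumulate across runs.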

So how do you set up cron? That depends on your system. I'll cover only the old-fashioned, simple UNIX command-line version of cron; ask your system administrator how it is set up on the servers that host your Web application.

Here is a simple cron configuration that runs a PHP script every night at 11:00 PM:

0 23 * * * jack /usr/bin/php /users/home/jack/myscript.php

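The first five fields specify when the script should run. The sixth field is the user name to run the script as, and the rest of the line is the command to execute. The time fields are, in order: minute, hour, day of month, month, and day of week; an asterisk (*) in a field matches every value. (This format, with a user name field, is the one used by system-wide crontab files such as /etc/crontab; a personal crontab edited with crontab -e omits the user name.) Here are some more examples.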

Command:

15 * * * * jack /usr/bin/php /users/home/jack/myscript.php

Runs the script at 15 minutes past every hour.

Command:

15,45 * * * * jack /usr/bin/php /users/home/jack/myscript.php

Runs the script at 15 and 45 minutes past every hour.

Command:

*/1 3-23 * * * jack /usr/bin/php /users/home/jack/myscript.php

Runs the script every minute from 3:00 AM through 11:59 PM.

Command:

30 23 * * 6 jack /usr/bin/php /users/home/jack/myscript.php

Runs the script at 11:30 PM every Saturday (day of week 6 is Saturday).

As you can see, the possible combinations are nearly endless, so you can control exactly when a script runs. You can also list multiple scripts, so that some run every minute while others (such as backup scripts) run only once a day.
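
To make the five time fields concrete, here is a small PHP sketch that checks whether the schedule part of a crontab line matches a given time. The function names are hypothetical and the sketch is deliberately partial: it understands *, */step, ranges such as 3-23, comma lists, and plain numbers, but not month or day names, range/step forms such as 1-10/2, or cron's rule that day of month and day of week are ORed when both are restricted.

```php
<?php
// Hypothetical helper: does one cron time field match a value?
// Supports "*", "*/step", ranges like "3-23", comma lists and numbers.
// $lowest is the field's smallest legal value (0 for minutes, 1 for
// day of month), so "*/n" counts steps from there as cron does.
function cron_field_matches(string $field, int $value, int $lowest): bool
{
    foreach (explode(',', $field) as $part) {
        if ($part === '*') {
            return true;
        }
        if (preg_match('/^\*\/(\d+)$/', $part, $m)) {
            $step = (int)$m[1];
            if ($step > 0 && ($value - $lowest) % $step === 0) {
                return true;
            }
        } elseif (preg_match('/^(\d+)-(\d+)$/', $part, $m)) {
            if ($value >= (int)$m[1] && $value <= (int)$m[2]) {
                return true;
            }
        } elseif (preg_match('/^\d+$/', $part) && (int)$part === $value) {
            return true;
        }
    }
    return false;
}

// Checks the first five fields of a crontab line against a Unix
// timestamp (in PHP's default time zone). Anything after the fifth
// field (user name, command) is ignored. All five fields are ANDed.
function cron_matches(string $line, int $ts): bool
{
    $f = preg_split('/\s+/', trim($line));
    if (count($f) < 5) {
        return false;
    }
    return cron_field_matches($f[0], (int)date('i', $ts), 0)  // minute
        && cron_field_matches($f[1], (int)date('G', $ts), 0)  // hour
        && cron_field_matches($f[2], (int)date('j', $ts), 1)  // day of month
        && cron_field_matches($f[3], (int)date('n', $ts), 1)  // month
        && cron_field_matches($f[4], (int)date('w', $ts), 0); // day of week, 0 = Sunday
}

// June 15, 2024 was a Saturday.
$saturdayNight = mktime(23, 30, 0, 6, 15, 2024);
echo cron_matches('30 23 * * 6 jack /usr/bin/php /users/home/jack/myscript.php', $saturdayNight)
    ? "due\n" : "not due\n"; // prints "due"
```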

cron e-mails any output a job produces, including error messages. To specify the address it goes to, set the MAILTO variable in the crontab file, as shown below:

MAILTO = jherr@pobox.com

Note: Microsoft Windows users can use the equivalent Scheduled Tasks system to start command-line processes, such as PHP scripts, on a regular schedule.

Batch processing architecture basics

Batch processing architectures are quite simple. In most cases, you use one of two workflows. The first is for reporting: a script runs once a day, generates a report, and sends it to a group of users. The second is a batch job created in response to a request. For example, I log in to a Web application and ask it to send a message to every registered user announcing a new feature. This has to be done as a batch job because there are 10,000 users in the system. PHP takes a while to complete a task like that, so it must be done by a job running outside the browser.
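
The reporting workflow usually comes down to a script that cron runs once a day to summarize the day's records. The sketch below is hypothetical: the $transactions array stands in for rows loaded from your own database, and the report simply totals units sold per product, the kind of question the chain-store executives were asking.

```php
<?php
// Hypothetical daily reporting job: total units sold per product.
// In practice $transactions would come from the day's database rows.
function summarize_sales(array $transactions): array
{
    $totals = [];
    foreach ($transactions as $t) {
        $product = $t['product'];
        $totals[$product] = ($totals[$product] ?? 0) + $t['quantity'];
    }
    arsort($totals); // Best sellers first.
    return $totals;
}

// Renders the totals as a plain-text report body.
function format_report(array $totals): string
{
    $lines = ["Units sold today:"];
    foreach ($totals as $product => $units) {
        $lines[] = sprintf("%-12s %5d", $product, $units);
    }
    return implode("\n", $lines) . "\n";
}

$transactions = [
    ['product' => 'ice cream', 'quantity' => 3],
    ['product' => 'apples',    'quantity' => 5],
    ['product' => 'ice cream', 'quantity' => 4],
];
$report = format_report(summarize_sales($transactions));
echo $report;
// A real job would now send $report to its recipients, e.g. with mail().
```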

In the second workflow, the Web application just needs to put information somewhere the batch application can find it. This information specifies what the job is (for example, "Send this e-mail to all the people on the system."). The batch program runs the job and then deletes it; alternatively, the handler marks the job as completed. Either way, the job must be recorded as done so that it doesn't run again.
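
Both ways of retiring a job, deleting it or marking it completed, come down to making sure a later run skips it. The sketch below shows the mark-as-completed variant on an in-memory list; the 'done' field and run_pending() are hypothetical, and a real system would keep the flag in a database column or similar.

```php
<?php
// In-memory illustration of "mark as completed" instead of deleting.
// The 'done' field is hypothetical; a real queue would persist it.
function run_pending(array &$jobs, callable $handle): int
{
    $ran = 0;
    foreach ($jobs as &$job) {
        if ($job['done']) {
            continue; // Finished on an earlier run, so skip it.
        }
        $handle($job['task']);
        $job['done'] = true; // Record completion so it never runs again.
        $ran++;
    }
    unset($job); // Drop the reference left behind by foreach.
    return $ran;
}

$jobs = [
    ['task' => 'Send this e-mail to all the people on the system', 'done' => false],
];
$printTask = function (string $task): void {
    echo "Running: $task\n";
};
$first = run_pending($jobs, $printTask);  // runs the job once
$second = run_pending($jobs, $printTask); // finds nothing left to do
```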

The rest of this article demonstrates several ways to share job data between the front end and the back end of a Web application.

Email queue

The first approach is a dedicated mail queue system. In this model, a database table holds the e-mail messages to be sent to individual users. The Web interface uses the Mailouts class to add messages to the queue. The mail handler uses the same class to retrieve the queued messages and then to delete each one from the queue after it has been sent.

This model starts with the MySQL schema.

Listing 1. mailout.sql
DROP TABLE IF EXISTS mailouts;
CREATE TABLE mailouts (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  from_address TEXT NOT NULL,
  to_address TEXT NOT NULL,
  subject TEXT NOT NULL,
  content TEXT NOT NULL,
  PRIMARY KEY (id)
);

This schema is very simple. Each row contains a from address and a to address, plus the subject and content of the e-mail.

The mailouts table in the database is wrapped by the PHP Mailouts class.

Listing 2. mailouts.php
<?php
require_once('DB.php');
class Mailouts
{
  // Connects to the mailout database through PEAR::DB.
  public static function get_db()
  {
    $dsn = 'mysql://root:@localhost/mailout';
    $db = DB::connect($dsn, array());
    if (PEAR::isError($db)) { die($db->getMessage()); }
    return $db;
  }
  // Removes a single message from the queue by ID.
  public static function delete($id)
  {
    $db = Mailouts::get_db();
    $sth = $db->prepare('DELETE FROM mailouts WHERE id=?');
    $db->execute($sth, array($id));
    return true;
  }
  // Adds a message to the queue (called by the Web front end).
  public static function add($from, $to, $subject, $content)
  {
    $db = Mailouts::get_db();
    $sth = $db->prepare('INSERT INTO mailouts VALUES (null,?,?,?,?)');
    $db->execute($sth, array($from, $to, $subject, $content));
    return true;
  }
  // Returns every queued message as a numerically indexed row:
  // id, from_address, to_address, subject, content.
  public static function get_all()
  {
    $db = Mailouts::get_db();
    $res = $db->query("SELECT * FROM mailouts");
    $rows = array();
    while ($res->fetchInto($row)) { $rows[] = $row; }
    return $rows;
  }
}
?>

This script includes the PEAR::DB database abstraction class. It then defines the Mailouts class, which has three main static methods: add(), delete(), and get_all(). The front end uses add() to put a message in the queue. The get_all() method returns all the rows in the table, and delete() removes a single message.

You may ask why I don't simply call a delete_all() method at the end of the script. There are two reasons. First, if you delete each message right after it is sent, it can't be sent twice even if the script has to be rerun after a problem. Second, new messages may be added between the start and the end of the batch job, and a blanket delete would discard them unsent.
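
The second reason is easy to see with a toy queue. In this hypothetical in-memory sketch, a new message arrives while the batch is running. Deleting each message by ID after sending it leaves the newcomer queued for the next run, whereas clearing the whole queue at the end would lose it.

```php
<?php
// Toy in-memory queue (hypothetical), keyed by message ID.
$queue = [1 => 'welcome', 2 => 'reminder'];
$sent = [];

// Snapshot, like Mailouts::get_all(): only messages queued so far.
$batch = $queue;
foreach ($batch as $id => $message) {
    $sent[] = $message;
    if ($id === 1) {
        $queue[3] = 'late arrival'; // The front end queues a message mid-run.
    }
    unset($queue[$id]); // Delete just this message, like Mailouts::delete().
}
// $queue still holds message 3, so the next cron run will send it.
// Emptying $queue here (a "delete_all") would have discarded it unsent.
```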

The next step is to write a simple test script that adds an entry to the queue.

Listing 3. mailout_test_add.php
<?php
require 'mailouts.php';
Mailouts::add('donotreply@mydomain.com',
  'molly@nocompany.com.org',
  'Test Subject',
  'This is a test of the batch mail sendout');
?>

In this example, I add a single mailout addressed to Molly at some company, with the subject "Test Subject" and a short body. You can run this script from the command line: php mailout_test_add.php.

To send the e-mail, you need a second script: the job handler.

Listing 4. mailout_send.php
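<?php
require_once 'mailouts.php';
// Sends one message; returns true if mail() accepted it for delivery.
function process($from, $to, $subject, $email)
{
  return mail($to, $subject, $email, "From: $from");
}
$messages = Mailouts::get_all();
foreach ($messages as $msg)
{
  // Row layout: 0 id, 1 from, 2 to, 3 subject, 4 content.
  if (process($msg[1], $msg[2], $msg[3], $msg[4]))
  {
    // Remove the message only after it was handed off successfully.
    Mailouts::delete($msg[0]);
  }
}
?>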

This script uses get_all() to retrieve all the queued messages and then sends them one at a time with PHP's mail() function. After each message is sent successfully, it calls delete() to remove the corresponding record from the queue.
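
To exercise the send-then-delete logic without a mail server, you can replace mail() with a stub. In this sketch, fake_send() and the in-memory rows are hypothetical test doubles shaped like the rows Mailouts::get_all() returns; one address is made to fail, and only that message stays queued for the next run.

```php
<?php
// Hypothetical stand-in for mail(): fails for one address.
function fake_send(string $to, string $subject, string $body): bool
{
    return $to !== 'bounce@example.com';
}

// Rows mimic Mailouts::get_all(): id, from, to, subject, content.
$queue = [
    [1, 'donotreply@mydomain.com', 'molly@example.com', 'Hi', 'Body'],
    [2, 'donotreply@mydomain.com', 'bounce@example.com', 'Hi', 'Body'],
];

$remaining = [];
foreach ($queue as $msg) {
    if (fake_send($msg[2], $msg[3], $msg[4])) {
        continue; // Sent: drop it, as Mailouts::delete() would.
    }
    $remaining[] = $msg; // Not sent: leave it queued for the next run.
}
```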

Use cron to run this script periodically; how often depends on your application's needs.

Note: The PHP Extension and Application Repository (PEAR) contains an excellent mail queue implementation, the Mail_Queue package, which you can download for free.

A more generic approach

The mail queue works well, but is there a more general approach? We'd like to be able to send e-mail, generate reports, or do any other time-consuming processing without making the user wait in the browser.

To do that, you can take advantage of the fact that PHP is an interpreted language: you can store the name of a PHP function and its arguments in a database queue and call the function later. This requires two tables, as shown in Listing 5.

Listing 5. generic.sql
DROP TABLE IF EXISTS processing_items;
CREATE TABLE processing_items (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  function TEXT NOT NULL,
  PRIMARY KEY (id)
);
DROP TABLE IF EXISTS processing_args;
CREATE TABLE processing_args (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  item_id MEDIUMINT NOT NULL,
  key_name TEXT NOT NULL,
  value TEXT NOT NULL,
  PRIMARY KEY (id)
);

The first table, processing_items, holds the names of the functions the job handler should call. The second table, processing_args, holds the arguments to pass to each function as key/value pairs that together form a hash table (an associative array).
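
Here is how an argument hash maps onto processing_args rows and back again. The two helper functions are hypothetical and exist only to illustrate the table layout; the ProcessingItems class does the equivalent work with SQL.

```php
<?php
// Flattens an argument hash into (item_id, key_name, value) rows,
// matching the columns of processing_args (minus its own row ID).
function args_to_rows(int $itemId, array $args): array
{
    $rows = [];
    foreach ($args as $key => $value) {
        $rows[] = [$itemId, (string)$key, (string)$value];
    }
    return $rows;
}

// Rebuilds the hash for one item from the stored rows.
function rows_to_args(int $itemId, array $rows): array
{
    $args = [];
    foreach ($rows as [$id, $key, $value]) {
        if ($id === $itemId) {
            $args[$key] = $value;
        }
    }
    return $args;
}

$rows = args_to_rows(7, ['value' => 'foo', 'count' => 3]);
$args = rows_to_args(7, $rows);
// Values come back as strings, since the value column is TEXT.
```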

As with the mailouts table, these two tables are wrapped by a PHP class, in this case ProcessingItems.

Listing 6. generic.php
<?php
require_once('DB.php');
class ProcessingItems
{
  // Connection code elided; compare Mailouts::get_db() in Listing 2.
  public static function get_db() { ... }
  // Deletes an item together with all of its arguments.
  public static function delete($id)
  {
    $db = ProcessingItems::get_db();
    $sth = $db->prepare('DELETE FROM processing_args WHERE item_id=?');
    $db->execute($sth, array($id));
    $sth = $db->prepare('DELETE FROM processing_items WHERE id=?');
    $db->execute($sth, array($id));
    return true;
  }
  // Queues a call to $function with the associative array $args.
  public static function add($function, $args)
  {
    $db = ProcessingItems::get_db();
    $sth = $db->prepare('INSERT INTO processing_items VALUES (null,?)');
    $db->execute($sth, array($function));
    // Fetch the auto-increment ID of the item just inserted.
    $res = $db->query("SELECT last_insert_id()");
    $id = null;
    while ($res->fetchInto($row)) { $id = $row[0]; }
    // Store each argument as one key/value row.
    $sth = $db->prepare('INSERT INTO processing_args VALUES (null,?,?,?)');
    foreach ($args as $key => $value)
    {
      $db->execute($sth, array($id, $key, $value));
    }
    return true;
  }
  // Returns every queued item as array('id', 'function', 'args').
  public static function get_all()
  {
    $db = ProcessingItems::get_db();
    $res = $db->query("SELECT * FROM processing_items");
    $rows = array();
    while ($res->fetchInto($row))
    {
      $item = array();
      $item['id'] = $row[0];
      $item['function'] = $row[1];
      $item['args'] = array();
      $ares = $db->query("SELECT key_name, value FROM processing_args WHERE item_id=?", array($item['id']));
      while ($ares->fetchInto($arow)) { $item['args'][$arow[0]] = $arow[1]; }
      $rows[] = $item;
    }
    return $rows;
  }
}
?>

This class has three important methods: add(), get_all(), and delete(). As with the mailouts system, the front end uses add(), and the processing engine uses get_all() and delete().

The test script shown in Listing 7 adds an entry to the processing queue.

Listing 7. generic_test_add.php
<?php
require_once 'generic.php';
ProcessingItems::add('printvalue', array('value' => 'foo'));
?>

This example queues a call to the printvalue function with its value argument set to foo. I run this script with the PHP command-line interpreter to put the call in the queue, and then use the following processing script to run it.

Listing 8. generic_process.php
<?php
require_once 'generic.php';
// Every function named in the queue must be defined in this script.
function printvalue($args)
{
  echo 'Printing: '.$args['value']."\n";
}
foreach (ProcessingItems::get_all() as $item)
{
  // Pass the whole args hash as the function's single parameter.
  call_user_func_array($item['function'], array($item['args']));
  ProcessingItems::delete($item['id']);
}
?>

This script is very simple. It takes the processing items returned by get_all() and uses call_user_func_array(), a built-in PHP function, to call each named function dynamically with the stored arguments. In this case, that's the local printvalue function.

To see this in action, here is what happens on the command line:

% php generic_test_add.php
% php generic_process.php
Printing: foo
%

That's not much output, but it makes the point: this mechanism lets you defer a call to any PHP function.
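
The dynamic call at the heart of the processing script can be tried on its own. This sketch adds something Listing 8 lacks, an is_callable() check, so that a queued item naming a function that doesn't exist is skipped and reported instead of causing an error. The queue items are hypothetical.

```php
<?php
function printvalue($args)
{
    echo 'Printing: ' . $args['value'] . "\n";
}

// Hypothetical queue items shaped like ProcessingItems::get_all() output.
$items = [
    ['id' => 1, 'function' => 'printvalue', 'args' => ['value' => 'foo']],
    ['id' => 2, 'function' => 'no_such_function', 'args' => []],
];

$done = [];
$failed = [];
foreach ($items as $item) {
    if (!is_callable($item['function'])) {
        $failed[] = $item['id']; // Leave it queued for someone to inspect.
        continue;
    }
    // The whole args hash is passed as the function's single parameter.
    call_user_func_array($item['function'], [$item['args']]);
    $done[] = $item['id'];
}
```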

If you don't like putting PHP function names and parameters into the database, another approach is to create a mapping in your PHP code between a "processing job type" name stored in the database and the actual PHP function that handles it. That way, if you later decide to change the PHP back end, the system still works as long as the "processing job type" strings match.
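
Such a mapping can be as simple as an array from job-type strings to callables. In this sketch the job types ('send_announcement', 'nightly_report'), their handlers, and dispatch() are hypothetical; the database would store only the type string and its arguments.

```php
<?php
// Job-type names stored in the queue, mapped to the PHP code that
// handles them. Rewriting a handler only changes this map.
$handlers = [
    'send_announcement' => function (array $args): string {
        return "Announcing: {$args['feature']}";
    },
    'nightly_report' => function (array $args): string {
        return "Report for {$args['date']}";
    },
];

// Looks up the handler for a job type and runs it with the stored args.
function dispatch(array $handlers, string $type, array $args): string
{
    if (!isset($handlers[$type])) {
        throw new InvalidArgumentException("Unknown job type: $type");
    }
    return $handlers[$type]($args);
}

echo dispatch($handlers, 'send_announcement', ['feature' => 'batch jobs']), "\n";
// prints "Announcing: batch jobs"
```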

Dropping the database

Finally, I'll demonstrate a slightly different solution that stores batch jobs as files in a directory rather than in a database. I'm not suggesting that you use this instead of a database; it's simply another option, and whether to use it is up to you.

Because there's no database, there's no schema. Instead, I write a class with add(), get_all(), and delete() methods like those in the preceding examples.

Listing 9. batch_by_file.php
<?php
define('BATCH_DIRECTORY', 'batch_items/');
class BatchFiles
{
  // The item ID is the file's path, so deleting an item removes its file.
  public static function delete($id)
  {
    unlink($id);
    return true;
  }
  // Writes the function name on the first line, then one key:value
  // argument per line. Files are named after the current time; if that
  // name is taken, wait briefly and try again in the next second.
  public static function add($function, $args)
  {
    $path = '';
    while (true)
    {
      $path = BATCH_DIRECTORY.time();
      if (file_exists($path) == false) break;
      usleep(100000);
    }
    $fh = fopen($path, "w");
    // fwrite, not fprintf: a '%' in the data must not be read as a format.
    fwrite($fh, $function."\n");
    foreach ($args as $k => $v)
    {
      fwrite($fh, $k.":".$v."\n");
    }
    fclose($fh);
    return true;
  }
  public static function get_all()
  {
    $rows = array();
    if (is_dir(BATCH_DIRECTORY))
    {
      if ($dh = opendir(BATCH_DIRECTORY))
      {
        while (($file = readdir($dh)) !== false)
        {
          $path = BATCH_DIRECTORY.$file;
          // Skips "." and ".." and any subdirectories.
          if (is_dir($path) == false)
          {
            $item = array();
            $item['id'] = $path;
            $fh = fopen($path, 'r');
            if ($fh)
            {
              $item['function'] = trim(fgets($fh));
              $item['args'] = array();
              while (($line = fgets($fh)) !== false)
              {
                // Split on the first colon only, so values may contain ':'.
                $args = explode(':', trim($line), 2);
                if (count($args) == 2) { $item['args'][$args[0]] = $args[1]; }
              }
              $rows[] = $item;
              fclose($fh);
            }
          }
        }
        closedir($dh);
      }
    }
    return $rows;
  }
}
?>

The BatchFiles class has the same three main methods, add(), get_all(), and delete(), but it reads and writes files in the batch_items directory instead of accessing a database.

Use the following test code to add a new batch entry.

Listing 10. batch_by_file_test_add.php
<?php
require_once 'batch_by_file.php';
BatchFiles::add("printvalue", array('value' => 'foo'));
?>

Note that, apart from the BatchFiles class name, nothing here indicates how the job is stored. So it's easy to switch to database-backed storage later without changing the interface.
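
That storage independence can be made explicit with a small interface that each queue class implements, so the processor depends only on add(), get_all(), and delete(). The JobQueue interface, the MemoryQueue class, and process_queue() are hypothetical; BatchFiles and ProcessingItems as written use static methods and don't share an interface.

```php
<?php
// Hypothetical common interface for the queue classes.
interface JobQueue
{
    public function add(string $function, array $args): void;
    public function get_all(): array;
    public function delete($id): void;
}

// Simplest possible backend, handy for tests.
class MemoryQueue implements JobQueue
{
    private $items = [];
    private $nextId = 1;

    public function add(string $function, array $args): void
    {
        $id = $this->nextId++;
        $this->items[$id] = ['id' => $id, 'function' => $function, 'args' => $args];
    }

    public function get_all(): array
    {
        return array_values($this->items);
    }

    public function delete($id): void
    {
        unset($this->items[$id]);
    }
}

// The processor works with any JobQueue: database, files, or memory.
function process_queue(JobQueue $queue): int
{
    $count = 0;
    foreach ($queue->get_all() as $item) {
        call_user_func_array($item['function'], [$item['args']]);
        $queue->delete($item['id']);
        $count++;
    }
    return $count;
}

function printvalue($args)
{
    echo 'Printing: ' . $args['value'] . "\n";
}

$queue = new MemoryQueue();
$queue->add('printvalue', ['value' => 'foo']);
$handled = process_queue($queue); // prints "Printing: foo"
```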

Finally, here is the code for the processor.

Listing 11. batch_by_file_processor.php
<?php
require_once 'batch_by_file.php';
function printvalue($args)
{
  echo 'Printing: '.$args['value']."\n";
}
foreach (BatchFiles::get_all() as $item)
{
  call_user_func_array($item['function'], array($item['args']));
  BatchFiles::delete($item['id']);
}
?>

This code is almost identical to the database version; only the file name and the class name have changed.
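
One limitation of a key:value line format is that an argument value containing a newline is split across lines and misread. As an alternative sketch, not the format used above, each job can be stored as a single JSON document. JsonBatchFiles is hypothetical: it keeps the add(), get_all(), and delete() interface, uses a random file name instead of waiting for time() to change, and the demo writes to a directory under sys_get_temp_dir().

```php
<?php
// Hypothetical variant of BatchFiles that stores each job as one JSON
// file, so argument values may safely contain newlines or colons.
class JsonBatchFiles
{
    private $dir;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0777, true);
        }
    }

    // Returns the new job's ID, which is its file path.
    public function add(string $function, array $args): string
    {
        // A random suffix avoids collisions without waiting for time() to change.
        $path = $this->dir . '/job_' . bin2hex(random_bytes(8)) . '.json';
        file_put_contents($path, json_encode(['function' => $function, 'args' => $args]));
        return $path;
    }

    public function get_all(): array
    {
        $items = [];
        foreach (glob($this->dir . '/*.json') ?: [] as $path) {
            $data = json_decode((string)file_get_contents($path), true);
            if (is_array($data)) {
                $items[] = ['id' => $path] + $data;
            }
        }
        return $items;
    }

    public function delete(string $id): bool
    {
        return unlink($id);
    }
}

$queue = new JsonBatchFiles(sys_get_temp_dir() . '/batch_items_demo');
$queue->add('printvalue', ['value' => "two\nlines: ok"]);
foreach ($queue->get_all() as $item) {
    echo $item['function'], ': ', $item['args']['value'], "\n";
    $queue->delete($item['id']);
}
```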

Conclusion

As I mentioned at the start, some engineers prefer threads, and some servers offer extensive support for threads and background batch processing. In some cases, it's certainly easier to hand a small job to a helper thread. But you can also use traditional tools (cron, MySQL, standard object-oriented PHP, and PEAR::DB) to build batch jobs into your PHP applications, and the result is easy to implement, deploy, and maintain.


About the author

Jack D. Herrington is a senior software engineer with more than 20 years of experience. He is the author of three books, Code Generation in Action, Podcasting Hacks, and PHP Hacks, as well as more than 30 articles.
