Detailed explanation of batch processing in PHP


What if a feature in a Web application takes more than a second or two to complete? Some sort of offline processing solution is required. Learn several ways to handle long-running jobs offline in a PHP application.
Large chain stores have a big problem. Every day, thousands of transactions take place in each store. Company executives want to mine that data. Which products sell well? Which sell badly? Where do organic products sell well? How are ice cream sales doing?

To capture this data, the organization must load all of the transactional data into a data model better suited to generating the kinds of reports the company requires. But that takes time, and as the chain grows, processing a day's data can take more than a day. So this is a big problem.

Your Web application may not need to handle that much data right now, but any site can end up with processing that takes longer than the customer is willing to wait. In general, customers are willing to wait about 200 milliseconds; beyond that, the process starts to feel "slow". That figure comes from desktop applications, and the Web has made us more patient. Even so, a customer should not be made to wait longer than a few seconds. So you need some strategies for handling batch jobs in PHP.

Scheduling with cron

On UNIX® machines, the core program that executes batch processes is the cron daemon. This daemon reads a configuration file that tells it which command lines to run and how often to run them. The daemon then executes them according to that configuration. When an error occurs, it can even send the error output to a specified e-mail address to help you debug the problem.

I know some engineers who strongly advocate threading instead: "Threads! Threads are the real way to do background processing. The cron daemon is so outdated."

I don't think so.

I've used both methods, and I think cron has the advantage of following the "Keep It Simple, Stupid" (KISS) principle. It keeps background processing simple. Instead of a long-running, multithreaded job-processing application (which therefore must never leak memory), cron starts a simple batch script. The script determines whether there is a job to process, executes the job, and then exits. There is no need to worry about memory leaks. There is no need to worry about threads stalling or getting stuck in infinite loops.

So, how does cron work? That depends on your system environment. I'm only covering the old-fashioned, simple, command-line version of cron here; ask your system administrator how to set it up for your own Web application.

Here's a simple cron configuration that runs a PHP script at 11 o'clock every night:

0 23 * * * jack /usr/bin/php /users/home/jack/myscript.php

The first five fields define when the script should be started. Next comes the name of the user the script should run as. The rest of the line is the command line to execute. The time fields are, in order: minute, hour, day of the month, month, and day of the week. Here are a few examples.

Command:

15 * * * * jack /usr/bin/php /users/home/jack/myscript.php

Run the script at the 15th minute of each hour.

Command:

15,45 * * * * jack /usr/bin/php /users/home/jack/myscript.php

Run the script at the 15th and 45th minutes of each hour.

Command:

*/1 3-23 * * * jack /usr/bin/php /users/home/jack/myscript.php

Run the script every minute between 3 in the morning and 11 o'clock in the evening.

Command:

30 23 * * 6 jack /usr/bin/php /users/home/jack/myscript.php

Run the script at 11:30 every Saturday night (Saturday is specified by 6).

As you can see, the number of combinations is practically unlimited. You can control exactly when the script runs. You can also schedule multiple scripts, so that some run every minute while others (such as backup scripts) run only once a day.
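To make the field layout concrete, here is a minimal sketch that splits one line of this crontab format into its parts. The function name parse_cron_line() and the array keys are hypothetical and chosen only for illustration; cron itself offers no such function, and the sketch assumes the layout described above (five time fields, a user name, then the command).

```php
<?php
// Hypothetical helper: split a crontab line into its named parts.
// Assumes five time fields, a user name, and then the command line.
function parse_cron_line(string $line): array
{
    // A limit of 7 keeps the command (which may contain spaces) in one piece.
    $parts = preg_split('/\s+/', trim($line), 7);
    if ($parts === false || count($parts) < 7) {
        throw new InvalidArgumentException('Expected 5 time fields, a user, and a command');
    }
    return array(
        'minute'       => $parts[0],
        'hour'         => $parts[1],
        'day_of_month' => $parts[2],
        'month'        => $parts[3],
        'day_of_week'  => $parts[4],
        'user'         => $parts[5],
        'command'      => $parts[6],
    );
}

// The Saturday-night example: minute 30, hour 23, weekday 6.
$entry = parse_cron_line('30 23 * * 6 jack /usr/bin/php /users/home/jack/myscript.php');
echo $entry['hour'] . ':' . $entry['minute'] . ' on weekday ' . $entry['day_of_week'] . "\n";
```

Reading a line this way is a quick check that a schedule has exactly five time fields before the user name, which is an easy thing to get wrong by hand.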

To specify which e-mail address to send the reported errors to, you can use the MAILTO directive, as follows:

MAILTO=jherr@pobox.com

Note: For Microsoft® Windows® users, there is an equivalent Scheduled Tasks system that can be used to start a command-line process (such as a PHP script) periodically.


The basics of batch processing architecture

Batch processing is fairly straightforward. In most cases, one of two workflows is used. The first is for reporting: a script runs once a day, generates the reports, and sends them to a group of users. The second is a batch job created in response to a request. For example, I log in to a Web application and ask it to send a message to all registered users telling them about a new feature. This operation must be batched because there are 10,000 users in the system. PHP takes a while to complete such a task, so it must be performed by a job outside the browser request.

In the second workflow, the Web application simply places information in a location that it shares with the batch application. This information specifies the nature of the job (for example, "Send this e-mail to all users in the system"). The batch program runs the job and then either deletes it or marks it as completed. Either way, the job must be recognized as completed so that it is not run again.

The remainder of this article demonstrates various ways to share data between the front-end of a WEB application and the back end of a batch process.


Mail queues

The first approach is to use a dedicated mail queuing system. In this model, a table in the database holds the e-mail messages that should be sent to individual users. The Web interface uses the Mailouts class to add messages to the queue. The e-mail handler uses the Mailouts class to retrieve the unprocessed messages and then uses it again to remove each one from the queue.

This model first requires a MySQL schema.

Listing 1. mailout.sql
DROP TABLE IF EXISTS mailouts;
CREATE TABLE mailouts (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  from_address TEXT NOT NULL,
  to_address TEXT NOT NULL,
  subject TEXT NOT NULL,
  content TEXT NOT NULL,
  PRIMARY KEY ( id )
);

This schema is very simple. Each row has a from address and a to address, as well as the subject and content of the e-mail message.

The mailouts table in the database is wrapped by the PHP Mailouts class.

Listing 2. mailout.php
<?php
require_once('DB.php');

class Mailouts
{
  // Open a PEAR::DB connection to the mailout database.
  public static function get_db()
  {
    $dsn = 'mysql://root:@localhost/mailout';
    $db = DB::connect($dsn, array());
    if (PEAR::isError($db)) { die($db->getMessage()); }
    return $db;
  }

  // Remove one message from the queue.
  public static function delete($id)
  {
    $db = Mailouts::get_db();
    $sth = $db->prepare('DELETE FROM mailouts WHERE id=?');
    $db->execute($sth, $id);
    return true;
  }

  // Add one message to the queue.
  public static function add($from, $to, $subject, $content)
  {
    $db = Mailouts::get_db();
    $sth = $db->prepare('INSERT INTO mailouts VALUES (null,?,?,?,?)');
    $db->execute($sth, array($from, $to, $subject, $content));
    return true;
  }

  // Return every queued message as a numerically indexed row.
  public static function get_all()
  {
    $db = Mailouts::get_db();
    $res = $db->query("SELECT * FROM mailouts");
    $rows = array();
    while ($res->fetchInto($row)) { $rows []= $row; }
    return $rows;
  }
}
?>

This script includes the PEAR::DB database access classes. It then defines the Mailouts class, which contains three main static functions: add(), delete(), and get_all(). The add() method adds an e-mail message to the queue and is used by the front end. The get_all() method returns all of the data from the table. The delete() method deletes one e-mail message.

You might ask why I don't simply call a delete_all() method at the end of the script. There are two reasons. First, if you delete each message after it is sent, a message cannot be sent twice, even if the script is rerun after a problem occurs. Second, new messages may be added between the start and the completion of the batch job, and a bulk delete would remove them unsent.

The next step is to write a simple test script that adds an entry to the queue.

Listing 3. mailout_test_add.php
<?php
require_once 'mailout.php';

Mailouts::add('donotreply@mydomain.com',
  'molly@nocompany.com.org',
  'Test Subject',
  'This is a test of the batch mail sendout');
?>

In this example, I add a mailout addressed to Molly at some company, with the subject "Test Subject" and a short e-mail body. You can run this script from the command line: php mailout_test_add.php.

In order to send e-mail, you need another script, which acts as a job handler.

Listing 4. mailout_send.php
<?php
require_once 'mailout.php';

// Send one queued message with PHP's mail() function.
function process($from, $to, $subject, $email)
{
  mail($to, $subject, $email, "From: $from");
}

$messages = Mailouts::get_all();
foreach ($messages as $msg)
{
  // Columns: 0 = id, 1 = from, 2 = to, 3 = subject, 4 = content
  process($msg[1], $msg[2], $msg[3], $msg[4]);
  Mailouts::delete($msg[0]);
}
?>

This script retrieves all e-mail messages using the get_all() method and then sends them one by one using PHP's mail() function. After each message is sent, the delete() method is called to remove the corresponding record from the queue.
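Listing 4 deletes each message right after calling mail(), whatever the result. If you would rather keep a message queued when sending fails, one option is to check the return value first. The sketch below is an assumption about how that could look, not part of the original listings: process_queue() is a hypothetical function, and the sender and deleter are passed in as callables so the loop can be tried without a mail server or a database.

```php
<?php
// Hypothetical variant of the send loop: a message is deleted only after
// the sender reports success, so failed messages stay queued for the next run.
function process_queue(array $messages, callable $send, callable $delete): int
{
    $sent = 0;
    foreach ($messages as $msg) {
        // Row layout follows the mailouts table: id, from, to, subject, content.
        list($id, $from, $to, $subject, $content) = $msg;
        if ($send($to, $subject, $content, "From: $from")) {
            $delete($id);
            $sent++;
        }
    }
    return $sent;
}

// Stand-ins for demonstration. In the real script, $send could be 'mail'
// and $delete could be array('Mailouts', 'delete').
$queue = array(
    array(1, 'donotreply@mydomain.com', 'molly@nocompany.com.org', 'Test Subject', 'Hello'),
    array(2, 'donotreply@mydomain.com', '', 'Test Subject', 'No recipient'),
);
$deleted = array();
$count = process_queue(
    $queue,
    function ($to, $subject, $body, $headers) { return $to !== ''; }, // fake sender
    function ($id) use (&$deleted) { $deleted[] = $id; }              // fake deleter
);
echo "Sent $count, deleted ids: " . implode(',', $deleted) . "\n";
```

With this shape, a message that could not be sent is simply picked up again the next time cron starts the script.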

Use the cron daemon to run this script on a regular basis. The frequency of running this script depends on the needs of your application.

Note: The PHP Extension and Application Repository (PEAR) contains an excellent mail queue implementation that can be downloaded free of charge.


A more general approach

A solution dedicated to sending e-mail is fine, but is there a more general approach? We need to be able to send e-mail, generate reports, or perform other time-consuming processing without making the user wait in the browser for it to complete.

To do this, you can take advantage of the fact that PHP is an interpreted language. You can store the PHP code in a queue in the database, and then execute it later. This requires two tables, as shown in Listing 5.

Listing 5. generic.sql
DROP TABLE IF EXISTS processing_items;
CREATE TABLE processing_items (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  -- Backticks because FUNCTION is a reserved word in recent MySQL versions.
  `function` TEXT NOT NULL,
  PRIMARY KEY ( id )
);

DROP TABLE IF EXISTS processing_args;
CREATE TABLE processing_args (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  item_id MEDIUMINT NOT NULL,
  key_name TEXT NOT NULL,
  value TEXT NOT NULL,
  PRIMARY KEY ( id )
);

The first table, processing_items, contains the names of the functions to be called by the job handler. The second table, processing_args, contains the arguments to be passed to each function, stored as a hash table of key/value pairs.

Like the mailouts table, these two tables are wrapped by a PHP class, called ProcessingItems.

Listing 6. generic.php
<?php
require_once('DB.php');

class ProcessingItems
{
  public static function get_db() { ... }

  // Remove a job together with its arguments.
  public static function delete($id)
  {
    $db = ProcessingItems::get_db();
    $sth = $db->prepare('DELETE FROM processing_args WHERE item_id=?');
    $db->execute($sth, $id);
    $sth = $db->prepare('DELETE FROM processing_items WHERE id=?');
    $db->execute($sth, $id);
    return true;
  }

  // Queue a call to $function with the key/value pairs in $args.
  public static function add($function, $args)
  {
    $db = ProcessingItems::get_db();

    $sth = $db->prepare('INSERT INTO processing_items VALUES (null,?)');
    $db->execute($sth, array($function));

    // Fetch the id of the row just inserted on this connection.
    $res = $db->query("SELECT last_insert_id()");
    $id = null;
    while ($res->fetchInto($row)) { $id = $row[0]; }

    foreach ($args as $key => $value)
    {
      $sth = $db->prepare('INSERT INTO processing_args VALUES (null,?,?,?)');
      $db->execute($sth, array($id, $key, $value));
    }

    return true;
  }

  // Return every queued job as array('id', 'function', 'args').
  public static function get_all()
  {
    $db = ProcessingItems::get_db();

    $res = $db->query("SELECT * FROM processing_items");

    $rows = array();
    while ($res->fetchInto($row))
    {
      $item = array();
      $item['id'] = $row[0];
      $item['function'] = $row[1];
      $item['args'] = array();

      $ares = $db->query("SELECT key_name, value FROM processing_args WHERE item_id=?", $item['id']);
      while ($ares->fetchInto($arow))
        $item['args'][$arow[0]] = $arow[1];

      $rows []= $item;
    }
    return $rows;
  }
}
?>

This class contains three important methods: add(), get_all(), and delete(). As with the mailouts system, the front end uses add(), and the processing engine uses get_all() and delete().

The test script shown in Listing 7 adds an entry to the processing queue.

Listing 7. generic_test_add.php
<?php
require_once 'generic.php';

ProcessingItems::add('printvalue', array('value' => 'foo'));
?>

In this example, a call to the printvalue function is added with the value parameter set to foo. I use the PHP command-line interpreter to run this script and put the function call into the queue. The call is then run by the following processing script.

Listing 8. generic_process.php
<?php
require_once 'generic.php';

function printvalue($args)
{
  echo 'Printing: '.$args['value']."\n";
}

foreach (ProcessingItems::get_all() as $item)
{
  // Call the queued function, passing the argument hash as one parameter.
  call_user_func_array($item['function'], array($item['args']));
  ProcessingItems::delete($item['id']);
}
?>

This script is very simple. It takes the processing items returned by get_all() and then uses call_user_func_array(), a built-in PHP function, to dynamically invoke the named function with the given arguments. In this example, the local printvalue function is invoked.
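The extra array() around the argument hash matters. call_user_func_array() spreads the elements of its second argument across the function's parameters, so wrapping the hash in a one-element array passes the whole hash as a single parameter. (In PHP 8, passing the hash unwrapped would treat its string keys as named parameters.) A minimal sketch, using a hypothetical describe_job() function that is not part of the article's listings:

```php
<?php
// Hypothetical handler that expects one parameter: a hash of named arguments.
function describe_job($args)
{
    return 'value=' . $args['value'];
}

$function = 'describe_job';        // as it would be read from the queue
$args = array('value' => 'foo');

// Wrapped: the handler receives the whole hash as its first parameter.
echo call_user_func_array($function, array($args)), "\n";

// Equivalent call when there is exactly one argument to pass.
echo call_user_func($function, $args), "\n";
```

Both calls hand describe_job() the same hash, so either form works for handlers that take a single argument array.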

To demonstrate this functionality, let's look at what happens on the command line:

% php generic_test_add.php
% php generic_process.php
Printing: foo
%

The output is not much, but you can see the point. This mechanism lets you defer the execution of any PHP function.

Now, if you don't like putting PHP function names and parameters in a database, another option is to create a mapping in your PHP code between a "processing job type" name stored in the database and the actual PHP function that handles it. That way, if you later decide to change the PHP back end, the system still works as long as the "processing job type" strings match.
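A minimal sketch of such a mapping follows. The job type names, the handler functions, and dispatch_job() are all hypothetical, invented here for illustration; the point is only that the queue stores a type string and the PHP code decides which function that string means.

```php
<?php
// Hypothetical handlers; the names are illustrative only.
function send_welcome_mail($args) { return 'mail to ' . $args['to']; }
function build_daily_report($args) { return 'report for ' . $args['day']; }

// Map from the job type string stored in the queue to the PHP function
// that handles it. Only types listed here can ever be run.
$handlers = array(
    'welcome_mail' => 'send_welcome_mail',
    'daily_report' => 'build_daily_report',
);

function dispatch_job(array $handlers, $type, array $args)
{
    if (!isset($handlers[$type])) {
        throw new InvalidArgumentException("Unknown job type: $type");
    }
    return call_user_func($handlers[$type], $args);
}

echo dispatch_job($handlers, 'daily_report', array('day' => 'yesterday')), "\n";
```

Renaming build_daily_report() later only requires changing the map; rows already in the queue keep working because they refer to 'daily_report', not to the function name.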


Doing without the database

Finally, I demonstrate a slightly different solution that stores batch jobs as files in a directory instead of in a database. I'm not suggesting that you should take this approach rather than use a database; it is simply an option, and whether to adopt it is up to you.

Obviously, there is no schema in this solution because it doesn't use a database. So start by writing a class with add(), get_all(), and delete() methods similar to those in the previous examples.

Listing 9. batch_by_file.php
<?php
define('BATCH_DIRECTORY', 'batch_items/');

class BatchFiles
{
  // The id of a job is the path of its file.
  public static function delete($id)
  {
    unlink($id);
    return true;
  }

  public static function add($function, $args)
  {
    // Name the file after the current time; retry until the name is unused.
    $path = '';
    while (true)
    {
      $path = BATCH_DIRECTORY.time();
      if (file_exists($path) == false)
        break;
    }

    // First line: function name. Following lines: key:value pairs.
    $fh = fopen($path, "w");
    fwrite($fh, $function."\n");
    foreach ($args as $k => $v)
    {
      fwrite($fh, $k.":".$v."\n");
    }
    fclose($fh);

    return true;
  }

  public static function get_all()
  {
    $rows = array();
    if (is_dir(BATCH_DIRECTORY)) {
      if ($dh = opendir(BATCH_DIRECTORY)) {
        while (($file = readdir($dh)) !== false) {
          $path = BATCH_DIRECTORY.$file;
          if (is_dir($path) == false)
          {
            $item = array();
            $item['id'] = $path;
            $fh = fopen($path, 'r');
            if ($fh)
            {
              $item['function'] = trim(fgets($fh));
              $item['args'] = array();
              while (($line = fgets($fh)) !== false)
              {
                // explode() replaces split(), which was removed in PHP 7.
                $args = explode(':', trim($line), 2);
                $item['args'][$args[0]] = $args[1];
              }
              $rows []= $item;
              fclose($fh);
            }
          }
        }
        closedir($dh);
      }
    }
    return $rows;
  }
}
?>

The BatchFiles class has three main methods: add(), get_all(), and delete(). Instead of accessing a database, this class reads and writes files in the batch_items directory.
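The on-disk format is simple: the first line of a job file is the function name, and each following line is a key:value pair. The sketch below expresses that format as two string functions so it can be tried without touching the file system. The names encode_job() and decode_job() are hypothetical and are not part of Listing 9.

```php
<?php
// Hypothetical helpers mirroring the job file layout:
// line 1 is the function name, every later line is "key:value".
function encode_job($function, array $args)
{
    $text = $function . "\n";
    foreach ($args as $k => $v) {
        $text .= $k . ':' . $v . "\n";
    }
    return $text;
}

function decode_job($text)
{
    $lines = explode("\n", trim($text));
    $item = array('function' => trim(array_shift($lines)), 'args' => array());
    foreach ($lines as $line) {
        // A limit of 2 keeps any further colons inside the value.
        $pair = explode(':', trim($line), 2);
        if (count($pair) === 2) {
            $item['args'][$pair[0]] = $pair[1];
        }
    }
    return $item;
}

$text = encode_job('printvalue', array('value' => 'foo', 'url' => 'http://example.com'));
print_r(decode_job($text));
```

The format has limits worth knowing: a key cannot contain a colon, and a value cannot contain a newline. If your jobs need either, a structured encoding such as json_encode() would be a safer choice.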

Use the following test code to add a new batch entry.

Listing 10. batch_by_file_test_add.php
<?php
require_once 'batch_by_file.php';

BatchFiles::add("printvalue", array('value' => 'foo'));
?>

One thing to note: apart from the class name (BatchFiles), nothing in this code indicates how the job is stored. So it's easy to switch to database-backed storage later without changing the interface.
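One way to make that shared interface explicit is sketched below. This is an assumption about how the design could be extended, not something the article's classes do: JobQueue, MemoryQueue, and run_jobs() are hypothetical names, and instance methods are used instead of the article's static methods so that implementations can be swapped at run time.

```php
<?php
// Hypothetical interface capturing the three operations that every queue
// class in this article offers.
interface JobQueue
{
    public function add($function, array $args);
    public function get_all();
    public function delete($id);
}

// In-memory implementation, handy for trying out a processor without a
// database or a batch_items directory.
class MemoryQueue implements JobQueue
{
    private $items = array();
    private $nextId = 1;

    public function add($function, array $args)
    {
        $id = $this->nextId++;
        $this->items[$id] = array('id' => $id, 'function' => $function, 'args' => $args);
        return true;
    }

    public function get_all()
    {
        return array_values($this->items);
    }

    public function delete($id)
    {
        unset($this->items[$id]);
        return true;
    }
}

// The processor depends only on the interface, not on the storage.
function run_jobs(JobQueue $queue)
{
    foreach ($queue->get_all() as $item) {
        call_user_func_array($item['function'], array($item['args']));
        $queue->delete($item['id']);
    }
}

function printvalue($args) { echo 'Printing: ' . $args['value'] . "\n"; }

$queue = new MemoryQueue();
$queue->add('printvalue', array('value' => 'foo'));
run_jobs($queue);
```

A file-backed or database-backed class implementing the same three methods could be passed to run_jobs() unchanged.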

Finally, the code for the handler.

Listing 11. batch_by_file_processor.php
<?php
require_once 'batch_by_file.php';

function printvalue($args)
{
  echo 'Printing: '.$args['value']."\n";
}

foreach (BatchFiles::get_all() as $item)
{
  call_user_func_array($item['function'], array($item['args']));
  BatchFiles::delete($item['id']);
}
?>

This code is almost identical to the database version, except that the file name and class name are modified.


Conclusion

As mentioned earlier, servers offer a lot of support for threads, which can be used for background batch processing. In some cases, using a worker thread to handle small jobs is certainly easier. However, you can also use traditional tools (cron, MySQL, standard object-oriented PHP, and PEAR::DB) to create batch jobs in your PHP application that are easy to implement, deploy, and maintain.

About the author


Jack D. Herrington is a senior software engineer with more than 20 years of experience. He has written three books (Code Generation in Action, Podcasting Hacks, and PHP Hacks) and more than 30 articles.